[RFC] [kservice] KPluginMetadata indexing

Sebastian Kügler sebas at kde.org
Thu Nov 6 02:44:58 UTC 2014


Hi all, especially Alex and David,

tl;dr:
I've done a proof-of-concept implementation of a metadata index for 
KPluginTrader::query(), the main entry point when it comes to finding binary 
plugins. In my measurements, this index speeds up plugin queries by a factor 
of 3 to 60, depending on the storage medium and cache state, but it comes at 
the cost of having to maintain the index. The code is in 
kservice[sebas/kpluginindex].

The Slightly Longer Story...

During Akademy's Frameworks and Plasma BoFs, we talked about indexing plugins 
for faster lookups. One of the things we wanted to try in Plasma is to index 
packages, thereby speeding up package metadata lookups and plugin queries.

I have done a naive implementation of such an indexing mechanism as a proof 
of concept in KService, specifically in KPluginTrader::query(). It builds on 
Alex Richardson's recent work on KPluginMetaData, which I found very useful 
(https://git.reviewboard.kde.org/r/120198/ and 
https://git.reviewboard.kde.org/r/120199/). I've put these patches in my 
branch kservice[sebas/kpluginindex].

Basic Mechanism

- a small tool called kplugin-update-index collects the JSON metadata from the 
plugins in a given plugin directory, puts it into a QJsonArray, and dumps 
that to disk in Qt's binary JSON format
- KPluginTrader::query() checks if an index file exists in a given plugin 
directory
-- if the index file exists, it reads it and creates a list of KPluginMetaData 
objects from it
-- if the index file doesn't exist, it falls back to the old code path and 
walks over each plugin to read its metadata
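
The lookup logic above can be sketched roughly as follows. This is a 
simplified illustration in plain C++17, not the code from the branch: the 
index file name "plugins.index", the PluginEntry struct and the function names 
are all made up for this example, and the index is a plain text file with one 
plugin name per line instead of a QJsonArray in Qt's binary JSON format.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical index file name; the name used in the branch may differ.
const std::string kIndexFileName = "plugins.index";

// Stand-in for KPluginMetaData: only the file name and where it came from.
struct PluginEntry {
    std::string fileName;
    bool fromIndex;
};

// Fast path: a single file read. One entry per line in this sketch.
std::vector<PluginEntry> readIndex(const fs::path &indexFile)
{
    std::vector<PluginEntry> result;
    std::ifstream in(indexFile);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty()) {
            result.push_back({line, true});
        }
    }
    return result;
}

// Slow path: walk the directory and touch every plugin file.
std::vector<PluginEntry> scanDirectory(const fs::path &pluginDir)
{
    std::vector<PluginEntry> result;
    if (!fs::is_directory(pluginDir)) {
        return result;
    }
    for (const auto &entry : fs::directory_iterator(pluginDir)) {
        if (entry.is_regular_file() && entry.path().extension() == ".so") {
            result.push_back({entry.path().filename().string(), false});
        }
    }
    return result;
}

// Use the index if it exists, otherwise fall back to scanning.
std::vector<PluginEntry> queryPlugins(const fs::path &pluginDir)
{
    const fs::path indexFile = pluginDir / kIndexFileName;
    if (fs::exists(indexFile)) {
        return readIndex(indexFile);
    }
    return scanDirectory(pluginDir);
}
```

Note that in this sketch the index is trusted blindly once it exists, which is 
exactly why a stale index yields missing or dangling entries.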

Performance Measurement Method

I've created a new autotest, kpluginmetadatatest, which runs two successive 
queries and measures the time it takes to return the results. I've 
instrumented the code in kplugintrader.cpp with QElapsedTimers. The autotest 
ran on a rotational disk ("metal") and on an SSD in separate test runs. Before 
cold-cache tests, I dropped the page cache, dentries and inodes from memory 
using 
echo 3 > /proc/sys/vm/drop_caches
Tests are running on Qt's 5.4 branch; the results are fairly consistent with 
what I've seen on Qt 5.3.
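
For illustration, timing two successive runs of the same query could look 
roughly like this. It is a sketch only: it uses std::chrono::steady_clock 
where the actual instrumentation uses QElapsedTimer, and timeTwoRuns is a 
hypothetical helper, not a function from the autotest.

```cpp
#include <chrono>
#include <cstdint>
#include <utility>

// Runs `query` twice in a row and returns the elapsed time of each run in
// microseconds: first the (possibly cold-cache) run, then the warm one.
template <typename Query>
std::pair<std::int64_t, std::int64_t> timeTwoRuns(Query query)
{
    using Clock = std::chrono::steady_clock;
    using std::chrono::duration_cast;
    using std::chrono::microseconds;

    const auto t0 = Clock::now();
    query();
    const auto t1 = Clock::now();
    query();
    const auto t2 = Clock::now();

    const auto first = duration_cast<microseconds>(t1 - t0).count();
    const auto second = duration_cast<microseconds>(t2 - t1).count();
    return std::make_pair(static_cast<std::int64_t>(first),
                          static_cast<std::int64_t>(second));
}
```

The first value is only a cold-cache number if the caches were dropped 
immediately before the call; the second is always a warm-cache number.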

Performance Improvements

Performance tests are promising: 
http://vizzzion.org/blog/wp-content/uploads/2014/11/performance-comparison-charts.png 
(note that the left-most bar for the rotational disk is drawn at one tenth of 
its actual length in the picture).

In short, the indexed queries are roughly:
* 60 times faster on a rotational medium with cold caches
* 3 times faster on an SSD with cold caches
* 7 times faster on a rotational disk with warm caches
* 5 times faster on an SSD with warm caches

More Observations

- on SSDs, we save most of the time in directory traversal and in 
(de)serializing the JSON metadata
- the index lookup spends almost all of its time in disk reads; deserializing 
the binary metadata is almost free (Qt's binary JSON representation is really 
fast to read)
- I haven't seen any tests in which the indexed queries were slower.

These results can be explained as follows:
- the bottleneck is reading the files from disk
- on rotational media, as expected, we pay a huge performance penalty for 
every seek we cause: the more files we read, the more disastrous lookup times 
get
- as expected, warm page caches help a lot in all cases

Cost: Maintaining the Cache

These speedups do come at a cost, of course, and that is the added complexity 
of maintaining the caches. The idea from the BoF sessions was to update the 
caches at install time, which is essentially what kplugin-update-index does 
(it still needs some added logic to give the index files sensible permissions 
when run as root). That means that packagers will have to run the index 
updater in their postinstall routine. Not doing this at all means slower 
queries (or rather, no speedier queries). Worse is forgetting to update once 
in a while: newly installed plugins are then missing from the index files, and 
removed plugins are left dangling in them. This will need at least some 
packaging discipline.
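
To make "missing" and "dangling" concrete, here is a small sketch that 
compares the plugin names recorded in an index with the plugin files actually 
on disk. Nothing like this exists in the branch; IndexDrift and 
compareIndexToDisk are hypothetical names, and plugins are identified by file 
name only.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Difference between what the index claims and what is installed.
struct IndexDrift {
    std::vector<std::string> missing;   // on disk, but not in the index
    std::vector<std::string> dangling;  // in the index, but no longer on disk
};

// Both inputs are sorted sets, so std::set_difference can be used directly.
IndexDrift compareIndexToDisk(const std::set<std::string> &indexed,
                              const std::set<std::string> &onDisk)
{
    IndexDrift drift;
    std::set_difference(onDisk.begin(), onDisk.end(),
                        indexed.begin(), indexed.end(),
                        std::back_inserter(drift.missing));
    std::set_difference(indexed.begin(), indexed.end(),
                        onDisk.begin(), onDisk.end(),
                        std::back_inserter(drift.dangling));
    return drift;
}
```

A check like this has to list the plugin directory, so it gives back part of 
the directory-traversal savings; it is more useful as a packaging sanity check 
than inside the query path.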

Index File Location

The indexer creates the index files in the plugin directories themselves, not 
in $CACHE or $TMP. This seems the most straightforward way to do it: if a 
plugin is installed into a specific directory, the "installer" will have write 
permission there to update the index as well. One might consider putting these 
index files in the cache directory, like ksycoca does, but in that case we 
need to be smarter about updating the index files correctly, since the result 
then depends on the user's environment and plugin paths (which means it can't 
sensibly be done at install time).

KServiceTypeTrader Comparison

First off, for the current situation, the comparison to KServiceTypeTrader is 
not of much use, since it's orthogonal to KPluginTrader. 
That aside, I've run the same queries through KServiceTypeTrader (with 
different results, of course, and only on an SSD). 
With cold caches KServiceTypeTrader is 40 times faster than unindexed queries 
(current status quo), and still  times faster with indices.
Successive KServiceTypeTrader queries are about 100 times faster than indexed 
KPluginTrader queries. KServiceTypeTrader is still a lot faster, presumably 
because it reads one larger file instead of multiple ones. It may make sense 
to cache the index files read from disk, which should get us in the ballpark 
of KServiceTypeTrader again.

Feedback, please!

So, this code is in a bit of a draft stage, and I'd very much welcome feedback 
about the approach, and of course the code itself. It can be found in 
kservice[sebas/kpluginindex]. The kpluginmetadata autotest gives a useful 
testing target. I haven't submitted it to reviewboard yet, because I want to 
nail down the further direction first and provide a basis for discussion.

Cheers,
-- 
sebas

http://www.kde.org | http://vizZzion.org | GPG Key ID: 9119 0EF9

