[RFC] [kservice] KPluginMetadata indexing

Milian Wolff mail at milianw.de
Thu Nov 6 10:20:21 UTC 2014


On Thursday 06 November 2014 10:09:51 Mark Gaiser wrote:
> On Thu, Nov 6, 2014 at 3:44 AM, Sebastian Kügler <sebas at kde.org> wrote:
> > Hi all,  especially Alex and David,
> > 
> > tl;dr:
> > I've done a proof-of-concept implementation of a metadata index for
> > KPluginTrader::query(), the main entry point when it comes to finding
> > binary plugins. This index considerably speeds up all current use cases,
> > but comes at the cost of having to maintain the index. Code is in
> > kservice[sebas/kpluginindex]; it makes plugin querying a few times faster.
> > 
> > The Slightly Longer Story...
> > 
> > During Akademy's frameworks and plasma bofs, we talked about indexing
> > plugins for faster lookups. One of the things we wanted to try in Plasma
> > is to index packages, thereby speeding up package metadata lookups
> > and plugin queries.
> > 
> > I have done a naive implementation of such an indexing mechanism, and have
> > implemented this as a proof of concept in KService, specifically in
> > KPluginTrader::query(). This is using Alex Richardson's recent work on
> > KPluginMetadata, which I found very useful (
> > https://git.reviewboard.kde.org/r/120198/ and
> > https://git.reviewboard.kde.org/r/120199/ ). I've put these patches in my
> > branch kservice[sebas/kpluginindex].
> > 
> > Basic Mechanism
> > 
> > - a small tool called kplugin-update-index collects the json metadata from
> > the plugins, and puts the list of plugins in a given plugin directory
> > into a QJsonArray, and dumps that in Qt's json binary format to disk
> > - KPluginTrader::query checks if an index file exists in a given plugin
> > directory
> > -- if the index file exists, it reads it and creates a list of
> > KPluginMetaData objects from it
> > -- if the index file doesn't exist, it walks over each plugin to read its
> > metadata, i.e. it falls back to the old code path
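
A minimal sketch of the lookup-with-fallback mechanism quoted above, in Python
rather than Qt/C++ so it runs without KService. It is not the code from the
branch. Everything named here is an assumption made for illustration: the index
file name plugins.index, per-plugin *.json sidecar files standing in for the
metadata embedded in the plugin binaries, plain JSON text standing in for Qt's
binary JSON format, and the ServiceTypes key used for filtering.

```python
import json
from pathlib import Path

INDEX_NAME = "plugins.index"  # hypothetical index file name


def scan_plugins(plugin_dir):
    """Slow path: open every per-plugin metadata file in the directory."""
    entries = []
    for path in sorted(Path(plugin_dir).glob("*.json")):
        entries.append(json.loads(path.read_text(encoding="utf-8")))
    return entries


def update_index(plugin_dir):
    """Collect all metadata into one array and dump it next to the plugins."""
    index_path = Path(plugin_dir) / INDEX_NAME
    index_path.write_text(json.dumps(scan_plugins(plugin_dir)), encoding="utf-8")
    return index_path


def load_metadata(plugin_dir):
    """Read the index if it exists, otherwise fall back to the slow path."""
    index_path = Path(plugin_dir) / INDEX_NAME
    if index_path.exists():
        return json.loads(index_path.read_text(encoding="utf-8"))
    return scan_plugins(plugin_dir)


def query(plugin_dir, service_type):
    """Return the metadata entries that declare the given service type."""
    return [
        entry
        for entry in load_metadata(plugin_dir)
        if service_type in entry.get("ServiceTypes", [])
    ]
```

With an index present, query() opens one file per directory instead of one file
per plugin, which is where the savings on seek-bound media would come from.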
> > 
> > Performance Measurement Method
> > 
> > I've created a new autotest, kpluginmetadatatest, which runs two
> > successive queries and measures the time it takes to return the results.
> > I've instrumented the code in kplugintrader.cpp with QElapsedTimers. The
> > autotest runs on rotational metal and on an SSD in separate test runs.
> > Before cold cache tests, I've dropped page cache, dentries and inodes
> > from memory using echo 3 > /proc/sys/vm/drop_caches
> > Tests are running on Qt's 5.4 branch; the results are fairly consistent
> > with what I've seen on Qt 5.3.
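
The measurement approach quoted above (two successive runs of the same query,
each timed with a monotonic timer) can be sketched as follows. This is only an
illustration in Python, with time.perf_counter_ns() standing in for
QElapsedTimer; time_runs is a hypothetical helper name, and dropping the page
cache between runs needs root and is deliberately left out.

```python
import time


def time_runs(query, runs=2):
    """Call query() `runs` times and return each run's duration in nanoseconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter_ns()  # monotonic, unaffected by clock changes
        query()
        timings.append(time.perf_counter_ns() - start)
    return timings
```

The first entry then corresponds to the first (possibly cold) run and the
second entry to the warm repeat.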
> > 
> > Performance Improvements
> > 
> > Performance tests are promising:
> > http://vizzzion.org/blog/wp-content/uploads/2014/11/performance-comparison-charts.png
> > (note that the metal's left-most bar is truncated by /10 in
> > the picture).
> > 
> > In short, the indexed queries are roughly:
> > * 60 times faster on a rotational medium with cold caches
> > * 3 times faster on an SSD with cold caches
> > * 7 times faster on a rotational disk with warm caches
> > * 5 times faster on an SSD with warm caches
> > 
> > More Observations
> > - on SSDs, we save most of the time in directory traversal and
> > (de)serializing the json metadata
> > - the index lookup spends almost all of its time in disk reads;
> > deserializing the binary metadata is almost free (Qt's json binary
> > representation is really fast to read)
> > - I haven't seen any tests in which the indexed queries have been slower.
> > 
> > These results can be explained as follows:
> > - the bottleneck is reading the files from disk
> > - on rotational media, as expected, we get huge performance penalties for
> > every seek we cause; the more files we read, the more disastrous lookup
> > times get
> > - as expected, warm page caches help a lot in all cases
> > 
> > Cost: Maintaining the Cache
> > 
> > These speedups do come at a cost, of course, and that is the added
> > complexity of maintaining the caches. The idea from the bof sessions had
> > been to update the caches at install time; this is essentially what can
> > be done with kplugin-update-index (it needs some added logic to give the
> > index files sensible permissions when run as root). That means that
> > packagers will have to run the index updater in their postinstall
> > routine. Not doing this at all means slower queries (or rather, no
> > speedier queries). Worse is if they forget to update once in a while, in
> > which case newly installed plugins might be missing from the index files
> > and removed plugins might be left dangling in them. This will need at
> > least some packaging discipline.
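
To make the "missing or dangling" failure mode concrete, here is a small sketch
that compares an index with what is actually on disk. It is an illustration
only, not something the proposal includes. It assumes a hypothetical index file
named plugins.index holding a JSON array whose entries carry a FileName key,
and plugins matching *.so.

```python
import json
from pathlib import Path


def index_drift(plugin_dir, index_name="plugins.index", pattern="*.so"):
    """Return (missing, dangling) plugin file names for one plugin directory.

    missing:  present on disk but absent from the index (newly installed)
    dangling: listed in the index but gone from disk (removed)
    """
    plugin_dir = Path(plugin_dir)
    index_text = (plugin_dir / index_name).read_text(encoding="utf-8")
    indexed = {entry["FileName"] for entry in json.loads(index_text)}
    on_disk = {path.name for path in plugin_dir.glob(pattern)}
    return sorted(on_disk - indexed), sorted(indexed - on_disk)
```

Note that this check has to list the directory, so it costs part of what the
index is meant to save; it would fit a packaging sanity check better than the
query path.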
> > 
> > Index File Location
> > 
> > The indexer creates the index files in the plugin directories themselves,
> > not in $CACHE or $TMP. This seems the most straightforward way to do it,
> > since if a plugin is installed into a specific directory, the "installer"
> > will have write permission there to update the index as well. One might
> > consider putting these index files in the cache directory, like ksycoca
> > does, but in that case, we need to be smarter to actually update the
> > index files correctly, since at that point, it depends on the environment
> > of the user and the plugin paths (which means, it can't sensibly be done
> > at install-time).
> > 
> > KServiceTypeTrader Comparison
> > 
> > First off, for the current situation, the comparison to KServiceTypeTrader
> > is not of much use, since it's orthogonal to KPluginTrader.
> > That aside, I've run the same queries through KServiceTypeTrader (with
> > different results, of course, and just on an ssd).
> > With cold caches KServiceTypeTrader is 40 times faster than unindexed
> > queries (current status quo), and still  times faster with indices.
> > Successive queries are about 100 times faster than indexed queries.
> > KServiceTypeTrader is still a lot faster, presumably since we're reading
> > one larger file, instead of multiple ones. It may make sense to cache the
> > index files read from disk, which should get us in the ballpark of
> > KServiceTypeTrader again.
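
One way the suggested caching of index files could look is sketched below: an
in-process cache keyed by path that is revalidated with a single stat() call.
This is an assumption about how such a cache might work, not part of the
proposal. load_index_cached is a hypothetical name, and plain JSON text again
stands in for Qt's binary JSON.

```python
import json
import os

_cache = {}  # path -> ((mtime_ns, size), parsed entries)


def load_index_cached(index_path):
    """Return the parsed index, re-reading the file only when it has changed."""
    key = os.fspath(index_path)
    stat = os.stat(key)
    stamp = (stat.st_mtime_ns, stat.st_size)
    hit = _cache.get(key)
    if hit is not None and hit[0] == stamp:
        return hit[1]  # unchanged on disk: skip the read and the parse
    with open(key, encoding="utf-8") as handle:
        entries = json.load(handle)
    _cache[key] = (stamp, entries)
    return entries
```

Repeated queries in the same process then pay for one stat() per index file
instead of a full read, at the price of holding the parsed entries in memory.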
> > 
> > Feedback, please!
> > 
> > So, this code is in a bit of a draft stage. I'd very much welcome feedback
> > about the approach, and of course the code itself. It can be found in
> > kservice[sebas/kpluginindex]. The kpluginmetadata autotest gives a useful
> > testing target. I didn't submit it to reviewboard yet, because I want to
> > nail down the further direction, and provide a base for discussion.
> > 
> > Cheers,
> > --
> > sebas
> 
> Hi Sebas,
> 
> I'm curious about one thing. Have you done some profiling on the
> current KPluginMetaData to see where the actual hot spot is?

Or is the benchmark you ran available so we can run it as well?

> In case you don't know how to do that, here are some tips:
> 1. Recompile Qt with debug symbols (not debug mode, just with the debug
> symbols)
> 2. Run a benchmark application via valgrind like so:
> valgrind --tool=callgrind <your_benchmark_app>

Better use perf or VTune. Especially the latter, with its locks & waits
analysis, will help to find disk IO, which callgrind won't ever see.

> 3. Open the output file of the line above in KCacheGrind and hunt for
> those pesky hot spots.
> 
> Perhaps there is nothing to optimize and then having an index (and the
> cost of maintaining it) is worth it, but it would be best to first
> determine if the current code path can be optimized.

True.

Bye
-- 
Milian Wolff
mail at milianw.de
http://milianw.de


More information about the Kde-frameworks-devel mailing list