D12746: KDevelop: alternative monolithic storage options for TopDUContexts (WIP/PoC)

Mon May 7 21:21:05 UTC 2018

rjvbb added a comment.

  >   so what's your plan with this now?

  I plan to continue to use this for topcontexts myself, maybe extend it to the other DUChain stores if/when I get around to understanding that code. The topcontexts storage was easy enough to tackle, and as mentioned, my main goal here was to get rid of the file-based implementation (and numerous mmaps) without taking a performance hit. I wasn't aiming to or expecting to improve topcontext performance, that's a misunderstanding I already tried to set right on the ML.

  One reason to publish the patch was clearly to make it available as a start for others to toy with (for lack of a better word).

  > The numbers aren't that useful on their own, as there the current implementation which is trivial and has no external dependencies is apparently still the best by a margin...

  For writing to and deleting from a local cache directory, indeed, but those are operations that occur only infrequently and rarely with the full range of indices. You've mentioned yourself that topcontext I/O is not what bounds performance, so the difference in read speed should be negligible when loading or parsing a project in KDevelop (and my experience confirms that; even the KyotoCabinet backend doesn't really feel slower).
  I'd expect that the better performance of LMDB on NFS shares might well be drowned out too if everything else is loaded over NFS too.

  That said, the benchmarks time serial access. I've deliberately run them on a system that has my usual other tasks running, but not inside a KDevelop session that's doing all kinds of other disk access. Depending on your disk and filesystem that could make a significant difference on performance of the current implementation, but I doubt it will be easy to quantify that.

  TBH, it took me a while before I got the database benchmarks to give consistent results esp. for writing, and not timings that fluctuated between much faster and much slower than the current values. See my remark above about others "playing" with this.

  >   hammering the big databases we have (i.e. AbstractItemRepository). *These* are the slow ones, since they are actually used a lot.

  Is that the database that's scattered in the other files, 1 directory above the topcontexts directory?

  >   Still, the current implementation was much faster than lmbd the last time I tried. The reason is that we can access and insert quickly, and
  >   lookup is O(1) since it's an on-disk hash map.

  The KyotoCabinet backend I tested also uses an on-disk hash map with O(1) lookup - according to the documents. I found its API the cleanest of the 3 I tried. Maybe it'll prove faster in other applications.

  >   But actually, I would very much appreciate getting *rid* of that complicated code and replacing it by something else.

  That certainly wouldn't hurt. I've managed to reduce the number of crashes and deadlocks with a patch that treats symptoms but a complete overhaul seems a good idea (as well as a lot of tedious and not very gratifying work). But would a simple key/value database be sufficient (rather than e.g. sqlite3)?

  > access cross-process safe, which I would really like to see in the long term.

  A central cache so different sessions that open the same file don't have to generate and maintain their own data? I think Xcode caches (as in: hides) something like that somewhere, which can become really huge (probably because it isn't well defined when you can/should remove cached data).

INLINE COMMENTS

> mwolff wrote in test_topcontextstore.cpp:76
> you are missing the 6k results here

I can add them of course, but from what I've seen they scale in a nicely linear fashion. My initial tests with the file-based backend showed non-linear scaling there which has now all but disappeared so I could just as well remove the 6k results, probably.

> mwolff wrote in test_topcontextstore.cpp:145
> that's pretty bad in comparison
> 
> note that we currently just rewrite the complete file, which you can't do when you use one big data base

That's removing all entries with a flush after each remove (which should be redundant).

There *is* original code that calls `QFile::remove()`, hopefully only when the index is not going to be reused immediately. Rewriting an entire file becomes overwriting an existing key/value pair, which all databases tested support in much the same way it is done with files.

> mwolff wrote in test_topcontextstore.cpp:355
> O_o make it a separate data file and read it from a resource instead of embedding it manually

I thought about using a qresource but the extra work seemed a bit pointless as it wouldn't reduce the size of the patch. I'd like to avoid binary files for now (until this actually begins to stand a chance to be committed).

REPOSITORY
  R32 KDevelop

REVISION DETAIL
  https://phabricator.kde.org/D12746

To: rjvbb, brauch
Cc: mwolff, kdevelop-devel, antismap, iodelay, vbspam, njensen, geetamc, Pilzschaf, akshaydeo, surgenight, arrowdodger
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.kde.org/pipermail/kdevelop-devel/attachments/20180507/fcb63fa9/attachment-0001.html>