<table><tr><td style="">rjvbb added a comment.
</td><a style="text-decoration: none; padding: 4px 8px; margin: 0 8px 8px; float: right; color: #464C5C; font-weight: bold; border-radius: 3px; background-color: #F7F7F9; background-image: linear-gradient(to bottom,#fff,#f1f0f1); display: inline-block; border: 1px solid rgba(71,87,120,.2);" href="https://phabricator.kde.org/D12746">View Revision</a></tr></table><br /><div><div><blockquote style="border-left: 3px solid #a7b5bf; color: #464c5c; font-style: italic; margin: 4px 0 12px 0; padding: 4px 12px; background-color: #f8f9fc;"><div class="remarkup-code-block" style="margin: 12px 0;" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code" style="font: 11px/15px "Menlo", "Consolas", "Monaco", monospace; padding: 12px; margin: 0; background: rgba(71, 87, 120, 0.08);">so what's your plan with this now?</pre></div></blockquote>
<p>I plan to continue to use this for topcontexts myself, maybe extend it to the other DUChain stores if/when I get around to understanding that code. The topcontexts storage was easy enough to tackle, and as mentioned, my main goal here was to get rid of the file-based implementation (and numerous mmaps) without taking a performance hit. I wasn't aiming to or expecting to improve topcontext performance, that's a misunderstanding I already tried to set right on the ML.</p>
<p>One reason to publish the patch was clearly to make it available as a start for others to toy with (for lack of a better word).</p>
<blockquote style="border-left: 3px solid #a7b5bf; color: #464c5c; font-style: italic; margin: 4px 0 12px 0; padding: 4px 12px; background-color: #f8f9fc;"><p>The numbers aren't that useful on their own, as there the current implementation which is trivial and has no external dependencies is apparently still the best by a margin...</p></blockquote>
<p>For writing to and deleting from a local cache directory, indeed, but those are operations that occur only infrequently and rarely with the full range of indices. You've mentioned yourself that topcontext I/O is not what bounds performance, so the difference in read speed should be negligible when loading or parsing a project in KDevelop (and my experience confirms that; even the KyotoCabinet backend doesn't really feel slower).<br />
I'd expect that the better performance of LMDB on NFS shares might well be drowned out too if everything else is loaded over NFS too.</p>
<p>That said, the benchmarks time serial access. I've deliberately run them on a system that has my usual other tasks running, but not inside a KDevelop session that's doing all kinds of other disk access. Depending on your disk and filesystem that could make a significant difference on performance of the current implementation, but I doubt it will be easy to quantify that.</p>
<p>TBH, it took me a while before I got the database benchmarks to give consistent results esp. for writing, and not timings that fluctuated between much faster and much slower than the current values. See my remark above about others "playing" with this.</p>
<blockquote style="border-left: 3px solid #a7b5bf; color: #464c5c; font-style: italic; margin: 4px 0 12px 0; padding: 4px 12px; background-color: #f8f9fc;"><div class="remarkup-code-block" style="margin: 12px 0;" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code" style="font: 11px/15px "Menlo", "Consolas", "Monaco", monospace; padding: 12px; margin: 0; background: rgba(71, 87, 120, 0.08);">hammering the big databases we have (i.e. AbstractItemRepository). *These* are the slow ones, since they are actually used a lot.</pre></div></blockquote>
<p>Is that the database that's scattered in the other files, 1 directory above the topcontexts directory?</p>
<blockquote style="border-left: 3px solid #a7b5bf; color: #464c5c; font-style: italic; margin: 4px 0 12px 0; padding: 4px 12px; background-color: #f8f9fc;"><div class="remarkup-code-block" style="margin: 12px 0;" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code" style="font: 11px/15px "Menlo", "Consolas", "Monaco", monospace; padding: 12px; margin: 0; background: rgba(71, 87, 120, 0.08);">Still, the current implementation was much faster than lmbd the last time I tried. The reason is that we can access and insert quickly, and
lookup is O(1) since it's an on-disk hash map.</pre></div></blockquote>
<p>The KyotoCabinet backend I tested also uses an on-disk hash map with O(1) lookup - according to the documents. I found its API the cleanest of the 3 I tried. Maybe it'll prove faster in other applications.</p>
<blockquote style="border-left: 3px solid #a7b5bf; color: #464c5c; font-style: italic; margin: 4px 0 12px 0; padding: 4px 12px; background-color: #f8f9fc;"><div class="remarkup-code-block" style="margin: 12px 0;" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code" style="font: 11px/15px "Menlo", "Consolas", "Monaco", monospace; padding: 12px; margin: 0; background: rgba(71, 87, 120, 0.08);">But actually, I would very much appreciate getting *rid* of that complicated code and replacing it by something else.</pre></div></blockquote>
<p>That certainly wouldn't hurt. I've managed to reduce the number of crashes and deadlocks with a patch that treats symptoms but a complete overhaul seems a good idea (as well as a lot of tedious and not very gratifying work). But would a simple key/value database be sufficient (rather than e.g. sqlite3)?</p>
<blockquote style="border-left: 3px solid #a7b5bf; color: #464c5c; font-style: italic; margin: 4px 0 12px 0; padding: 4px 12px; background-color: #f8f9fc;"><p>access cross-process safe, which I would really like to see in the long term.</p></blockquote>
<p>A central cache so different sessions that open the same file don't have to generate and maintain their own data? I think Xcode caches (as in: hides) something like that somewhere, which can become really huge (probably because it isn't well defined when you can/should remove cached data).</p></div></div><br /><div><strong>INLINE COMMENTS</strong><div><div style="margin: 6px 0 12px 0;"><div style="border: 1px solid #C7CCD9; border-radius: 3px;"><div style="padding: 0; background: #F7F7F7; border-color: #e3e4e8; border-style: solid; border-width: 0 0 1px 0; margin: 0;"><div style="color: #74777d; background: #eff2f4; padding: 6px 8px; overflow: hidden;"><a style="float: right; text-decoration: none;" href="https://phabricator.kde.org/D12746#inline-65375">View Inline</a><span style="color: #4b4d51; font-weight: bold;">mwolff</span> wrote in <span style="color: #4b4d51; font-weight: bold;">test_topcontextstore.cpp:76</span></div>
<div style="margin: 8px 0; padding: 0 12px; color: #74777D;"><p style="padding: 0; margin: 8px;">you are missing the 6k results here</p></div></div>
<div style="margin: 8px 0; padding: 0 12px;"><p style="padding: 0; margin: 8px;">I can add them of course, but from what I've seen they scale in a nicely linear fashion. My initial tests with the file-based backend showed non-linear scaling there which has now all but disappeared so I could just as well remove the 6k results, probably.</p></div></div><br /><div style="border: 1px solid #C7CCD9; border-radius: 3px;"><div style="padding: 0; background: #F7F7F7; border-color: #e3e4e8; border-style: solid; border-width: 0 0 1px 0; margin: 0;"><div style="color: #74777d; background: #eff2f4; padding: 6px 8px; overflow: hidden;"><a style="float: right; text-decoration: none;" href="https://phabricator.kde.org/D12746#inline-65377">View Inline</a><span style="color: #4b4d51; font-weight: bold;">mwolff</span> wrote in <span style="color: #4b4d51; font-weight: bold;">test_topcontextstore.cpp:145</span></div>
<div style="margin: 8px 0; padding: 0 12px; color: #74777D;"><p style="padding: 0; margin: 8px;">that's pretty bad in comparison</p>
<p style="padding: 0; margin: 8px;">note that we currently just rewrite the complete file, which you can't do when you use one big data base</p></div></div>
<div style="margin: 8px 0; padding: 0 12px;"><p style="padding: 0; margin: 8px;">That's removing all entries with a flush after each remove (which should be redundant).</p>
<p style="padding: 0; margin: 8px;">There *is* original code that calls <tt style="background: #ebebeb; font-size: 13px;">QFile::remove()</tt>, hopefully only when the index is not going to be reused immediately. Rewriting an entire file becomes overwriting an existing key/value pair, which all databases tested support in much the same way it is done with files.</p></div></div><br /><div style="border: 1px solid #C7CCD9; border-radius: 3px;"><div style="padding: 0; background: #F7F7F7; border-color: #e3e4e8; border-style: solid; border-width: 0 0 1px 0; margin: 0;"><div style="color: #74777d; background: #eff2f4; padding: 6px 8px; overflow: hidden;"><a style="float: right; text-decoration: none;" href="https://phabricator.kde.org/D12746#inline-65376">View Inline</a><span style="color: #4b4d51; font-weight: bold;">mwolff</span> wrote in <span style="color: #4b4d51; font-weight: bold;">test_topcontextstore.cpp:355</span></div>
<div style="margin: 8px 0; padding: 0 12px; color: #74777D;"><p style="padding: 0; margin: 8px;">O_o make it a separate data file and read it from a resource instead of embedding it manually</p></div></div>
<div style="margin: 8px 0; padding: 0 12px;"><p style="padding: 0; margin: 8px;">I thought about using a qresource but the extra work seemed a bit pointless as it wouldn't reduce the size of the patch. I'd like to avoid binary files for now (until this actually begins to stand a chance to be committed).</p></div></div></div></div></div><br /><div><strong>REPOSITORY</strong><div><div>R32 KDevelop</div></div></div><br /><div><strong>REVISION DETAIL</strong><div><a href="https://phabricator.kde.org/D12746">https://phabricator.kde.org/D12746</a></div></div><br /><div><strong>To: </strong>rjvbb, brauch<br /><strong>Cc: </strong>mwolff, kdevelop-devel, antismap, iodelay, vbspam, njensen, geetamc, Pilzschaf, akshaydeo, surgenight, arrowdodger<br /></div>