topcontexts

René J.V. Bertin rjvbertin at gmail.com
Mon Apr 9 13:54:50 UTC 2018


On Monday April 09 2018 13:23:12 Milian Wolff wrote:

>Without a profile, this is very far fetched to me. mmapping a small file is 
>super fast,

I'd hope so, but it cannot be faster than opening the file and reading its contents. That's my point: there are filesystems where file creation/opening performance drops drastically when the number of files in a directory grows. FAT (FAT16?) was an infamous example where you could easily wait 20-30s to open or create a file once a directory held more than a couple hundred files (don't ask me how I know this...).
Who cares about FAT, you might say, but know that HFS is basically just about as old underneath all its modern features and facelifts - and it'll be unavoidable on Mac for at least a few more years (and will remain so on HDDs as long as the new APFS is SSD-only).
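
A rough way to check whether a given filesystem shows this kind of degradation is to time file creation in batches while the directory fills up. The sketch below is only an illustration: the function name, batch sizes and file size are placeholders of mine and have nothing to do with KDevelop's code.

```python
import os
import tempfile
import time


def time_file_creation(directory, total_files=2000, batch=500, size=4096):
    """Create `total_files` small files in `directory`, timing each batch.

    Returns a list of (files_in_directory_after_batch, seconds_for_batch).
    """
    payload = b"\0" * size
    results = []
    created = 0
    while created < total_files:
        n = min(batch, total_files - created)
        start = time.perf_counter()
        for i in range(created, created + n):
            with open(os.path.join(directory, f"{i}.bin"), "wb") as f:
                f.write(payload)
        elapsed = time.perf_counter() - start
        created += n
        results.append((created, elapsed))
    return results


if __name__ == "__main__":
    # The temporary directory lands on whatever volume holds the system
    # temp dir; pass a directory on the volume of interest to test that one.
    with tempfile.TemporaryDirectory() as tmp:
        for count, seconds in time_file_creation(tmp):
            print(f"{count:6d} files in dir: {seconds:.3f}s for last batch")
```

If the per-batch time stays roughly flat as the count grows, the filesystem handles large directories well; a steady increase would point to the problem described above.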

>if you compress the dir and do other stuff to it, then you are on 
>your own and have to live with the caveats of these approaches.

I'm not doing anything to the topcontexts directories, other than throwing them away regularly. I do that before applying HFS compression to the rest of the cache, because with those files present I get at most 20% CPU usage in my multi-threaded compressor utility.

>> the only way I can think of not to use those topcontexts files is to empty
>> and write-protect the topcontexts directories.
>
>Which would of course completely break KDevelop.

Actually, no. Not at all in practice (or else it only breaks features I never use). The code seems designed to handle write failures with no more than a warning (1 per context index thingy), and from what I understand the writing to disk is only necessary to persist the information. I can't even say that a subsequent reparse is unacceptably slower because of information that has to be regenerated instead of simply read from disk.
(In fact, I've already considered implementing a "trash cache on exit" option which would probably be a good enough solution to all my gripes with the duchain cache.)

I searched the tests, didn't find one to check/benchmark the topcontexts feature, but did notice there's a global switch that turns off persisting the data to disk.

>support. No for performance related reasons (the current approach, while old, 
>is *exactly* what the DUChain code built around it needs - no wonder it beats 
>the other generic solutions).

I wouldn't argue with that if we were talking about a handful of items/files. But we're dealing with at least "thousands" of files here (e.g. 17MB of files that range from a few to about 30KB each), so while there's none of the overhead that comes with those "other generic solutions", there might be performance penalties that come from the filesystem or even from the use of so many mmapped files.
I'd want to see a quantitative comparison, preferably from code I can run myself.
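
As a starting point for such a comparison, here is a minimal sketch that creates a few thousand small files and times plain open+read against mmap for the same set. Everything in it (function names, the file count, the 8KB file size) is an assumption for illustration; it does not reproduce how the DUChain code accesses its topcontexts files, and Python adds its own constant overhead per call.

```python
import mmap
import os
import tempfile
import time


def make_small_files(directory, count=2000, size=8192):
    """Write `count` files of `size` bytes each and return their paths."""
    payload = b"x" * size
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"{i}")
        with open(path, "wb") as f:
            f.write(payload)
        paths.append(path)
    return paths


def read_all(paths):
    """Open each file, read it fully, and return the total byte count."""
    total = 0
    for path in paths:
        with open(path, "rb") as f:
            total += len(f.read())
    return total


def mmap_all(paths):
    """Open and mmap each file, copy out its contents, return total bytes."""
    total = 0
    for path in paths:
        with open(path, "rb") as f:
            # Length 0 maps the whole file; this fails on empty files.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
                total += len(m[:])
    return total


def timed(func, paths):
    """Run func(paths) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    total = func(paths)
    return total, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = make_small_files(tmp)
        for name, func in (("read", read_all), ("mmap", mmap_all)):
            total, seconds = timed(func, paths)
            print(f"{name}: {total} bytes in {seconds:.3f}s")
```

Note that the files have just been written, so both passes run against a warm cache; a cold-cache measurement would need the volume remounted or the cache purged between runs. Both variants also pay the same per-file open cost, which is the part that depends on the directory size.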

I'm making progress with the LMDB backend I mentioned earlier. I'll put up a WIP diff on phab when I think it works.

R.


More information about the KDevelop-devel mailing list