How and what does (plasma theme) caching operate? / KSharedDataCache issue?
mpyne at kde.org
Mon Sep 17 00:41:07 BST 2012
On Sunday, September 16, 2012 19:16:27 Thomas Lübking wrote:
> Because any access to the plasma theme cache takes considerable time here,
> i just took a short look.
> 1. all plasma theme caches are 82324 kB and mostly consist of paddings
> (actually the default one seems to consist of nothing else)
> 2. xz compressing them turns them to ~56 kB
> 3. uncompressing turns them to ~7MB
> 4. accessing them re-turns them to 82324 kB
> - why are there those giant files which mostly consist of paddings and
> take considerable time to load from disk? (the icon cache is "only" 11MB)
> -> Are the paddings mandatory or would it be possible to contract the
> content and omit the paddings?
The "padding" is probably related to the page size of the cache, which is
controlled by the combination of cache size and "expected item size" (smaller
items require more metadata overhead and can take longer to defragment).
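
To illustrate how a page size derived from the "expected item size" leads to
padding, here is a minimal sketch. It is not the actual KSharedDataCache code:
the helper names, the 4096-byte default and the 512 B to 256 KiB clamp are
assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: round the expected item size down to a power of two
// and clamp it to an assumed range of 512 B (2^9) to 256 KiB (2^18).
std::size_t pageSizeFor(std::size_t expectedItemSize)
{
    if (expectedItemSize == 0) {
        return 4096; // assumed default when the caller gives no hint
    }
    int log2OfSize = 0;
    while ((expectedItemSize >>= 1) != 0) {
        ++log2OfSize;
    }
    log2OfSize = std::max(9, std::min(log2OfSize, 18));
    return std::size_t{1} << log2OfSize;
}

// Bytes left unused when an item is stored in whole pages.
std::size_t paddingFor(std::size_t itemSize, std::size_t pageSize)
{
    const std::size_t pagesNeeded = (itemSize + pageSize - 1) / pageSize;
    return pagesNeeded * pageSize - itemSize;
}
```

With 4096-byte pages, a 5000-byte item needs two pages and leaves 3192 bytes
unused, which is the kind of padding that compresses so well.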
> -> Should the cache be split up into "required by everybody" (frames
> etc) and "likely required by the desktop only" (clock, calendar, whatnot
Perhaps; you'd have to ask a Plasma dev.
> - (assuming they operate as SHM backends) why does every first access from
> within any process take about the very same time and (actually seems to)
> reload them into FS cache?
There are a few things going on.
1) KSharedDataCache is supposed to be a convenient way to put data into shared
memory. No disk access is actually ever made beyond using posix_fallocate (if
available) to pin down the required file backing the shared memory segment,
and mmap() usage.
2) One of the intended use cases is caching values that take a
comparatively long time to look up. This includes generating SVG icons and
other SVG elements, but *also* includes simply looking up icons (which takes
quite a long time for a given icon). Because of this it was desired to have
the cache be saved to disk so it could be reloaded later when the user logs in.
3) Another benefit to using disk is that it gives the OS options for relieving
memory pressure (i.e., it is auto-swapped). Maybe this isn't needed though?
I'm open to ideas for e.g. declaring memory-only caches if it improves
> - in case they are and if the files have a fixed layout, why are they
> (apparently) read completely instead of mmapping the relevant sections
> directly? (FS/tuning issue?)
Possibly it is being defragmented? I'm also open to good concurrent data
structures that can operate process-shared with finer-grained locking but
without needing dynamic memory allocation or keeping track of all attached
processes... most papers I've read (e.g. Maged Michael) look helpful but don't
apply directly to KSharedDataCache. I've also received a few speedup patches
(e.g. for 0-copy) that would have broken data integrity and/or caused crashes.
Er, even *more* crashes than the current code.
To go back to your question though, beyond defragmentation KSharedDataCache
itself doesn't read anything beyond the first KiB or so to map in the indexed
entry table and page status table.
However, simply "failing to find" an entry can conceivably take some time. The
entry table doesn't record a key, it records a fixed-size hash of the key.
Using quadratic probing to verify that an entry is actually not present can
require up to 7 (IIRC) lookups into the index entry table, and collisions in
the key hash must be double-checked by looking at the actual data page itself
(where the full name is recorded).
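
The lookup described above can be sketched roughly as follows. This is an
illustration only, not KSharedDataCache's real layout: TinyCache, IndexEntry,
DataPage, the FNV-1a hash and the triangular probe step are all assumptions,
and the probe limit of 7 is simply the "IIRC" figure from above.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

constexpr int kMaxProbes = 7; // assumption, taken from the "up to 7 (IIRC)" above

struct IndexEntry {
    std::uint32_t keyHash = 0; // fixed-size hash only; the key is not stored here
    int page = -1;             // index of the data page, -1 if the slot is unused
};

struct DataPage {
    std::string fullKey; // the full name lives next to the data
    std::string value;
};

// 32-bit FNV-1a, used here only as a stand-in hash function.
std::uint32_t hashKey(const std::string &key)
{
    std::uint32_t h = 2166136261u;
    for (unsigned char c : key) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

class TinyCache
{
public:
    // slots must be greater than zero.
    explicit TinyCache(std::size_t slots) : m_index(slots) {}

    bool insert(const std::string &key, const std::string &value)
    {
        const std::uint32_t h = hashKey(key);
        for (int attempt = 0; attempt < kMaxProbes; ++attempt) {
            IndexEntry &entry = m_index[slotFor(h, attempt)];
            if (entry.page < 0) { // free slot: claim it
                m_pages.push_back({key, value});
                entry.keyHash = h;
                entry.page = static_cast<int>(m_pages.size()) - 1;
                return true;
            }
            if (entry.keyHash == h && m_pages[entry.page].fullKey == key) {
                m_pages[entry.page].value = value; // same key: overwrite
                return true;
            }
        }
        return false; // every probed slot is taken; a real cache would evict
    }

    // Returns nullptr if absent; *probes receives the number of index lookups.
    const std::string *find(const std::string &key, int *probes = nullptr) const
    {
        const std::uint32_t h = hashKey(key);
        const std::string *result = nullptr;
        int used = 0;
        for (int attempt = 0; attempt < kMaxProbes && !result; ++attempt) {
            const IndexEntry &entry = m_index[slotFor(h, attempt)];
            ++used;
            // A matching hash is only a candidate: confirm against the full key.
            if (entry.page >= 0 && entry.keyHash == h
                && m_pages[entry.page].fullKey == key) {
                result = &m_pages[entry.page].value;
            }
        }
        if (probes) {
            *probes = used;
        }
        return result;
    }

private:
    // Quadratic probing with triangular-number steps: 0, 1, 3, 6, 10, ...
    std::size_t slotFor(std::uint32_t h, int attempt) const
    {
        const std::uint32_t step =
            static_cast<std::uint32_t>(attempt * (attempt + 1) / 2);
        return (h + step) % m_index.size();
    }

    std::vector<IndexEntry> m_index;
    std::vector<DataPage> m_pages;
};
```

In this sketch a hit in a nearly empty table costs one index lookup, while a
miss always pays for all seven probes before it can report "not present".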
> - is it "legally" possible to keep them compressed on disk and uncompress
> them into and use them from a (compressing) tempfs  whenever required
>  or should i just do such locally (w/o any profit for anyone else) -
> yes i'm aware that i should likely (maybe?) recompress them to HDD when
> exiting the session
KSharedDataCache uses the "cache" KStandardDirs resource, so maybe set up an
appropriate FS, point the "cache" entry to that new FS, and see how it does? (I
believe you can set KDEVARTMP to override the "cache" entry from the
environment). As long as the OS supports mmap() it should work fine (but be
careful about posix_fallocate()... KSharedDataCache will call that to reserve
enough space as there is no other way to avoid SIGBUS signals. Even with a
tmpfs a compliant posix_fallocate would cause a ton of memory usage I think).
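
The difference between merely sizing a file and reserving its blocks can be
measured with a small sketch like the one below. The function names are
hypothetical, and the 512-byte unit for st_blocks is what Linux uses; other
systems may count differently.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Bytes actually allocated for the file, or -1 on error.
// st_blocks is counted in 512-byte units on Linux.
long long allocatedBytes(int fd)
{
    struct stat st;
    if (fstat(fd, &st) != 0) {
        return -1;
    }
    return static_cast<long long>(st.st_blocks) * 512;
}

// Creates a file of `size` bytes at `path`, either sparse (ftruncate) or with
// its blocks reserved (posix_fallocate), and reports how much was allocated.
// The file is removed again before returning. Returns -1 on error.
long long createAndMeasure(const char *path, std::size_t size, bool reserve)
{
    const int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        return -1;
    }
    const int rc = reserve ? posix_fallocate(fd, 0, static_cast<off_t>(size))
                           : ftruncate(fd, static_cast<off_t>(size));
    const long long allocated = (rc == 0) ? allocatedBytes(fd) : -1;
    close(fd);
    unlink(path);
    return allocated;
}
```

Running both variants on the candidate file system (tmpfs included) shows how
much space the reserving call really pins down there before any cache entry is
written.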
> - if not, should there be such behavior?
... maybe? I won't claim to have made tons of improvements to KSharedDataCache
since it was introduced (it's a way harder problem than I thought at first,
though someone may certainly have a much more reasonable way).
Thiago Macieira (in the context of the 0-copy patches) has recommended perhaps
using a data structure where entries can be added but not individually removed
(and therefore don't require defragmenting and can be packed as tightly as
feasible). Something like this may be better for this usecase, even if it's
not generally applicable.
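
A minimal sketch of such an add-only structure, assuming a fixed-capacity
buffer: AppendOnlyStore and its methods are made up for illustration, and a
process-shared version would have to place the buffer and the "used" counter
in the mapped region and guard them with a lock or atomics.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

class AppendOnlyStore
{
public:
    explicit AppendOnlyStore(std::size_t capacity)
        : m_buffer(capacity), m_used(0) {}

    // Copies `size` bytes to the end of the used region.
    // Returns the offset of the new entry, or -1 if it does not fit.
    std::ptrdiff_t append(const void *data, std::size_t size)
    {
        if (size > m_buffer.size() - m_used) {
            return -1;
        }
        const std::size_t offset = m_used;
        if (size > 0) {
            std::memcpy(m_buffer.data() + offset, data, size);
        }
        m_used += size;
        return static_cast<std::ptrdiff_t>(offset);
    }

    // Entries never move, so an offset stays valid for the store's lifetime.
    const char *at(std::size_t offset) const { return m_buffer.data() + offset; }

    std::size_t used() const { return m_used; }

private:
    std::vector<char> m_buffer;
    std::size_t m_used;
};
```

Because nothing is ever removed, entries sit back to back with no per-page
padding and there is nothing to defragment; the price is that the only way to
reclaim space is to throw the whole store away.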
> Sorry if i'm stupid and miss sth. or this is some local issue and the
> cache is mmapped and shared in RAM w/o any overhead for everybody else.
Well, that's certainly the goal (or rather, the necessary I/O should be
getting scheduled by the OS in a way that is satisfactory to the OS). In fact
the only two msync() calls are made when the KSharedDataCache is finally
destructed. Everything in between mmap and msync/munmap should be getting
handled by the OS.
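
For reference, the lifecycle described here (reserve the file, mmap it, let the
OS handle paging, msync and munmap at the end) looks roughly like this with
plain POSIX calls. This is a sketch, not KSharedDataCache's code; the function
names are made up, and it assumes posix_fallocate is available.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Maps `size` bytes of the file at `path` as shared memory, creating the file
// if needed. Returns nullptr on failure.
void *mapCacheFile(const char *path, std::size_t size)
{
    const int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) {
        return nullptr;
    }
    // Reserve real blocks up front so that a later store into the mapping
    // cannot raise SIGBUS because the file system ran out of space.
    if (posix_fallocate(fd, 0, static_cast<off_t>(size)) != 0) {
        close(fd);
        return nullptr;
    }
    void *addr = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); // the mapping stays valid after the descriptor is closed
    return addr == MAP_FAILED ? nullptr : addr;
}

// Flushes dirty pages to the backing file and removes the mapping.
bool unmapCacheFile(void *addr, std::size_t size)
{
    const bool synced = msync(addr, size, MS_SYNC) == 0;
    const bool unmapped = munmap(addr, size) == 0;
    return synced && unmapped;
}
```

Between those two calls there are no explicit reads or writes at all; which
pages are faulted in from disk, and when, is up to the kernel.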
Let me know how best I can support our desktop performance by improving
- Michael Pyne
>  what is much, MUCH, ***MUCH*** faster when done "by hand" and makes
> using plasma frames take like no time instead of stressing the HDD for
> seconds and quite reduces the startup time of the desktop shell - even in
> a running session.
Do you have more details on this (benchmark code and the file system used, how
it's attached, etc.)? I'm all for safe optimizations. :)
More information about the kde-core-devel