[PATCH] Request review for KSharedDataCache

Sat Apr 10 05:54:51 BST 2010

Hi all,

Most of you are probably familiar with crashes that have occurred due to 
KPixmapCache. I've tried to make it better, and although it's actually better 
designed than I first thought, some of the design choices and the fact that 
it's needlessly specific to pixmaps led me to come up with a different 
implementation of a shared-memory cache, that can support arbitrary data 
types, hopefully with more performance and much more stability. I've attached 
a compressed patch for your review.

I've worked on this off-and-on probably since about last fall. The basic 
overview is that KSharedDataCache creates a file-backed shared memory segment 
that can be used to quickly share data between multiple processes. To 
coordinate access to the shared data, a pthread_mutex_t is used (I used 
pthreads instead of Qt since pthreads natively supports inter-process mutexes 
instead of just inter-thread. It would be possible to do a Qt-native spinlock 
type of mutex without too much trouble I think).

The cache itself has to be split up somehow. KPixmapCache chose to use two 
separate files, one for the index and one for the data, with the index using a 
binary tree with offsets into the data file. This made removing entries kind 
of a chore, so I went with a hash table arrangement. In my case I also 
combined the index and data into a single file (the extra space used for the 
index is not counted toward the cache size).

The non-cache portion (the index) is split into the cache header, the index 
table (giving information on each cache entry such as size, hash code, pages 
used), and the page table (giving information on each page of the cache and 
whether it's allocated or not).

Because the memory is fixed-size (more-or-less), there is no usage of buckets 
with chaining. Instead a consecutive set of pages is used to store new 
entries, so it's still possible to get fragmentation, but it's much easier to 
recover from.

Some of the design principles:
* It should be possible to leave the cache in a sane state without having to 
do any special un-initialization. This means that I don't have anything like 
an "attached client count" or anything similar. To detach from the cache the 
process just unmaps the memory segment.
* Accessing memory should be done in as safe a manner as possible since errors 
like unaligned memory access can crash a process on lots of architectures.
* When in doubt, kill the cache. I try to simply erase the cache and start 
over if ever a problem is developed. This relies on the fact that it's 
possible to unlink a file even if it's already mapped into shared memory 
without causing a crash. It works for me, but if that's not true on some POSIX 
systems out there I'd appreciate a heads up. ;)
* Arbitrary data should be allowed. For this I just store QByteArrays. (I've 
included KImageCache, which is a simple wrapper that converts to/from QImage).
* It should be possible to remove cache entries without needing to generate a 
whole new cache file. It should be possible to discard the cache without any 
special magic. (Using hashing enables this).
* The cache may live a long time, so there should be a way of automatically 
cleaning out very old entries if necessary. The way I implement this is, if a 
hash collision occurs, the current load factor is examined. If the cache is 
more than 50% full, then the use count of the currently existing entry is 
divided by 2. The probability goes up as the load factor goes up. If the use 
count dropped to 0 then the existing entry is replaced by the new one, 
otherwise a quadratic probing procedure is continued. This way old entries get 
booted eventually if they are interfering with new entries in a full cache, 
but that doesn't happen if the cache is nearly empty anyways.
* There should be a fallback if shared memory doesn't work. (My only fallback 
is to still use mmap, but with an anonymous private mapping instead of shared 
memory).
* For troubleshooting the cache usage I've added an itemModel() method to the 
cache, but I don't think it's good to have for the public API so I'll probably 
remove it in the final version unless it's needed for a usecase someone can 
think of.

The other major part of this patch is that I've converted the kdelibs users of 
KPixmapCache to use KImageCache or KSharedDataCache instead. There's also 
users in kdebase/runtime that I have patches for. I haven't looked elsewhere 
yet.

Anyways, this is getting kind of long-winded. Although there's a couple of 
issues I can think of that I forgot (references to KSharedPixmapCache, stray 
references to icons, .h comments and API docs need improved, CMake checks for 
pthreads, probably could use msync() after defragmenting or adding items, 
etc.) I think it's at least in good enough shape to get some eyeballs on it 
other than mine.

Please let me know what you think. Assuming this is good for KDE SC 4.5 I 
intend to deprecate KPixmapCache (or at least strongly warn against it in its 
API docs).

If you do try it and it freezes your session somehow, you can manually remove 
the files by deleting /var/tmp/kdecache-$USER/*.kcache for most systems 
($(kde4-config --path cache)/*.kcache in general).

I like mailing lists better for whatever reason but if it's helpful I can put 
this on ReviewBoard instead.

Regards,
 - Michael Pyne
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kdelibs-kshareddatacache.patch.bz2
Type: application/x-bzip
Size: 18217 bytes
Desc: not available
URL: <http://mail.kde.org/pipermail/kde-core-devel/attachments/20100410/6cbccf76/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: This is a digitally signed message part.
URL: <http://mail.kde.org/pipermail/kde-core-devel/attachments/20100410/6cbccf76/attachment.sig>