UDSEntry compression ideas

Mark markg85 at gmail.com
Fri Sep 20 20:31:03 BST 2013


On Fri, Sep 20, 2013 at 10:29 AM, Frank Reininghaus <frank78ac at googlemail.com> wrote:

> Hi Mark,
>
> 2013/9/20 Mark:
> [...]
> >> > The easiest "compression" that I could think of was to force the
> >> > UDSEntries to implicitly share all their QStrings for one field if
> >> > they are all equal:
> >> >
> >> > http://paste.kde.org/p52a24b49/
> >> >
> >> > Reduces the memory consumption in Details View for 100,000 items from
> >> > 160.5 MB to 147 MB here. The patch is a bit hackish though, and it
> >> > might have a small performance penalty. Maybe it's worth investigating
> >> > that further though.
> >>
> >> OK, so I did some performance measurements. Short version: my earlier
> >> patch does make listing directories slightly slower, but I found
> >> another way to hack UDSEntry that apparently does not make things
> >> slower, and that reduces the memory usage by about 450 bytes per
> >> listed file, without any loss of functionality.
> >>
> >> Long version:
> >>
> >> (a) First of all, I took the current master branches of kdelibs and
> >> kde-baseapps (all built in release mode) and added some profiling
> >> output:
> >>
> >> http://paste.kde.org/p9b9b7b68/
> >> http://paste.kde.org/p992af44c/
> >>
> >> Then I opened a folder with 100,000 files in Dolphin in Icons View.
> >>
> >> According to KSysGuard, the memory usage of the dolphin process is 200.6
> >> MB.
> >>
> >> For the performance measurements, I discarded the first run and went
> >> out of the folder and back twice to get reproducible results (the time
> >> spent per item in the first run fluctuated strongly).
> >>
> >> The time that SlaveInterface::dispatch(int _cmd, const QByteArray
> >> &rawdata) needs per item was between 2.6 and 2.8 microseconds.
> >>
> >>
> >> (b) My first patch that only makes use of implicit sharing of the
> >> QStrings:
> >>
> >> http://paste.kde.org/p52a24b49/
> >>
> >> Memory usage drops to 187.9 MB, but the time spent per item increases
> >> to 2.9 to 3.1 microseconds (not surprising, I think).
> >>
> >>
> >> (c) My new patch, which makes more intrusive changes to the way the
> >> data is stored inside UDSEntry (explanations below):
> >>
> >> http://paste.kde.org/pc85e6b34/
> >>
> >> Memory usage is now 156.8 MB, and the time per item 2.6 to 2.9
> >> microseconds.
> >>
> >>
> >> This means that we can save more than 450 bytes for each file, which
> >> is IMHO a lot.
> >>
> >>
> >>
> >> How does it work?
> >>
> >> The basic observation is that UDSEntry currently stores all data in a
> >> QHash<uint, Field>, and that a QHash is not really space-efficient
> >> (see, e.g., http://doc.qt.digia.com/qq/qq19-containers.html to get an
> >> idea how much memory is needed to store the data inside a QHash).
> >> However, it has the nice property that the "Field" that an "uint" is
> >> mapped to can be accessed in O(1) time.
> >>
> >> My idea to improve this and keep the O(1) access is the following: We
> >> store all fields in UDSEntryPrivate in a simple
> >>
> >> QVector<Field> fields
> >>
> >> in the order in which the fields are read from the stream. We remember
> >> which UDS field comes at which position and store this in a
> >>
> >> QHash<uint, int> udsIndexes
> >>
> >> (uint is the type of the "UDS field", and a QVector uses "int" for
> >> indexing, but there are still a few signed/unsigned warnings with my
> >> patch which need investigation).
> >>
> >> Then we can access a "Field" that corresponds to a uint "uds" with
> >>
> >> d->fields.at(d->udsIndexes.value(uds)).
> >>
> >> Now the obvious question is: "How on earth can this be more efficient
> >> than the current solution? There is still a QHash in each
> >> UDSEntryPrivate after all, and the code gets more complicated!"
> >>
> >> The trick is that the mapping from the "uint uds" to the index of the
> >> corresponding "Field" is usually exactly the same for every UDSEntry
> >> if we read many of them in a row, and we can make all UDSEntryPrivates
> >> share this mapping (QHash is implicitly shared). This means that every
> >> UDSEntryPrivate only has a private QVector<Field>, which only needs a
> >> rather small overhead compared to the memory that the Fields
> >> themselves need.
> >>
> >> The patch definitely needs further testing and polishing. I probably
> >> won't get much work done until next week though.
> >>
> >> Best regards,
> >> Frank
> >
> >
> > Hi Frank,
> >
> > That is really stunning! Sadly, I don't have a proper test case to test
> > it on frameworks, so I made a very quick, rough test app that simply
> > stores all entries in memory.
> >
> > Doing that, I get these very impressive results:
> > PRE:
> > 500.000 files: 236,208MB
> >
> > POST (Frank):
> > 500.000 files: 118,148MB
> >
> > You can subtract another ~14 MB in both cases, which the app uses for
> > other stuff anyway.
> >
> > However, looking at your new patch, I can't help but notice that the
> > hash is now only used as a lookup table. Perhaps you can just get rid of
> > it now and simply loop through the vector to find the right "Field"
> > object? You would probably have to add a uint field to the Field struct
> > in that case, otherwise you wouldn't know the field key. I guess it
> > would take a bit more CPU power, since you would have to loop through
> > the whole QVector to find a value. On the other hand, you only have to
> > loop over 8 items. Perhaps that is even faster than a hash lookup: even
> > though the lookup is O(1), it still has to do the hash calculation and
> > some other work, as opposed to just iterating over a vector and checking
> > whether the uint key matches.
>
> Yes, the QHash is just a lookup table, but removing it won't help
> in terms of memory usage. Note that it's most likely
> implicitly shared with loads of other UDSEntryPrivate structs, so the
> amortized memory cost per UDSEntry is just a little more than the size
> of a pointer (8 bytes on a 64-bit system).
>
> OTOH, if you store a uint with each field, that requires
>
> "number of fields" * 4 bytes (size of uint) + overhead
>
> If you store the uints in a separate container, you have a certain
> constant overhead. If you store them inside the "Field" struct, the
> overhead will probably be 4 bytes per field on a 64-bit system because
> currently the "Field" contains a QString (effectively a pointer, i.e.,
> 8 bytes) and a long long (8 bytes). If you add a 4 byte uint to that
> struct, the compiler will add 4 extra bytes to the struct to preserve
> the 8 byte alignment of the pointers [1].
>
> Moreover, I'm not sure if it's safe to assume that each UDSEntry will
> only contain 8 items. There might be kioslaves which add a lot more
> now or in the future, so I'm not sure if sacrificing the O(1) access
> is a good idea.
>
> > Btw. This stuff just shows again how freaking smart you are
>
> Thanks :-) But there are also lots of programming-related areas where
> I'm a lot less smart than other people. And I should also say that
> this blog post by Milian made me aware of the implicit sharing idea, I
> don't think I would have had such ideas without reading it:
>
> http://milianw.de/blog/katekdevelop-sprint-vienna-2012-take-1
>
> The only small flaw in the KMail use case is that the "implicit
> sharing pool" can in principle grow indefinitely if it's being fed
> with different strings all the time. But fortunately, this cannot
> happen with the implicit sharing implementation in my UDSEntry patch. I
> wouldn't be surprised if there were many more such opportunities in
> Qt-based applications and libraries.
>
> Cheers,
> Frank
>
> [1] By the way, a cool way to check such things quickly is
> http://gcc.godbolt.org/. Just paste the code below into the "Code
> editor", and you can immediately see the size of the different structs
> in the assembly output:
>
> struct test1
> {
>   void *v;
> };
>
> struct test2
> {
>   void *v;
>   int i;
> };
>
> struct test3
> {
>   void *v;
>   int i;
>   int j;  // Depending on the platform, 'j' will not need any
>           // additional space compared to 'test2'.
> };
>
> int size1()
> {
>   return sizeof(test1);
> }
>
> int size2()
> {
>   return sizeof(test2);
> }
>
> int size3()
> {
>   return sizeof(test3);
> }
>

That stuff is really cool!

Now I see a few more possible optimizations. I wonder what your opinion is
on those.

1. Instead of storing the string as a QString, you could store it as a
QByteArray and call squeeze() after setting it. The places that return a
QString would simply construct one with QString(...).
2. Use Google's "sparse_hash_map"; it's extremely memory-efficient!
https://code.google.com/p/sparsehash/ Reading how it is implemented is also
very interesting:
https://sparsehash.googlecode.com/svn/trunk/doc/implementation.html That
was another "WOW!" moment for me. The downside: it's slower.
3. Right now the Field struct stores a QString and a long long. Surely
there must be a way to use just one type. I guess QVariant has too much
overhead to be used, but something like that.

Those are just my remaining ideas, which I can't all implement myself :)
(OK, 1 I can, and 2 I probably can, but 3... nope. That would require
templating knowledge that I simply don't have, if it's possible at all.)
