UDSEntry compression ideas

Frank Reininghaus frank78ac at googlemail.com
Fri Sep 20 09:29:49 BST 2013


Hi Mark,

2013/9/20 Mark:
[...]
>> > The easiest "compression" that I could think of was to force the
>> > UDSEntries to implicitly share all their QStrings for one field if
>> > they are all equal:
>> >
>> > http://paste.kde.org/p52a24b49/
>> >
>> > Reduces the memory consumption in Details View for 100,000 items from
>> > 160.5 MB to 147 MB here. The patch is a bit hackish though, and it
>> > might have a small performance penalty. Maybe it's worth investigating
>> > that further though.
>>
>> OK, so I did some performance measurements. Short version: my earlier
>> patch does make listing directories slightly slower, but I found
>> another way to hack UDSEntry that apparently does not make things
>> slower, and that reduces the memory usage by about 450 bytes per
>> listed file, without any loss of functionality.
>>
>> Long version:
>>
>> (a) First of all, I took the current master branches of kdelibs and
>> kde-baseapps (all built in release mode) and added some profiling
>> output:
>>
>> http://paste.kde.org/p9b9b7b68/
>> http://paste.kde.org/p992af44c/
>>
>> Then I opened a folder with 100,000 files in Dolphin in Icons View.
>>
>> According to KSysGuard, the memory usage of the dolphin process is 200.6
>> MB.
>>
>> For the performance measurements, I discarded the first run and went
>> out of the folder and back twice to get reproducible results (the time
>> spent per item in the first run fluctuated strongly).
>>
>> The time that SlaveInterface::dispatch(int _cmd, const QByteArray
>> &rawdata) needs per item was between 2.6 and 2.8 microseconds.
>>
>>
>> (b) My first patch that only makes use of implicit sharing of the
>> QStrings:
>>
>> http://paste.kde.org/p52a24b49/
>>
>> Memory usage drops to 187.9 MB, but the time spent per item increases
>> to 2.9 to 3.1 microseconds (not surprising, I think).
>>
>>
>> (c) My new patch, that makes more intrusive changes to the way the
>> data is stored inside UDSEntry (explanations below):
>>
>> http://paste.kde.org/pc85e6b34/
>>
>> Memory usage is now 156.8 MB, and the time per item 2.6 to 2.9
>> microseconds.
>>
>>
>> This means that we can save more than 450 bytes for each file, which
>> is IMHO a lot.
>>
>>
>>
>> How does it work?
>>
>> The basic observation is that UDSEntry currently stores all data in a
>> QHash<uint, Field>, and that a QHash is not really space-efficient
>> (see, e.g., http://doc.qt.digia.com/qq/qq19-containers.html to get an
>> idea of how much memory is needed to store the data inside a QHash).
>> However, it has the nice property that the "Field" that a "uint" is
>> mapped to can be accessed in O(1) time.
>>
>> My idea to improve this and keep the O(1) access is the following: We
>> store all fields in UDSEntryPrivate in a simple
>>
>> QVector<Field> fields
>>
>> in the order in which the fields are read from the stream. We remember
>> which UDS field comes at which position and store this in a
>>
>> QHash<uint, int> udsIndexes
>>
>> (uint is the type of the "UDS field", and a QVector uses "int" for
>> indexing, but there are still a few signed/unsigned warnings with my
>> patch which need investigation).
>>
>> Then we can access a "Field" that corresponds to a uint "uds" with
>>
>> d->fields.at(d->udsIndexes.value(uds)).
>>
>> Now the obvious question is: "How on earth can this be more efficient
>> than the current solution? There is still a QHash in each
>> UDSEntryPrivate after all, and the code gets more complicated!"
>>
>> The trick is that the mapping from the "uint uds" to the index of the
>> corresponding "Field" is usually exactly the same for every UDSEntry
>> if we read many of them in a row, and we can make all UDSEntryPrivates
>> share this mapping (QHash is implicitly shared). This means that every
>> UDSEntryPrivate only has a private QVector<Field>, which only needs a
>> rather small overhead compared to the memory that the Fields
>> themselves need.
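
(To make the above a bit more concrete, here is a rough sketch of the new
layout. The names are illustrative and error handling is omitted; the
actual patch differs in detail.)

#include <QHash>
#include <QString>
#include <QVector>

struct Field
{
  QString m_str;
  long long m_long;
};

struct UDSEntryPrivate
{
  // Shared between all entries that are listed in a row. Copying an
  // implicitly shared QHash costs little more than copying a pointer.
  QHash<uint, int> udsIndexes;   // UDS field -> position in 'fields'

  // Owned per entry: the actual data, in the order it was read.
  QVector<Field> fields;

  Field field(uint uds) const
  {
    // Still O(1): one hash lookup plus one vector access.
    return fields.at(udsIndexes.value(uds));
  }
};
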
>>
>> The patch definitely needs further testing and polishing. I probably
>> won't get much work done until next week though.
>>
>> Best regards,
>> Frank
>
>
> Hi Frank,
>
> That is really stunning! Sadly, I don't have a proper test case to test it
> on frameworks.
> So I made a very quick, rough test app that simply stores all entries in
> memory.
>
> Doing that, I get these very impressive results:
> PRE:
> 500,000 files: 236.208 MB
>
> POST (Frank):
> 500,000 files: 118.148 MB
>
> You can subtract another ~14 MB in both cases, which the app just needs for
> other stuff anyway.
>
> However, looking at your new patch I can't help but notice that the hash is
> now only used as a lookup table. Perhaps you can just get rid of that now
> and simply loop through the vector to find the right "Field" object? You
> would probably have to add a uint field to the Field struct in that case,
> otherwise you wouldn't know the field key. I guess it would take a bit more
> CPU power, since you would have to loop through the whole QVector to find a
> value. On the other hand, you only have to loop over about 8 items. Perhaps
> that is even faster than a hash lookup: even if the lookup is O(1), it still
> has to do the hash calculation and some other work, as opposed to just
> iterating over a vector and checking whether the uint key (an int!) matches.

Yes, the QHash is just a lookup table, but removing it won't save any
memory. Note that it's most likely implicitly shared with loads of
other UDSEntryPrivate structs, so the amortized memory cost per
UDSEntry is just a little more than the size of a pointer (8 bytes on
a 64-bit system).
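
If you want to convince yourself of that, here is a small toy example
(not from the patch): all copies of an implicitly shared Qt container
point to the same payload until one of them is modified.

#include <QDebug>
#include <QHash>
#include <QVector>

int main()
{
  QHash<uint, int> lookup;
  lookup.insert(1, 0);   // e.g. some UDS field -> index 0
  lookup.insert(2, 1);   // e.g. another UDS field -> index 1

  // 100,000 copies that all share the same hash data (copy-on-write),
  // so the per-copy cost is roughly one pointer, not a full hash.
  QVector<QHash<uint, int> > lookupTables(100000, lookup);

  qDebug() << lookupTables.last().value(2);   // prints 1
  return 0;
}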

OTOH, if you store a uint with each field, that requires

"number of fields" * 4 bytes (size of uint) + overhead

If you store the uints in a separate container, you have a certain
constant overhead. If you store them inside the "Field" struct, the
overhead will probably be 4 bytes per field on a 64-bit system, because
currently the "Field" contains a QString (effectively a pointer, i.e.,
8 bytes) and a long long (8 bytes). If you add a 4-byte uint to that
struct, the compiler will add 4 extra padding bytes to preserve the
8-byte alignment of the pointers [1].
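
To make that arithmetic concrete (modelling QString as a single pointer,
which is what it amounts to here; the exact numbers are platform-dependent):

struct FieldNow        // QString + long long
{
  void *str;
  long long number;
};

struct FieldWithKey    // same, plus a 4-byte uint key
{
  void *str;
  long long number;
  unsigned int key;
};

int sizeNow()     { return sizeof(FieldNow); }      // 16 on x86-64
int sizeWithKey() { return sizeof(FieldWithKey); }  // 24 on x86-64 (4 bytes of padding)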

Moreover, I'm not sure if it's safe to assume that each UDSEntry will
only contain 8 items. There might be kioslaves which add a lot more
now or in the future, so I'm not sure if sacrificing the O(1) access
is a good idea.

> Btw. This stuff just shows again how freaking smart you are

Thanks :-) But there are also lots of programming-related areas where
I'm a lot less smart than other people. And I should also say that
this blog post by Milian made me aware of the implicit sharing idea; I
don't think I would have had such ideas without reading it:

http://milianw.de/blog/katekdevelop-sprint-vienna-2012-take-1

The only small flaw in the KMail use case is that the "implicit
sharing pool" can in principle grow indefinitely if it is fed
different strings all the time. Fortunately, this cannot happen with
the implicit sharing implementation in my UDSEntry patch. I wouldn't
be surprised if there were many more such opportunities in Qt-based
applications and libraries.

Cheers,
Frank

[1] By the way, a cool way to check such things quickly is
http://gcc.godbolt.org/. Just paste the code below into the "Code
editor", and you can immediately see the size of the different structs
in the assembly output:

struct test1
{
  void *v;
};

struct test2
{
  void *v;
  int i;
};

struct test3
{
  void *v;
  int i;
  int j;  // Depending on the platform, 'j' will not need any
          // additional space compared to 'test2'.
};

int size1()
{
  return sizeof(test1);
}

int size2()
{
  return sizeof(test2);
}

int size3()
{
  return sizeof(test3);
}



