UDSEntry compression ideas

Thu Sep 26 21:48:28 BST 2013

Hi,

2013/9/21 David Faure:
> On Friday 20 September 2013 00:32:08 Frank Reininghaus wrote:
>> Now the obvious question is: "How on earth can this be more efficient
>> that the current solution? There is still a QHash in each
>> UDSEntryPrivate after all, and the code gets more complicated!"
>>
>> The trick is that the mapping from the "uint uds" to the index of the
>> corresponding "Field" is usually exactly the same for every UDSEntry
>> if we read many of them in a row, and we can make all UDSEntryPrivates
>> share this mapping (QHash is implicitly shared). This means that every
>> UDSEntryPrivate only has a private QVector<Field>, which only needs a
>> rather small overhead compared to the memory that the Fields
>> themselves need.
>>
>> The patch definitely needs further testing and polishing. I probably
>> won't get much work done until next week though.
>
> Nice tricks, very clever.

Thanks :-)

> The patch looks good to me.
>
> Just some minor issues with the patch:
> *  contains + value means double lookup, use iterators instead
>  (like the code was doing before - with a typedef for the QHash, for
> readability)
> * the initialisation of cachedStrings can be simplified with QVector::resize
> which does call the default constructor for each item.

Agreed, I'll change that.

Two more things that might be worth improving:

(a) With my patch, UDSEntry::remove(uint field) removes the "field"
from the QHash, but it neither frees the memory held by the
corresponding QString, nor does it remove the entry from the
QVector<Field>. I'm not sure how often it happens that fields are
removed and re-added to a UDSEntry, but if that is done repeatedly,
then the memory usage will grow indefinitely, which is not good.
Erasing the "Field" from the QVector every time might not be optimal
either because it makes "remove" a O(N) operation. Maybe one could
check if at least half the entries have been deleted and reorganize
the internal structure if that is the case. I'll look into that.

(b) Maybe I should add a unit test that stores multiple UDSEntries to
a stream, changes the order of the fields every time, re-loads them,
and verifies that everything is OK. Just to make sure that the code
really works even if not all UDSEntries have the same format inside
the stream.

Cheers,
Frank