Review Request 118452: Reduce the memory usage of UDSEntry by using QVector, rather than QHash, for the internal data storage

Frank Reininghaus frank78ac at googlemail.com
Thu Dec 11 22:08:48 UTC 2014


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://git.reviewboard.kde.org/r/118452/
-----------------------------------------------------------

(Updated Dez. 11, 2014, 10:08 nachm.)


Review request for KDE Frameworks and David Faure.


Changes
-------

Thanks for your comments, Milian and Mark!

Thanks also for fixing the benchmark, Milian. The point of the QCOMPARE calls was, as you have probably guessed, to prevent that the compiler optimizes some or all of the code in QBENCHMARK {...} away. But I admit that this approach was a bit, erm, suboptimal. Your solution is much better indeed :-)

The idea to factor out the common code from the insert(...) functions was quite good, thanks! It even made the code faster, maybe because this is more cache-friendly and makes branch prediction easier. 

For the createSmallEntries() benchmark, I get, with "tests/udsentrybenchmark -perf -perfcounter instr:k", the following number of instructions:

master:            322801129
rev. 2 of this RR: 331020709 (+3%)
rev. 3 of this RR: 198900694 (-38% compared to master)

The QVarLengthArray idea is also quite good! I think this should be done in a separate patch though. I was going to do some further experiments anyway (like checking how much we can gain if we let the UDSEntries implicitly share the 'udsIndexes' QVectors, or if using std::vector instead of QVector makes any difference - comparing that to QVarLengthArray would be interesting indeed).


Repository: kio


Description
-------

I am continuing to split up https://git.reviewboard.kde.org/r/113355/ , which attempts to make UDSEntry more efficient memory and CPU-wise, into independent parts. This is the third step after 
https://git.reviewboard.kde.org/r/113591/ and https://git.reviewboard.kde.org/r/115739/ .

The present patch modifies the internal data storage of UDSEntry. UDSEntry contains a mapping from unsigned int keys to "Field" values, where Field is a struct that contains a QString and a long long (some keys correspond to numeric values, like size, date, etc, whereas others, like user and group, correspond to a QString).

Currently, UDSEntry stores all data in a QHash<uint, Field> internally. This ensures that everything can be accessed in O(1) time, but is not very efficient memory-wise because a separate memory allocation is done for each hash node.

I propose to change this and store both the uint keys and the Field values in a QVector each. This means that accessing a value becomes a O(n) operation, since the entire QVector of keys may need to be scanned in order to find a value, but because the number n of values in a UDSEntry is usually quite small and can currently not exceed a number ~100, this should not matter in practice.

Some parts of https://git.reviewboard.kde.org/r/113355/ are still missing:

(a) The QVectors which store the keys (which are usually the same for all items in a directory) are not shared yet. Changing this would reduce the memory usage further, but I decided to keep this change separate in order to keep the current patch small and easy to understand. Moreover, this makes it easier to benchmark other similar approaches (e.g., replace QVector by std::vector, or store keys and values together in a std::vector<std::pair<uint,Field>>).

(b) No space is reserved in the vectors when key/value pairs are inserted one by one. Implementing this would make UDSEntry faster on the slave side (since repeated re-allocations would not be necessary any more), but this can be done in a later patch. Moreover, it might not be needed any more if UDSEntry is not used directly any more on the slave side, as suggested on the frameworks mailing list by Aaron (good idea BTW!). 


Diffs (updated)
-----

  autotests/udsentry_benchmark.cpp b5fa8d6 
  src/core/udsentry.h 98a7035 
  src/core/udsentry.cpp b806e0e 
  src/ioslaves/file/file.cpp 1a2a767 
  tests/udsentrybenchmark.cpp 9bedb7b 

Diff: https://git.reviewboard.kde.org/r/118452/diff/


Testing
-------

Unit tests still pass.

The memory usage of listjobtest with a directory with 100,000 files is reduced from 71344 K to 35392 K according to KSysGuard. I see similar savings when opening the directory in Dolphin.

I still haven't set up a Qt5/KF5 build in release mode (shame on me!), so I cannot present any benchmark results.


File Attachments
----------------

Benchmark results
  https://git.reviewboard.kde.org/media/uploaded/files/2014/12/09/038e443c-78eb-443b-b33a-b451b28d10ea__UDSEntry-benchmarks.png


Thanks,

Frank Reininghaus

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.kde.org/pipermail/kde-frameworks-devel/attachments/20141211/6417010d/attachment.html>


More information about the Kde-frameworks-devel mailing list