Review Request 118452: Reduce the memory usage of UDSEntry by using QVector, rather than QHash, for the internal data storage

Frank Reininghaus frank78ac at googlemail.com
Thu Jun 12 07:27:56 UTC 2014



> On June 6, 2014, 10:34 p.m., David Faure wrote:
> > Interestingly, I had a benchmark to compare a number of data structures for UDSEntry (which made me turn the Qt3 QList into the Qt4 QHash). That benchmark is the commented-out "newApiPerformance()" method in jobtest.cpp.
> > 
> > I just extended it to include your suggestion: http://www.davidfaure.fr/2014/udsentry_benchmark.diff
> > 
> > Here are the results I get (in debug mode, because otherwise I'd have to make sure the compiler doesn't optimize away the useless iterations loops :))
> > 
> > Old API: slave code: 5
> > Old API: app code: 5
> > QHash+QVariant API: slave code: 9
> > QHash+QVariant API: app code: 4
> > QHash+struct API: slave code: 7
> > QHash+struct API: app code: 3
> > QMap+struct API: slave code: 8
> > QMap+struct API: app code: 4
> > Frank's API: slave code: 13
> > Frank API: app code: 6
> > 
> > So, unless I'm missing something, this is much slower for the slave (but we could decide to just not use UDSEntry in the slave, as discussed on k-f-d), but it's also slower on the app side....
> > 
> > Can you check if I did something stupid when grabbing your code or writing the testcode for it?
> > Otherwise the next step is to actually run this with -O2, ensuring the loops still loop.
> 
> Milian Wolff wrote:
>     I'd say a benchmark should use QBENCHMARK. Also, I'm afraid the results you posted are meaningless when they come from a non-optimized, esp. considering that the differences are so small. I suggest someone picks this test up and makes it a proper benchmark. To prevent the optimizer from kicking in, actually use the results in one way or another. E.g. calculate the sum of string lengths returned and number values for the read operation. And of course also compare the sum to some expected value.
> 
> David Faure wrote:
>     Yeah - this code is very old, it predates Q_BENCHMARK :-)
>     
>     "the differences are small" - not really, these are entire seconds, due to large amount of iterations.
>     
>     But yep, let me convert all this to a proper benchmark. Coming up.
> 
> David Faure wrote:
>     Done, see kio/autotests/udsentry_benchmark.
>     
>     Results (in release mode, with ./kiocore-udsentry_benchmark -minimumvalue 1000)
>     
>     KDE3:          slave=1123 app=1411
>     Hash+Variant:  slave=1856 app=1605
>     Hash+Struct:   slave=1051 app=1418  // the kde4 solution
>     Map+Struct:    slave=1418 app=1427
>     TwoVectors:    slave=1204 app=1390  // Frank's suggestion in this RR.
>     
>     Not bad :-)
>     
>     Again, please double-check the test and the results....
> 
> David Faure wrote:
>     Pure CPU results with -callgrind :
>     
>     KDE3:         slave=10284 app=14032
>     Hash+Variant: slave=14446 app=14456
>     Hash+Struct:  slave=9363  app=13973 // kde4
>     Map+Struct:   slave=11262 app=13857
>     TwoVectors:   slave=11199 app=13854 // this RR
>     
>     Looks somewhat similar: this solution is good for the app side, but on the slave side we can do better (especially if we use a different data structure there, with no lookup capabilities).
> 
> Mark Gaiser wrote:
>     So.. This sucks. I was expecting to test over and over again with "RelWithDebInfo" but apparently i uncovered a bug in kdesrc-build that doesn't honor that anymore and just compiles everything in debug. Oh well, reported on the mailinglist and re-compiled with RelWithDebInfo :)
>     
>     The numbers below are from the test results from the part: "0.002412 msecs per iteration (total: 1,265, iterations: 524288)" where i took the last 4 numbers of the "msecs per iteration". I'm a bit surprised that i'm not beating David's numbers :)
>     
>     Here are my results (with RelWithDebInfo obviously):
>     KDE3:             slave=1746 app=2264
>     Hash+Variant:     slave=2794 app=2725
>     Hash+Struct:      slave=1738 app=2470 // kde4
>     Map+Struct:       slave=1843 app=2260
>     TwoVectors:       slave=2023 app=2323 // this RR
>     TwoVectorsMark:   slave=1602 app=2412 // see below. Raw numbers: http://p.sc2.nl/pkbuewd2b Code: http://p.sc2.nl/pqlnisylt (1 month lifetime for both)
>     
>     So what did i do in "TwoVectorsMark" (internally called FrankUDSEntryFind because i was first only adding std::find). Well, i was guessing the indexAt to be the slowest in the current review request. And from tons of other benchmark experience i know that raw C++ containers almost always beat Qt containers in terms of speed (Qt usually is a bit more memory friendly when it comes to containers). So i simply changed QVector to std::vector along with std::find for finding the index. It seems to be doing rather well!
>     
>     Regarding the timing. I'm not going to claim that it's slower or faster then the current RR. The timings are different for me in every run (yes, even with -minimumvalue 1000). Sometimes it's the fastest of the tested approaches, sometimes it's slightly slower then this RR. It's just different. And i have no clue if the memory consumption is more or less with this approach (i expect slightly more). I haven't tested for that.
>

Thanks everyone for your efforts! Sorry for not replying earlier - I was away for a couple of days and still haven't quite caught up with all incoming KDE mails yet.

Proper benchmarking in release mode is very important, of course! However, note that I already did add a UDSEntry benchmark, which I hoped would model some app and slave workloads in the kio_file and the "worst case with maximum number of fields" somewhat realistically, in https://git.reviewboard.kde.org/r/115739/ - I guess nobody noticed because it is in tests/, rather than autotests/.

My reasoning for that was that benchmarks use a considerable amount of time by design, and 99.8% of the time, a developer or a CI system running the autotests will only be interested in test failures, and not in benchmark results. Right now, the new test takes 2.57 seconds with my un-optimized build, which is IMHO far too much for a test that is always run by everyone, but yields results that the vast majority will not look at. Adding autotests that take a noticeable amount of time might annoy people, waste energy and, perhaps most importantly, give developers an excellent excuse for not running them ;-) Therefore, I would suggest to move the new test to tests/. I haven't quite looked at it in detail, but my "old" autotest might be obsolete then.

In any case, I do know that this RR is currently sub-optimal on the slave side (see remark (b) in my RR description). I will try really hard to setup a realease build in the next few days (no time right now, have to go to work) and then post some benchmark results (including an improved version for the slave side).


- Frank


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://git.reviewboard.kde.org/r/118452/#review59479
-----------------------------------------------------------


On June 1, 2014, 1:50 p.m., Frank Reininghaus wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://git.reviewboard.kde.org/r/118452/
> -----------------------------------------------------------
> 
> (Updated June 1, 2014, 1:50 p.m.)
> 
> 
> Review request for KDE Frameworks and David Faure.
> 
> 
> Repository: kio
> 
> 
> Description
> -------
> 
> I am continuing to split up https://git.reviewboard.kde.org/r/113355/ , which attempts to make UDSEntry more efficient memory and CPU-wise, into independent parts. This is the third step after 
> https://git.reviewboard.kde.org/r/113591/ and https://git.reviewboard.kde.org/r/115739/ .
> 
> The present patch modifies the internal data storage of UDSEntry. UDSEntry contains a mapping from unsigned int keys to "Field" values, where Field is a struct that contains a QString and a long long (some keys correspond to numeric values, like size, date, etc, whereas others, like user and group, correspond to a QString).
> 
> Currently, UDSEntry stores all data in a QHash<uint, Field> internally. This ensures that everything can be accessed in O(1) time, but is not very efficient memory-wise because a separate memory allocation is done for each hash node.
> 
> I propose to change this and store both the uint keys and the Field values in a QVector each. This means that accessing a value becomes a O(n) operation, since the entire QVector of keys may need to be scanned in order to find a value, but because the number n of values in a UDSEntry is usually quite small and can currently not exceed a number ~100, this should not matter in practice.
> 
> Some parts of https://git.reviewboard.kde.org/r/113355/ are still missing:
> 
> (a) The QVectors which store the keys (which are usually the same for all items in a directory) are not shared yet. Changing this would reduce the memory usage further, but I decided to keep this change separate in order to keep the current patch small and easy to understand. Moreover, this makes it easier to benchmark other similar approaches (e.g., replace QVector by std::vector, or store keys and values together in a std::vector<std::pair<uint,Field>>).
> 
> (b) No space is reserved in the vectors when key/value pairs are inserted one by one. Implementing this would make UDSEntry faster on the slave side (since repeated re-allocations would not be necessary any more), but this can be done in a later patch. Moreover, it might not be needed any more if UDSEntry is not used directly any more on the slave side, as suggested on the frameworks mailing list by Aaron (good idea BTW!). 
> 
> 
> Diffs
> -----
> 
>   src/core/udsentry.cpp c6ac21a 
> 
> Diff: https://git.reviewboard.kde.org/r/118452/diff/
> 
> 
> Testing
> -------
> 
> Unit tests still pass.
> 
> The memory usage of listjobtest with a directory with 100,000 files is reduced from 71344 K to 35392 K according to KSysGuard. I see similar savings when opening the directory in Dolphin.
> 
> I still haven't set up a Qt5/KF5 build in release mode (shame on me!), so I cannot present any benchmark results.
> 
> 
> Thanks,
> 
> Frank Reininghaus
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.kde.org/pipermail/kde-frameworks-devel/attachments/20140612/72e86331/attachment-0001.html>


More information about the Kde-frameworks-devel mailing list