Review Request 118452: Reduce the memory usage of UDSEntry by using QVector, rather than QHash, for the internal data storage

Wed Dec 10 23:00:46 UTC 2014

On Dec. 10, 2014, 7:40 p.m., Frank Reininghaus wrote:
> > As for all the appends that you do on a QVector, you might be able to speed them up significantly if you know the size (which you do).
> > The reason for that potential speedup is append doing some stuff to determine if it can append or if it needs to grow the vector first (amongst things). That works very well when you don't know the size, but you don't need that "overhead" if you do know the size in advance.
> > 
> > My benchmark code is here: https://paste.kde.org/pxly3rryg (1 year lifetime)
> > 
> > The results i get (on - cough - windows, but with mingw. My qt was broken under linux):
> > "Append took: 278 ms"
> > "More complicated append took: 92 ms"
> > 
> > The above timing are just the insertion. The reserve() and resize() call are not in them. However, if i run the same benchmark with those calls included then the timings are as follows:
> > "Append took: 278 ms"
> > "More complicated append took: 101 ms"
> > 
> > The regular append(...) call took 278 ms for 10 million items.
> > The array assignment operator took merely 92 till 101 ms (~3x faster then append!). That is including a bounds check and using append if you go over your bounds.
> > I think you can only win speed with this in the avarage usecase.
> > 
> > Note: for this optimization you have to use "resize" on the QVector, not "reserve".
> > 
> > Disclaimer: i know this is most deffinately a micro optimization, but it's fun to play with options and see if you can pull out more.
> 
> Milian Wolff wrote:
>     your benchmark is flawed: your second loop also contains branches, which the former does not. Both iterate over the same size, no? So you just want to compare resize + operator[] vs. reserve + append? Yes, the former will be faster, but afaik only slightly. Please write a proper benchmark with QBENCHMARK and use `-perf -perfcounter instr:k`, instead of a bogus QTimer. Also, you sure this was compiled with optimizations enabled?
>     
>     Furthermore, this is also a micro-optimization as the cost of append vs. operator[] is negleglible compared to the cost of the allocations and the rest of whats going on here. I really don't think this is worth it.

My second loop intentionally contains branches since that's how it will be used in reality. The reason for that is the udsentry being different in size depending on the settings you pass to it. 8 fields is the common usecase. The first loop doesn't contain branches - directly - but does do so indirectly within QVector.

"So you just want to compare resize + operator[] vs. reserve + append?"
Yes.

Perf... :) I know. I disagree that this timer is bogus. QElapsedTimer is good enough to get a rough view if something is faster then something else. Perf helps when fine tuning it. Yes, this was compiled with optimizations in release mode. At least my project was. I don't know for the Qt version (5.4.0). That's just the package with the mingw installer from qt.io.

I'm actually not sure if it's worth it either. From a readability point of view it's still quite readable. But i don't know how much difference this makes in real world 100.000 items sized folder (the benchmark for this RR). Not much i think.

- Mark

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://git.reviewboard.kde.org/r/118452/#review71736
-----------------------------------------------------------

On Dec. 9, 2014, 10:44 p.m., Frank Reininghaus wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://git.reviewboard.kde.org/r/118452/
> -----------------------------------------------------------
> 
> (Updated Dec. 9, 2014, 10:44 p.m.)
> 
> 
> Review request for KDE Frameworks and David Faure.
> 
> 
> Repository: kio
> 
> 
> Description
> -------
> 
> I am continuing to split up https://git.reviewboard.kde.org/r/113355/ , which attempts to make UDSEntry more efficient memory and CPU-wise, into independent parts. This is the third step after 
> https://git.reviewboard.kde.org/r/113591/ and https://git.reviewboard.kde.org/r/115739/ .
> 
> The present patch modifies the internal data storage of UDSEntry. UDSEntry contains a mapping from unsigned int keys to "Field" values, where Field is a struct that contains a QString and a long long (some keys correspond to numeric values, like size, date, etc, whereas others, like user and group, correspond to a QString).
> 
> Currently, UDSEntry stores all data in a QHash<uint, Field> internally. This ensures that everything can be accessed in O(1) time, but is not very efficient memory-wise because a separate memory allocation is done for each hash node.
> 
> I propose to change this and store both the uint keys and the Field values in a QVector each. This means that accessing a value becomes a O(n) operation, since the entire QVector of keys may need to be scanned in order to find a value, but because the number n of values in a UDSEntry is usually quite small and can currently not exceed a number ~100, this should not matter in practice.
> 
> Some parts of https://git.reviewboard.kde.org/r/113355/ are still missing:
> 
> (a) The QVectors which store the keys (which are usually the same for all items in a directory) are not shared yet. Changing this would reduce the memory usage further, but I decided to keep this change separate in order to keep the current patch small and easy to understand. Moreover, this makes it easier to benchmark other similar approaches (e.g., replace QVector by std::vector, or store keys and values together in a std::vector<std::pair<uint,Field>>).
> 
> (b) No space is reserved in the vectors when key/value pairs are inserted one by one. Implementing this would make UDSEntry faster on the slave side (since repeated re-allocations would not be necessary any more), but this can be done in a later patch. Moreover, it might not be needed any more if UDSEntry is not used directly any more on the slave side, as suggested on the frameworks mailing list by Aaron (good idea BTW!). 
> 
> 
> Diffs
> -----
> 
>   autotests/udsentry_benchmark.cpp b5fa8d6 
>   src/core/udsentry.h 98a7035 
>   src/core/udsentry.cpp b806e0e 
>   src/ioslaves/file/file.cpp 1a2a767 
>   tests/udsentrybenchmark.cpp d9a118c 
> 
> Diff: https://git.reviewboard.kde.org/r/118452/diff/
> 
> 
> Testing
> -------
> 
> Unit tests still pass.
> 
> The memory usage of listjobtest with a directory with 100,000 files is reduced from 71344 K to 35392 K according to KSysGuard. I see similar savings when opening the directory in Dolphin.
> 
> I still haven't set up a Qt5/KF5 build in release mode (shame on me!), so I cannot present any benchmark results.
> 
> 
> File Attachments
> ----------------
> 
> Benchmark results
>   https://git.reviewboard.kde.org/media/uploaded/files/2014/12/09/038e443c-78eb-443b-b33a-b451b28d10ea__UDSEntry-benchmarks.png
> 
> 
> Thanks,
> 
> Frank Reininghaus
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.kde.org/pipermail/kde-frameworks-devel/attachments/20141210/7b25717f/attachment-0001.html>