Review Request 123620: Fix sorting performance regression by avoiding unnecessary copying of the QCollator

Mon May 4 22:37:04 BST 2015

> On Mai 3, 2015, 10:51 nachm., Aleix Pol Gonzalez wrote:
> > Wow, this is a huge improvement. Do you know what actually triggers that improvement?

The reason why it's so much faster with this patch is that the repeated initialization of the QCollator is prevented. Initializing a single QCollator instance is very fast, but since the "lessThan" object, which contains the QCollator, is passed by value between the sorting functions, and it makes a deep copy of the QCollator (see the KFileItemModelLessThan copy constructor in https://git.reviewboard.kde.org/r/121817/diff/1/#index_header ), and recursive sorting algorithms make a lot of recursive calls for large amounts of data, the number of QCollator instances which have to be initialized is huge. With this patch, the "lessThan" object is passed by const reference, except when it's absolutely necessary to make a copy, i.e., when a new thread is created (see bug 342316).

The benchmark even models the best case that we can get - the 100,000 input files are already sorted because it was designed to test not the sorting, but other parts of the KFileItemModel class. I just tried what happens if I feed 100,000 files in random order into the model, such that the in-place merge sort (which is very similar to qStableSort) makes a lot more recursive calls. With the current master branch, it takes 12.58 seconds, but with this patch, it's only 122 ms :-)

BTW, I'm not sure if the claim in the QCollator docs that the class is reentrant is true. If it was, then Emmanuel's patch from https://bugs.kde.org/show_bug.cgi?id=342316#c7 should also have fixed the crash. In that patch, he created new QCollators for each thread by using the copy constructor, but it still crashed, even though it should be safe to use two instances of a reentrant class from different threads, no matter if one of them has been copy-constructed from the other one. After having a quick look at the QCollator source, I found that the copy constructor just increases the refcount of the QCollatorPrivate, but that private class does not use a mutex or any other obvious mechanism to protect access to its members. This is probably done for performance reasons, but if I haven't mis
 sed anything, then the sentence "Note: All functions in this class are reentrant." should be removed. I still have to look at this in more detail though, but it's too late already now ;-)

- Frank

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://git.reviewboard.kde.org/r/123620/#review79820
-----------------------------------------------------------

On Mai 3, 2015, 9:14 nachm., Frank Reininghaus wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://git.reviewboard.kde.org/r/123620/
> -----------------------------------------------------------
> 
> (Updated Mai 3, 2015, 9:14 nachm.)
> 
> 
> Review request for Dolphin.
> 
> 
> Repository: dolphin
> 
> 
> Description
> -------
> 
> I have found that the sorting performance when loading large folders is much worse than in the kdelibs 4.x-based Dolphin versions. The reason is the switch from KStringHandler::naturalCompare(...) to QCollator, which made creating copies of the QCollator object necessary (see https://git.reviewboard.kde.org/r/121817/) in order to fix crashes because QCollator's functions are, unlike KStringHandler::naturalCompare(...), not thread-safe.
> 
> This patch does the following:
> 
> 1. Simplify the benchmarks in kfileitemmodelbenchmark. Currently, the benchmarks are run for many different folder sizes, which was motivated by the O(N^2) complexity which earlier Dolphin versions had if many files with some specific patterns were added. Since these issues have been fixed a long time ago, I think that it makes sense to run the benchmarks for a single folder size only. This makes it much easier to understand the benchmark results at first sight and identify any performance regressions.
> 
> 2. Prevent that unnecessary expensive copies of the QCollator object are made. A copy is only needed if a new thread is forked. In all other cases, the "lessThan" object, which contains the QCollator, can be passed by const reference. I achieved that with rather small code changes, but they are a bit ugly - the only way to force the std::lower_bound and std::upper_bound functions to take a const reference was to specify the template arguments explicitly.
> 
> My next plans would be to add a benchmark which actually benchmarks natural sorting (the current benchmark disables it because its intention was to test the code that inserts items into the model). This could help to investigate if making use of QCollator::sortKey() might speed up sorting. Moreover, I think it might sense to move some or all of the lessThan-related code out of the KFileItemModel class to make it more maintainable.
> 
> 
> Diffs
> -----
> 
>   src/kitemviews/private/kfileitemmodelsortalgorithm.h 50db990 
>   src/tests/kfileitemmodelbenchmark.cpp 3ff0405 
> 
> Diff: https://git.reviewboard.kde.org/r/123620/diff/
> 
> 
> Testing
> -------
> 
> Unit tests still pass. I have verified (with some debug output in KFileItemModelLessThan::operator()) that different threads use different collator objects.
> 
> Benchmark results before/after this patch (with Qt, KF5 and Dolphin built in release mode) show that the performance is improved considerably:
> 
> Before:
> 
> ********* Start testing of KFileItemModelBenchmark *********
> Config: Using QtTest library 5.4.1, Qt 5.4.1 (x86_64-little_endian-lp64 shared (dynamic) release build; by GCC 4.8.3 20140627 [gcc-4_8-branch revision 212064])
> PASS   : KFileItemModelBenchmark::initTestCase()
> PASS   : KFileItemModelBenchmark::insertAndRemoveManyItems(all--n=100000)
> RESULT : KFileItemModelBenchmark::insertAndRemoveManyItems():"all--n=100000":
>      2,575 msecs per iteration (total: 2,575, iterations: 1)
> PASS   : KFileItemModelBenchmark::insertAndRemoveManyItems(1st half + 2nd half--n=100000)
> RESULT : KFileItemModelBenchmark::insertAndRemoveManyItems():"1st half + 2nd half--n=100000":
>      2,596 msecs per iteration (total: 2,596, iterations: 1)
> PASS   : KFileItemModelBenchmark::insertAndRemoveManyItems(2nd half + 1st half--n=100000)
> RESULT : KFileItemModelBenchmark::insertAndRemoveManyItems():"2nd half + 1st half--n=100000":
>      2,594 msecs per iteration (total: 2,594, iterations: 1)
> PASS   : KFileItemModelBenchmark::insertAndRemoveManyItems(even + odd--n=100000)
> RESULT : KFileItemModelBenchmark::insertAndRemoveManyItems():"even + odd--n=100000":
>      2,609 msecs per iteration (total: 2,609, iterations: 1)
> PASS   : KFileItemModelBenchmark::insertAndRemoveManyItems(all - 2nd half--n=100000)
> RESULT : KFileItemModelBenchmark::insertAndRemoveManyItems():"all - 2nd half--n=100000":
>      2,626 msecs per iteration (total: 2,626, iterations: 1)
> PASS   : KFileItemModelBenchmark::insertAndRemoveManyItems(all - 1st half--n=100000)
> RESULT : KFileItemModelBenchmark::insertAndRemoveManyItems():"all - 1st half--n=100000":
>      2,610 msecs per iteration (total: 2,610, iterations: 1)
> PASS   : KFileItemModelBenchmark::insertAndRemoveManyItems(all - odd--n=100000)
> RESULT : KFileItemModelBenchmark::insertAndRemoveManyItems():"all - odd--n=100000":
>      2,636 msecs per iteration (total: 2,636, iterations: 1)
> PASS   : KFileItemModelBenchmark::cleanupTestCase()
> Totals: 9 passed, 0 failed, 0 skipped, 0 blacklisted
> ********* Finished testing of KFileItemModelBenchmark *********
> 
> 
> After:
> 
> ********* Start testing of KFileItemModelBenchmark *********
> Config: Using QtTest library 5.4.1, Qt 5.4.1 (x86_64-little_endian-lp64 shared (dynamic) release build; by GCC 4.8.3 20140627 [gcc-4_8-branch revision 212064])
> PASS   : KFileItemModelBenchmark::initTestCase()
> PASS   : KFileItemModelBenchmark::insertAndRemoveManyItems(all--n=100000)
> RESULT : KFileItemModelBenchmark::insertAndRemoveManyItems():"all--n=100000":
>      21 msecs per iteration (total: 87, iterations: 4)
> PASS   : KFileItemModelBenchmark::insertAndRemoveManyItems(1st half + 2nd half--n=100000)
> RESULT : KFileItemModelBenchmark::insertAndRemoveManyItems():"1st half + 2nd half--n=100000":
>      46 msecs per iteration (total: 92, iterations: 2)
> PASS   : KFileItemModelBenchmark::insertAndRemoveManyItems(2nd half + 1st half--n=100000)
> RESULT : KFileItemModelBenchmark::insertAndRemoveManyItems():"2nd half + 1st half--n=100000":
>      41 msecs per iteration (total: 83, iterations: 2)
> PASS   : KFileItemModelBenchmark::insertAndRemoveManyItems(even + odd--n=100000)
> RESULT : KFileItemModelBenchmark::insertAndRemoveManyItems():"even + odd--n=100000":
>      56.5 msecs per iteration (total: 113, iterations: 2)
> PASS   : KFileItemModelBenchmark::insertAndRemoveManyItems(all - 2nd half--n=100000)
> RESULT : KFileItemModelBenchmark::insertAndRemoveManyItems():"all - 2nd half--n=100000":
>      64 msecs per iteration (total: 64, iterations: 1)
> PASS   : KFileItemModelBenchmark::insertAndRemoveManyItems(all - 1st half--n=100000)
> RESULT : KFileItemModelBenchmark::insertAndRemoveManyItems():"all - 1st half--n=100000":
>      50.0 msecs per iteration (total: 100, iterations: 2)
> PASS   : KFileItemModelBenchmark::insertAndRemoveManyItems(all - odd--n=100000)
> RESULT : KFileItemModelBenchmark::insertAndRemoveManyItems():"all - odd--n=100000":
>      68 msecs per iteration (total: 68, iterations: 1)
> PASS   : KFileItemModelBenchmark::cleanupTestCase()
> Totals: 9 passed, 0 failed, 0 skipped, 0 blacklisted
> ********* Finished testing of KFileItemModelBenchmark *********
> 
> 
> Thanks,
> 
> Frank Reininghaus
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.kde.org/mailman/private/kfm-devel/attachments/20150504/e5454084/attachment.htm>