Review Request 113485: Speed up natural sorting by first sorting the items according to KFileItem::text() with QString::operator<()

Frank Reininghaus frank78ac at googlemail.com
Thu Oct 31 10:09:12 GMT 2013



> On Oct. 29, 2013, 2:19 p.m., Emmanuel Pescosta wrote:
> > A "Ship It" from my side.
> > 
> > Great idea btw.
> > 
> > Another great thing about this patch is, that we can make use of these sequences of already "sorted" items
> > to speed up the sorting, when we use Timsort in future (maybe?).
> 
> Frank Reininghaus wrote:
>     Yes, it would definitely be interesting to see how this approach works when combined with Timsort. I would expect that it will speed up the sorting even more. It would also be interesting to try if Timsort works well only for the "final sorting", or if it can even make the "first non-natural sorting" faster if it is modified such that it can make use of multiple CPU cores (when sorting items in random order, plain Timsort will most likely always be slower than multithreaded Merge Sort on a machine with many cores).
> 
> Emmanuel Pescosta wrote:
>     > can make use of multiple CPU cores
>     I'm already working on it ;)
>     
>     But my time is currently very very limited so I can't finish it for Dolphin 4.12.
> 
> Christoph Feck wrote:
>     Just one question, I did not look at the code deeply. If I do not have "sort by name", but a different sort order enabled, will it still first sort by name, then do a second sort?
>     
>     NB: If we are going to use a custom sorting algorithm (instead of just using std::), and we can rely on some QCollator giving us a simple QByteArray for memcmp, then we should have a look at some form of radix sort, instead of comparison sorts. The latter are limited to O(N log N), while the former can asymptotically reach O(N). Also, lookup and single item insert/remove can asymptotically reach O(1) (as in hashes, but sorted).

Thanks Christoph, good catch! We should of course only do this two-stage sorting if it is expected to be beneficial (i.e., if sorting "naturally" by name), but I see now that I implemented the corresponding check incompletely. The check

if (m_naturalSorting)

should be replaced by

if (m_sortRole == NameRole && m_naturalSorting).

I'll be travelling for a few days, so it might take a while until I can make that change. If anyone wants to fix this earlier, please go ahead.

About the 'NB': AFAIK, the version of QCollator that is public API in Qt 5.2 now does not provide a QByteArray "sort key" any more, but a QCollatorSortKey, which is an implicitly shared type that wraps a QByteArray if ICU is available, and some possible alternative on Windows/Mac if it isn't. I guess that this was necessary to make Qt versions which are built with/without ICU on these platforms binary compatible. The QByteArray inside this class cannot be accessed.

This means that we only have access to the "operator<" that compares two QCollatorSortKeys, and unfortunately, the double indirection through the QCollatorSortKey's and QByteArray's d pointers might also harm the main goal of the entire "sort key" idea, namely, to provide a way to do very fast comparisons. Finding out how much this affects the sorting performance in practice might be an interesting experiment.


- Frank


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/113485/#review42636
-----------------------------------------------------------


On Oct. 30, 2013, 4:53 p.m., Frank Reininghaus wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/113485/
> -----------------------------------------------------------
> 
> (Updated Oct. 30, 2013, 4:53 p.m.)
> 
> 
> Review request for Dolphin.
> 
> 
> Repository: kde-baseapps
> 
> 
> Description
> -------
> 
> We all know now that natural sorting can be very slow. There are two ways to improve this:
> 
> 1. Make the "natural comparison" of two items faster. We might be able to do this when we can use QCollator from Qt5.
> 
> 2. Reduce the number of (expensive) comparisons.
> 
> This patch implements option 2 by first doing a very fast non-natural sorting of the items according to their name. The resulting sequence of items is partially sorted even according to the "natural" comparison, and this makes it possible to do the final sorting with a far lower number of comparisons.
> 
> 
> Diffs
> -----
> 
>   dolphin/src/kitemviews/kfileitemmodel.h d005705 
>   dolphin/src/kitemviews/kfileitemmodel.cpp f21edbf 
> 
> Diff: http://git.reviewboard.kde.org/r/113485/diff/
> 
> 
> Testing
> -------
> 
> Loaded a folder with 100,000 items with the names "a1.txt", ..., "a100000.txt". I guess that this is also one of the worst cases with respect to "difference between natural and non-natural sort order".
> 
> This patch reduces the time needed to sort in a release build (commented out the profiling output in insertItems()) from 1940 ms to 714 ms, which is a saving of about 63%.
> 
> 
> Thanks,
> 
> Frank Reininghaus
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.kde.org/mailman/private/kfm-devel/attachments/20131031/a24613ee/attachment.htm>


More information about the kfm-devel mailing list