Review Request 113485: Speed up natural sorting by first sorting the items according to KFileItem::text() with QString::operator<()

Frank Reininghaus frank78ac at googlemail.com
Tue Oct 29 10:41:22 GMT 2013



> On Oct. 28, 2013, 8:02 p.m., Mark Gaiser wrote:
> > Hi Frank,
> > 
> > That's one hell of a lot faster! It still doesn't beat..... <ok, i just skip it> ;)
> > 
> > I do have one suggestion for you that might make it even faster.
> > What you don't see doesn't have to be sorted, right? And to get that done, you might be able to use std::partial_sort [1]
> > 
> > Lets say this is your entire folder (in ascii)
> > 
> > 54
> > 2
> > 564
> > 32
> > 1
> > 678
> > 234
> > 23
> > 21
> > 866
> > 12
> > 8
> > 45
> > 
> > Lets say that this is your view:
> > 54
> > 2
> > 564
> > ------- top part of view
> > 32
> > 1
> > 678
> > 234
> > 23
> > 21
> > ------- bottom part of view
> > 866
> > 12
> > 8
> > 45
> > 
> > Now partial_sort is able(?) to do this (not tested so i could be wrong). Same view partial sorted.
> > ...
> > 54
> > 2
> > 564
> > ------- top part of view
> > ...
> > 8
> > 12
> > 21
> > 23
> > ...
> > ------- bottom part of view
> > 866
> > 45
> > ...
> > 
> > I know the above does work if you start from the top. I'm not 100% if it also works from the middle of a view. If it works then sorting can be reduced to the items you see. This would make the sorting insanely faster since all you have to sort would be about 100 entries at most, more aren't on the screen at any given time. Doing that would probably mean that you have to maintain which parts are sorted just to prevent needless resorting when the user scrolls up/down.
> > 
> > [1] http://en.cppreference.com/w/cpp/algorithm/partial_sort
> 
> Christoph Feck wrote:
>     > I know the above does work if you start from the top.
>     
>     No, think about it.
>     
>     The point of sorting is to have the view sorted, so there can be no unsorted "top" or "bottom" part of the view.
> 
> Christoph Feck wrote:
>     Sorry, the linked description was unclear. Reading http://www.cplusplus.com/reference/algorithm/partial_sort/ made it clear that it should work starting from the top.
> 
> Christoph Feck wrote:
>     Reading further, http://stackoverflow.com/questions/15395380/is-there-a-c-sharp-equivalent-to-c-stdpartial-sort (first reply) reveals an algorithm to get what you want.
> 
> Mark Gaiser wrote:
>     That really is some awesome help you provided there! :D
>     I bet that will also give Frank some nice new ideas to implement the stuff i said above. Thanks a lot for sharing that last link.

Thanks for your input. The question if we need to sort all items is not strictly related to this patch (which tries to make sorting a given list of items faster), but anyway: sometimes, I also think about ways how one can sort large data sets as fast as possible given a certain set of requirements, and I agree that using a combination of partial_sort and a "Quickselect"-based partitioning approach as described in the Stackoverflow page is an obvious way to sort the items in a small visible window without having to sort everything.

However, I think that getting this right is an *extremely* complex task:

1. Keep in mind that Quicksort/Quickselect has O(N^2) worst case behavior, and that it's extremely simple to trigger it (see https://bugreports.qt-project.org/browse/QTBUG-11022). So one has to add some code to detect this worst case and find a workaround for it.

2. Implementing the code that remembers which parts are sorted/partitioned already and which aren't, and that makes sure that new parts of the total list are sorted on-the-fly when scrolling though the view, is far from trivial.

3. When items are sorted on-the-fly, one also has to make sure that things like, e.g., the selection is updated correctly. It's easy to select items which are not visible at all (e.g., by pressing Ctrl+A, or Home and then Shift+End, or by using the DolphinPart's "select items matching pattern" feature). These have to be updated correctly when the invisible items are resorted.

I'm all for optimizations that make the performance better in many cases when they don't make the code much more complex, or if all new complexity is contained in a single new function or new class, but I think that turning large parts of the code base upside down and introducing lots of new code, which will most likely be very bug-prone, and very hard to maintain, is not a good idea when it only benefits the "let's load hundreds of thousands of files" use case.


- Frank


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/113485/#review42551
-----------------------------------------------------------


On Oct. 28, 2013, 6:46 p.m., Frank Reininghaus wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/113485/
> -----------------------------------------------------------
> 
> (Updated Oct. 28, 2013, 6:46 p.m.)
> 
> 
> Review request for Dolphin.
> 
> 
> Repository: kde-baseapps
> 
> 
> Description
> -------
> 
> We all know now that natural sorting can be very slow. There are two ways to improve this:
> 
> 1. Make the "natural comparison" of two items faster. We might be able to do this when we can use QCollator from Qt5.
> 
> 2. Reduce the number of (expensive) comparisons.
> 
> This patch implements option 2 by first doing a very fast non-natural sorting of the items according to their name. The resulting sequence of items is partially sorted even according to the "natural" comparison, and this makes it possible to do the final sorting with a far lower number of comparisons.
> 
> 
> Diffs
> -----
> 
>   dolphin/src/kitemviews/kfileitemmodel.h d005705 
>   dolphin/src/kitemviews/kfileitemmodel.cpp f21edbf 
> 
> Diff: http://git.reviewboard.kde.org/r/113485/diff/
> 
> 
> Testing
> -------
> 
> Loaded a folder with 100,000 items with the names "a1.txt", ..., "a100000.txt". I guess that this is also one of the worst cases with respect to "difference between natural and non-natural sort order".
> 
> This patch reduces the time needed to sort in a release build (commented out the profiling output in insertItems()) from 1940 ms to 714 ms, which is a saving of about 63%.
> 
> 
> Thanks,
> 
> Frank Reininghaus
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.kde.org/mailman/private/kfm-devel/attachments/20131029/3134822e/attachment.htm>


More information about the kfm-devel mailing list