Review Request 108824: New sorting algorithm for Dolphin
Frank Reininghaus
frank78ac at googlemail.com
Thu Feb 14 18:12:30 GMT 2013
> On Feb. 8, 2013, 9:55 a.m., Frank Reininghaus wrote:
> > Thanks Emmanuel! This looks like it's been a massive amount of work (BTW, it might be good to discuss such ideas on the mailing list before starting to code). I've also played around with Timsort some time ago, but only with a quick hack of mine that just made use of already sorted runs and then with https://github.com/gfx/cpp-TimSort.
> >
> > I don't have the time now to go through everything, but here are two things that come to my mind right now:
> >
> > 1. Were you aware of https://github.com/gfx/cpp-TimSort? If yes, what is different about your implementation? The one on Github looks like it has a permissive license and could in principle be used, and it's not always a good idea to 'reinvent the wheel'. However, it looks like it's just a Github repository and has not seen any 'official' release yet, so we would have to just copy the header to our repository, which is not quite an elegant approach.
> >
> > 2. I think that such a thing should in principle not be implemented on the application level. Having easy access to a sort function that minimizes the number of comparisons and works well for data with some structure in it might also be useful in other contexts where comparisons are expensive. It would probably not be accepted into kdelibs right now due to the feature freeze, so adding it to Dolphin itself might be the way to go in the short term, but I think that it should be put into some kind of library in the long term. Maybe a new 'framework' that bundles a couple of sorting-related things like this and the parallel sorting implementation which are not available out-of-the box on every system which has a C++ compiler+Qt?
> >
> > 3. I agree that benchmarks and, more importantly, unit tests should definitely be added.
>
> Emmanuel Pescosta wrote:
> > and it's not always a good idea to 'reinvent the wheel', BTW, it might be good to discuss such ideas on the mailing list before starting to code
> Yes I know. But in this case I tried to learn how the Tim Sort algorithm works ;) (only possible when you implement/debug it by yourself with the help of http://svn.python.org/projects/python/trunk/Objects/listsort.txt)
>
> > 1. Were you aware of https://github.com/gfx/cpp-TimSort? If yes, what is different about your implementation?
> Yes. My final implementation is a mix between TimSort.java/cpp-TimSort and my original implementation (This implementation was by far not as fast/beautiful as cpp-TimSort)
>
> I have also replaced the normal insertion sort (Used by my first implementation) with binary insertion sort (used in TimSort.java)
>
> My final implementation is a little bit faster than cpp-TimSort and a little bit easier to understand I think (+ it uses Qt containers/algorithms)
>
> 2. Yep. A sorting algorithm framework would be great :)
>
> Frank Reininghaus wrote:
> Thanks for the explanation!
>
> What is the motivation for using Qt containers+algorithms? AFAIK, the only real advantage that Qt containers have compared to the STL ones is implicit sharing/copy-on-write [1], but I guess that this does not matter much here. Moreover, the Qt algorithms are mostly deprecated as far as I understand and are even replaced by STL algorithms in some places inside Qt [2].
>
> In any case, I think the next step should really be to implement tests+benchmarks. I think that I have some half-ready sort algorithm test code at home. I'll try to find it when I'm at home and share it with you, so we can check if it's useful as a starting point. When we have benchmarks, we can also answer the question if it makes any difference at all to use Qt/STL containers.
>
> [1] http://marcmutz.wordpress.com/effective-qt/containers/
> [2] http://www.kdab.com/kdab-contributions-to-qt-5-0-part-4/
>
> Mark Gaiser wrote:
> Emmanuel, this is very cool! I haven't really searched through your code, but it looks like you are still using the compare stuff that dolphin has. My piece of advice is to get rid of the natural compare. It is _very_ slow and slows sorting down quite massively. But it still has to do the natural compare! Just different and faster.
>
> Really cool stuff :) Frank, thanks for the cpp-timsort link. I didn't know about that one.
>
> What i would do here - though that also adds complexity - is running std::sort on the first sort and timsort when the data changes.
>
> And a small note for std::sort vs qSort (QtAlgorithms). Qt itself is moving away from their specific algorithm implementations. std::sort is better and faster then qSort.
>
> Frank Reininghaus wrote:
> Here is a bit of code that might or might not be useful for benchmarking:
>
> http://paste.kde.org/671546/
>
> It's by no means ready - at the moment, it only counts calls of the comparison function, doesn't do any time measurements. It's copied and pasted from various places in Qt, and there's a few hacks of mine in there as well.
>
> I don't know if it's useful, but I think we should at least test a few common cases, like 'sorted', 'reversed', 'random order', 'Quicksort worst case' (surprising how easy it is to trigger it, isn't it?).
> What i would do here - though that also adds complexity - is running std::sort on the first sort and timsort when the data changes.
Why would that make sense? std::sort (which is median-of-3 Quicksort unless Quicksort's worst case is detected, in which case it falls back to Heapsort) is very fast when sorting POD, like int, for which comparisons are extremely cheap. Other things matter more then. But in our case, the number of comparisons is what counts, and Timsort needs less comparisons than std::sort even in the 'random order' case.
Even if we manage to make the natural comparison of items faster, its cost will never be anywhere close to the cost of comparing ints. There's always going to be at least a couple of pointer dereferences involved.
- Frank
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/108824/#review26941
-----------------------------------------------------------
On Feb. 6, 2013, 11:53 p.m., Emmanuel Pescosta wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/108824/
> -----------------------------------------------------------
>
> (Updated Feb. 6, 2013, 11:53 p.m.)
>
>
> Review request for Dolphin and Frank Reininghaus.
>
>
> Description
> -------
>
> Implemented a new sorting algorithm called Tim Sort which is a modified Merge Sort algorithm (http://svn.python.org/projects/python/trunk/Objects/listsort.txt) - Default sorting algorithm in Python and Java
>
> Tim Sort is really fast for already sorted lists (Great for inserting new items, sort role changes - see benchmark 2, ...)
>
> Some benchmarks (200k folders and files):
> # Sort Type: Name => TimSort: 1755 ms - Parallel MergeSort: 1547 ms
> # Sort Type: Name (Reorder descending) => TimSort: 163 ms - Parallel MergeSort: 434 ms
> # Sort Type: Size => TimSort: 259 ms - Parallel MergeSort: 516 ms
> # Sort Type: Type => TimSort: 269 ms - Parallel MergeSort: 594 ms
>
> Btw.: Tim Sort uses only 1 core!
>
> ToDo: Write a benchmark for sorting algorithms
>
>
> Diffs
> -----
>
> dolphin/src/kitemviews/private/kfileitemmodelsortalgorithm.h 1d56894
>
> Diff: http://git.reviewboard.kde.org/r/108824/diff/
>
>
> Testing
> -------
>
> Sorting works fine for me
>
>
> Thanks,
>
> Emmanuel Pescosta
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.kde.org/mailman/private/kfm-devel/attachments/20130214/33489119/attachment.htm>
More information about the kfm-devel
mailing list