Review Request 108824: New sorting algorithm for Dolphin

Mark Gaiser markg85 at gmail.com
Tue Feb 12 23:28:05 GMT 2013



> On Feb. 8, 2013, 9:55 a.m., Frank Reininghaus wrote:
> > Thanks Emmanuel! This looks like it's been a massive amount of work (BTW, it might be good to discuss such ideas on the mailing list before starting to code). I've also played around with Timsort some time ago, but only with a quick hack of mine that just made use of already sorted runs and then with https://github.com/gfx/cpp-TimSort.
> > 
> > I don't have the time now to go through everything, but here are two things that come to my mind right now:
> > 
> > 1. Were you aware of https://github.com/gfx/cpp-TimSort? If yes, what is different about your implementation? The one on Github looks like it has a permissive license and could in principle be used, and it's not always a good idea to 'reinvent the wheel'. However, it looks like it's just a Github repository and has not seen any 'official' release yet, so we would have to just copy the header to our repository, which is not quite an elegant approach.
> > 
> > 2. I think that such a thing should in principle not be implemented on the application level. Having easy access to a sort function that minimizes the number of comparisons and works well for data with some structure in it might also be useful in other contexts where comparisons are expensive. It would probably not be accepted into kdelibs right now due to the feature freeze, so adding it to Dolphin itself might be the way to go in the short term, but I think that it should be put into some kind of library in the long term. Maybe a new 'framework' that bundles a couple of sorting-related things like this and the parallel sorting implementation which are not available out-of-the box on every system which has a C++ compiler+Qt?
> > 
> > 3. I agree that benchmarks and, more importantly, unit tests should definitely be added.
> 
> Emmanuel Pescosta wrote:
>     > and it's not always a good idea to 'reinvent the wheel', BTW, it might be good to discuss such ideas on the mailing list before starting to code
>     Yes I know. But in this case I tried to learn how the Tim Sort algorithm works ;) (only possible when you implement/debug it by yourself with the help of http://svn.python.org/projects/python/trunk/Objects/listsort.txt)
>     
>     > 1. Were you aware of https://github.com/gfx/cpp-TimSort? If yes, what is different about your implementation?
>     Yes. My final implementation is a mix between TimSort.java/cpp-TimSort and my original implementation (This implementation was by far not as fast/beautiful as cpp-TimSort)
>     
>     I have also replaced the normal insertion sort (Used by my first implementation) with binary insertion sort (used in TimSort.java)
>     
>     My final implementation is a little bit faster than cpp-TimSort and a little bit easier to understand I think (+ it uses Qt containers/algorithms)
>     
>     2. Yep. A sorting algorithm framework would be great :)
> 
> Frank Reininghaus wrote:
>     Thanks for the explanation!
>     
>     What is the motivation for using Qt containers+algorithms? AFAIK, the only real advantage that Qt containers have compared to the STL ones is implicit sharing/copy-on-write [1], but I guess that this does not matter much here. Moreover, the Qt algorithms are mostly deprecated as far as I understand and are even replaced by STL algorithms in some places inside Qt [2].
>     
>     In any case, I think the next step should really be to implement tests+benchmarks. I think that I have some half-ready sort algorithm test code at home. I'll try to find it when I'm at home and share it with you, so we can check if it's useful as a starting point. When we have benchmarks, we can also answer the question if it makes any difference at all to use Qt/STL containers.
>     
>     [1] http://marcmutz.wordpress.com/effective-qt/containers/
>     [2] http://www.kdab.com/kdab-contributions-to-qt-5-0-part-4/

Emmanuel, this is very cool! I haven't really searched through your code, but it looks like you are still using the compare stuff that dolphin has. My piece of advice is to get rid of the natural compare. It is _very_ slow and slows sorting down quite massively. But it still has to do the natural compare! Just different and faster.

Really cool stuff :) Frank, thanks for the cpp-timsort link. I didn't know about that one.

What i would do here - though that also adds complexity - is running std::sort on the first sort and timsort when the data changes.

And a small note for std::sort vs qSort (QtAlgorithms). Qt itself is moving away from their specific algorithm implementations. std::sort is better and faster then qSort.


- Mark


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/108824/#review26941
-----------------------------------------------------------


On Feb. 6, 2013, 11:53 p.m., Emmanuel Pescosta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/108824/
> -----------------------------------------------------------
> 
> (Updated Feb. 6, 2013, 11:53 p.m.)
> 
> 
> Review request for Dolphin and Frank Reininghaus.
> 
> 
> Description
> -------
> 
> Implemented a new sorting algorithm called Tim Sort which is a modified Merge Sort algorithm (http://svn.python.org/projects/python/trunk/Objects/listsort.txt) - Default sorting algorithm in Python and Java
> 
> Tim Sort is really fast for already sorted lists (Great for inserting new items, sort role changes - see benchmark 2, ...)
> 
> Some benchmarks (200k folders and files):
> # Sort Type: Name => TimSort: 1755 ms - Parallel MergeSort: 1547 ms
> # Sort Type: Name (Reorder descending) => TimSort: 163 ms - Parallel MergeSort: 434 ms
> # Sort Type: Size => TimSort: 259 ms - Parallel MergeSort: 516 ms
> # Sort Type: Type => TimSort: 269 ms - Parallel MergeSort: 594 ms
> 
> Btw.: Tim Sort uses only 1 core!
> 
> ToDo: Write a benchmark for sorting algorithms
> 
> 
> Diffs
> -----
> 
>   dolphin/src/kitemviews/private/kfileitemmodelsortalgorithm.h 1d56894 
> 
> Diff: http://git.reviewboard.kde.org/r/108824/diff/
> 
> 
> Testing
> -------
> 
> Sorting works fine for me
> 
> 
> Thanks,
> 
> Emmanuel Pescosta
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.kde.org/mailman/private/kfm-devel/attachments/20130212/8e638fa5/attachment.htm>


More information about the kfm-devel mailing list