Ideas for improving sorting. Goal: under 1 second for 200.000 files.

Thu Jan 3 19:19:40 GMT 2013

On Thu, Jan 3, 2013 at 7:49 PM, Frank Reininghaus
<frank78ac at googlemail.com> wrote:
> Hi Mark,
>
> 2013/1/3 Mark:
> [...]
>>>> Don't bother thinking about this issue anymore. My brother (a far
>>>> better programmer then me) managed to make a extremely fast natural
>>>> sorting algorithm. 1 second for 1 million items. If you include init
>>>> stuff then it's 1.8 seconds. For 200.000 items it's even far less!
>>>> That's just on a cheap fusion pc.
>>>>
>>>> I will blog about that sometime this week or next week. Depends a bit
>>>> on my experiments with a KDirModel rewrite :)
>>>
>>> Sounds interesting. I'm looking forward to seeing the code.
>>>
>>> I've recently done some experiments with picking up some ideas from
>>> Python's Timsort, which I might also blog about soon. That greatly
>>> reduces the number of comparisons if there is some structure in the
>>> data, which happens, e.g., when resorting the items after reversing
>>> the sort order. In this and some other "best cases", only N
>>> comparisons are needed for N items. But for random data, we're still
>>> at O(N*log(N)), of course. In that sense, it's orthogonal to your
>>> approach of speeding up the comparison function and might also be
>>> interesting for other apps because real world data are often not in
>>> random order.
>>>
>>> Best regards,
>>> Frank
>>
>> Hi Frank,
>>
>> I will ask my brother if i'm allowed to share his code or if he
>> managed to get in some more cycle optimizations ;)
>
> Note that taking great efforts to save a few more CPU cycles should
> not be the main goal. What's more important is that the code is easy
> to read and maintainable.

Well, that's just fun :)
>
> And most importantly, it should not cause any regressions, i.e., it
> must be locale-aware, it must pass the natural compare unit tests in
> kdelibs, and ideally, one would test it on a large number of file
> names (like all files on your hard drive) and verify that it sorts
> them like KStringHandler::naturalCompare does. If you're planning to
> write on your blog that there is a possibility that your and your
> brother's code might be included in KDE soon, it might make sense to
> ensure that these things work OK.

I'm not counting on it to be fit for kde - yet. For that to happen it
must be refined somewhat to accommodate the KDE needs. If i blog about
this then i certainly will mention that rather important part.
>
>> Anyway, the sorting that i have now is internally (also) converted to
>> aligned integers so i think it's fairly safe to look at this
>> comparison: http://gamasutra.com/view/news/172542/Indepth_Smoothsort_vs_Timsort.php#.UOWJc2_K6uI
>> Timsort isn't bad, but quicksort seems to win there.
>>
>> Timsort only seems to be better if the data is mostly sorted already.
>> Also a note, we tried this using plain C++ and std::sort. While it's
>> speed is truly massive, it is actually twice as fast on a dual core
>> machine when the gcc parallel mode is enabled:
>> http://gcc.gnu.org/onlinedocs/libstdc++/manual/parallel_mode.html It's
>> experimental, but i'm seriously thinking about using that for my
>> sorting. It just gives you parallel sorting without doing anything
>> special. All you need is compile against openmp and include the
>> parallel/algorithm header instead of the usual algorithm include.
>> These things might be something you could consider for Dolphin?
>
> What exactly is the benefit compared to a Qt-based parallel sorting
> implementation? I mean, I use OpenMP myself for my scientific work,
> but KDE does currently not have any OpenMP dependency AFAIK. Moreover,
> relying on an experimental feature of GCC and the standard library
> shipped with it might not exactly improve portability.

A few things:
1. completely 0 added code changes.
2. no added complexity due to above.
3. very easy to implement

at least 1 and 2 are not the case when you use Qt concurrency for the
same. 3 really depends on what you do.

But you're right, i don't think it's very wise to rely on experimental
features. However, if it can drastically speed things up then it
certainly should become something to think about. OpenMP is an open
framework. It is the way forward for the manycore cpu's that are
hot(pun intended) nowadays. If KDE doesn't depend on it yet then it
probably will within a few years. There is no escaping from that.

The best solution here would be if Qt where to implement a parallel
version of the algorithms they support.

Also, a framework like "Thrust" might also be a dependency sooner or
later. It goes one step further. It supports threaded algorithms (like
sort), but also on the GPU  though it's kinda limited to CUDA, TBB and
OpenMP at the moment. CUDA is obviously due to it being a nvidia
library (apache 2.0 license). But something like that only for OpenCL
should certainly be considered at some point in the not too distant
future.

>
> Best regards,
> Frank

Best regards,
Mark