Ideas for improving sorting. Goal: under 1 second for 200.000 files.

Wed Oct 24 00:13:40 BST 2012

Hi,

you wrote:
> Copy KStringHandler::naturalCompare into dolphin for further in
> depth optimizations

A good read about locale-aware sorting is this link from glibc doc:

http://www.gnu.org/software/libc/manual/html_node/Collation-
Functions.html

Basically, it says to get an effective locale-aware sorting, you need 
to cache a sort key ("collating sequence") for each string, then use 
that for the sort algorithm. That sort key could additionally encode 
the digit sequences in "magnitude - digits" order, so that "a12" comes 
after "a2", and do case mapping for case insensitive compares.

If you sort N = 200.000 entries, you only have to create 200.000 
collating sequences, instead of 2*N*log(N) = 7.000.000. Speed of 
comparing the sequences is that of a memcpy() if done properly. See 
Thiago's posts about QString performance optimizations.

So what you have to work on before doing anything else, is to ask 
Frank, if he is okey with adding a sort key (QByteArray basically) for 
each file item. Your code would have to make sure it is always updated 
on file renames, locale changes, sort mode changes (natural vs. 
simple) etc.

For more information about collating, I already gave you the link for 
the official Unicode collating code. You would need to improve the 
algorithm to include the "magnitude - digits" coding.

Christoph Feck (kdepepo)
KDE Quality Team