Ideas for improving sorting. Goal: under 1 second for 200.000 files.

Frank Reininghaus frank78ac at googlemail.com
Thu Oct 25 10:41:20 BST 2012


Hi Mark,

2012/10/25 Mark:
> On Wed, Oct 24, 2012 at 1:13 AM, Christoph Feck wrote:
>> Hi,
>>
>> you wrote:
>>> Copy KStringHandler::naturalCompare into dolphin for further in
>>> depth optimizations
>>
>> A good read about locale-aware sorting is this link from glibc doc:
>>
>> http://www.gnu.org/software/libc/manual/html_node/Collation-
>> Functions.html
>>
>> Basically, it says to get an effective locale-aware sorting, you need
>> to cache a sort key ("collating sequence") for each string, then use
>> that for the sort algorithm. That sort key could additionally encode
>> the digit sequences in "magnitude - digits" order, so that "a12" comes
>> after "a2", and do case mapping for case insensitive compares.
>>
>> If you sort N = 200.000 entries, you only have to create 200.000
>> collating sequences, instead of 2*N*log(N) = 7.000.000. Speed of
>> comparing the sequences is that of a memcpy() if done properly. See
>> Thiago's posts about QString performance optimizations.
>>
>> So what you have to work on before doing anything else, is to ask
>> Frank, if he is okey with adding a sort key (QByteArray basically) for
>> each file item. Your code would have to make sure it is always updated
>> on file renames, locale changes, sort mode changes (natural vs.
>> simple) etc.
>>
>> For more information about collating, I already gave you the link for
>> the official Unicode collating code. You would need to improve the
>> algorithm to include the "magnitude - digits" coding.
>>
>> Christoph Feck (kdepepo)
>> KDE Quality Team
>
> Hi Christoph,
>
> Thanks a lot for your suggestion! However, there is a "bit" of a
> knowledge problem here. I don't have the knowledge - by far - to even
> implement something like you suggest or to even fully understand what
> thiago wrote about regarding string optimizations. I like to read that
> kind of stuff and i understand it partly, but implementing it is a
> whole different story.
>
> I'm getting there in knowledge, just give me a few more years :)
>
> I guess i just keep it to the "simpler" optimizations for now. Unless
> someone would like to help me implementing your suggestion?

But which 'simple' optimizations that could yield a considerable
performance improvement remain? Christoph's suggestion is essentially
what you call 2.1 and 2.2.

I like the idea, and I'm willing to review any patches, of course, but
I'm afraid I'm too busy with other things to work on the code myself
:-(

Best regards,
Frank




More information about the kfm-devel mailing list