Ideas for improving sorting. Goal: under 1 second for 200.000 files.

Mark markg85 at gmail.com
Thu Oct 25 08:29:26 BST 2012


On Wed, Oct 24, 2012 at 1:13 AM, Christoph Feck <christoph at maxiom.de> wrote:
> Hi,
>
> you wrote:
>> Copy KStringHandler::naturalCompare into dolphin for further in
>> depth optimizations
>
> A good read about locale-aware sorting is this link from glibc doc:
>
> http://www.gnu.org/software/libc/manual/html_node/Collation-
> Functions.html
>
> Basically, it says to get an effective locale-aware sorting, you need
> to cache a sort key ("collating sequence") for each string, then use
> that for the sort algorithm. That sort key could additionally encode
> the digit sequences in "magnitude - digits" order, so that "a12" comes
> after "a2", and do case mapping for case insensitive compares.
>
> If you sort N = 200.000 entries, you only have to create 200.000
> collating sequences, instead of 2*N*log(N) = 7.000.000. Speed of
> comparing the sequences is that of a memcpy() if done properly. See
> Thiago's posts about QString performance optimizations.
>
> So what you have to work on before doing anything else, is to ask
> Frank, if he is okey with adding a sort key (QByteArray basically) for
> each file item. Your code would have to make sure it is always updated
> on file renames, locale changes, sort mode changes (natural vs.
> simple) etc.
>
> For more information about collating, I already gave you the link for
> the official Unicode collating code. You would need to improve the
> algorithm to include the "magnitude - digits" coding.
>
> Christoph Feck (kdepepo)
> KDE Quality Team

Hi Christoph,

Thanks a lot for your suggestion! However, there is a "bit" of a
knowledge problem here. I don't have the knowledge - by far - to even
implement something like you suggest or to even fully understand what
thiago wrote about regarding string optimizations. I like to read that
kind of stuff and i understand it partly, but implementing it is a
whole different story.

I'm getting there in knowledge, just give me a few more years :)

I guess i just keep it to the "simpler" optimizations for now. Unless
someone would like to help me implementing your suggestion?

Cheers,
Mark




More information about the kfm-devel mailing list