Ideas for improving sorting. Goal: under 1 second for 200.000 files.

Fri Dec 28 13:46:17 GMT 2012

Hi,

2012/10/24 Christoph Feck:
> On Wednesday 24 October 2012 01:13:40 Christoph Feck wrote:
>> Speed of comparing the sequences is that of a memcpy()
>
> Of course I meant memcmp().
>
>> So what you have to work on before doing anything else, is to ask
>> Frank, if he is okey with adding a sort key (QByteArray basically)
>> for each file item. Your code would have to make sure it is always
>> updated on file renames, locale changes, sort mode changes
>> (natural vs. simple) etc.
>
> Another idea is to not store this side information "permanently", but
> only create it temporarily when sorting. So you effectively only sort
> a list with side information.
>
>         struct SortEntry
>         {
>                 QByteArray collatingKey;
>                 KFileItem *fileItem;
>         };

I have been thinking about sorting from time to time, and I had an
idea how to to store the sort key. It could be done either

a) In the ItemData's "values" hash (as a QVariant with a key like
"naturalCollatingSequence"), or,

b) in a new member "QByteArray naturalCollatingSequence" in ItemData
(if a QByteArray seems more suitable because QByteArrays can be
compared with < directly and the casting overhead for each comparison
seems to much).

When comparing two ItemData, we would then first check if both already
have a collating sequence, and calculate and store them if that is not
the case. One could then just compare those sequences.

When an item changes, the collating sequence would be deleted from the
hash (or set to an empty QByteArray, respectively).

This would make sure that the collating sequence is calculated only
once for each item, and only if it is really needed.

Best regards,
Frank