Ideas for improving sorting. Goal: under 1 second for 200.000 files.

Mark markg85 at gmail.com
Thu Oct 25 23:02:32 BST 2012


On Thu, Oct 25, 2012 at 11:00 PM, Christoph Feck <christoph at maxiom.de> wrote:
> On Thursday 25 October 2012 20:24:23 Mark wrote:
>> On Thu, Oct 25, 2012 at 2:09 PM, Christoph Feck
> <christoph at maxiom.de> wrote:
>> >         'a' ->  'a'
>> >         'b' -> 'b'
>> >         'ä' -> 'a' '\0377'
>> >
>> > The appended 0xFF character ensures that 'ä' is always after 'a',
>> > regardless of which other character follows. But it will never be
>> > after 'b', because of the first 'a' in the sort key.
>>
>> This is where things get odd... So, i should place 'ä' before 'a',
>> but that means that i have to make a translation table somewhere
>> from latin chars to alphabet chars.. Or am i misunderstanding
>> something here?
>
> You are right, there need to be some tables, but as I wrote, you do
> not bother, because you can use strxform() from glibc for this (or
> libicu, which is what QCollator is going to use). Those libraries
> already load locale-specific tables (every wondered, why "glibc-
> locale" package is several hundred megabytes big? ;) Simply check the
> source code of QString::localeAwareCompare(), it's no black magic to
> call glibc.
>
>> So how about this stuff in the implementations side?
>> Am i right that i'm going to get 2 new parts:
>>
>> 1. A part for building up the collation sequence. That part is
>> outside the sorting algorithm.
>
> Yes. That's why Frank said this should be in kdelibs, right besides
> the "naturalCompare" function. Looking at the QCollator link for
> future Q5, we should model the function in kdelibs the same way, so
> that it simply can be "moved" to Qt in frameworks.
>
>> 2. The sorting algorithm becomes a LOT less code and simply does
>> string compares based on the collation sequences. So no more
>> difficult checks like are done now. Thus the sorting would indeed
>> speed up massively, right?
>
> Well, the comparing gets massively faster, but the actual sorting
> algorithm gets a bit more complicated. First, you have to create a
> temporary SortEntry list, create the keys, then sort the side-
> information, and after that, have to use that sorted list to do the
> actual model sorting. But those additional operations are only O(N),
> not O(N*log N), which is why we expect it to be very worth for large
> models.
>
>> For Todd and Christoph, thanks a lot for writing such long
>> informative replies! I really appreciate that and it really helps
>> me understand how to implement this. Though i obviously still have
>> questions ;)
>
> Wer nicht fragt, bleibt dumm ;)
>
> Christoph Feck (kdepepo)

I'm playing with this stuff right now. The comparison gets MASSIVELY
faster since it are actually int compares! (or numeric compares) No
string compares at all anymore.
Btw, that german line you wrote. I can read that one you know ;)

Cheers,
Mark




More information about the kfm-devel mailing list