[Kst] Re: AsciiSource: new defaults, Kst's atof
Peter Kümmel
syntheticpp at gmx.net
Tue Jan 25 17:51:40 CET 2011
On 24.01.2011 21:01, Barth Netterfield wrote:
> Great work. Huge speedup!
>
> On Mon, Jan 24, 2011 at 2:08 PM, Peter Kuemmel <syntheticpp at gmx.net> wrote:
>
>> Attached the benchmark results with values for Linux.
>> It was a 310 MB gyrodata file, and I always loaded
>> column three only.
>>
>
> Can you explain what the various cases in the table are? Are these all
> fixed width columns, or are they variable width? Or did you show both
> cases?
>
I tested with different column and comment delimiters:
No comment: the line edit for the comment delimiters is empty
# comment: only # is given as comment delimiter
one space : custom column delimiter selected and one space is entered in the line edit
whitespace: 'Whitespace' was selected.
All cases used variable column width.
>
>> I found that the atof function which we already use on Windows
>> by default is also faster on Linux. Therefore I think we should
>> also use it on Linux, especially since our numbers aren't that
>> complicated to parse.
>>
>
> Good. Can it parse scientific notation? What does it do about '.' vs ','
> (I haven't looked...)
'.' vs ',' is correctly handled; scientific notation we still have to check.
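To make the scientific notation question concrete, here is a minimal sketch of what a hand-written parser needs to handle: an optional sign, digits, a configurable decimal separator, and an optional exponent. This is not Kst's atof; parseNumber and its decimalSeparator parameter are hypothetical names, and the sketch ignores overflow, inf/nan and malformed input.

```cpp
#include <cmath>

// Hypothetical sketch, NOT Kst's atof.
// Accepts: [ws][sign]digits[sep digits][(e|E)[sign]digits]
inline double parseNumber(const char* p, char decimalSeparator = '.')
{
    while (*p == ' ' || *p == '\t') ++p;          // skip leading whitespace

    bool negative = false;
    if (*p == '-' || *p == '+') { negative = (*p == '-'); ++p; }

    // Accumulate all digits as one integer-valued mantissa.
    double value = 0.0;
    while (*p >= '0' && *p <= '9') { value = value * 10.0 + (*p - '0'); ++p; }

    int exponent = 0;                              // decimal exponent to apply
    if (*p == decimalSeparator) {
        ++p;
        while (*p >= '0' && *p <= '9') {
            value = value * 10.0 + (*p - '0');
            --exponent;                            // one more fractional digit
            ++p;
        }
    }

    // Optional scientific notation part, e.g. "e-3" or "E+12".
    if (*p == 'e' || *p == 'E') {
        ++p;
        bool expNegative = false;
        if (*p == '-' || *p == '+') { expNegative = (*p == '-'); ++p; }
        int e = 0;
        while (*p >= '0' && *p <= '9') { e = e * 10 + (*p - '0'); ++p; }
        exponent += expNegative ? -e : e;
    }

    // Divide for negative exponents to avoid multiplying by an inexact 0.1.
    if (exponent < 0)
        value /= std::pow(10.0, -exponent);
    else
        value *= std::pow(10.0, exponent);

    return negative ? -value : value;
}
```

With this, parseNumber("3,25", ',') and parseNumber("-1.5e3") both go through the same code path, so checking scientific notation mostly means checking the exponent branch.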
>
>> Additionally we should change the default comment delimiter as
>> Nicolas already suggested. Then a normal user who mostly uses the
>> default settings would see a speedup on Linux by a factor of 5 by
>> simply updating to Kst 2.0.3; on Windows it is about a factor of 2-3.
>>
>
> Is the proposal to have '#' as the default comment delimiter? The speedup
> is from only having one?
Yes, having four slows it down:
internal update:
1 char -> 2.3s
4 chars -> 12s
loading data:
1 char -> 3.6s
4 chars -> 10.1s
With four characters QString::contains is called; when only one char is used,
only a char==char comparison is done, hence the speedup.
But we could introduce optimized functions for the cases n < 5 if necessary.
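A minimal sketch of the idea, using std::string instead of QString so that it is self-contained. isCommentChar is a hypothetical helper, not the function in AsciiSource; the general branch stands in for the QString::contains call.

```cpp
#include <string>

// Hypothetical helper, not Kst's actual code: reports whether c is one of
// the configured comment delimiters.
inline bool isCommentChar(char c, const std::string& delimiters)
{
    switch (delimiters.size()) {
    case 0:
        return false;                    // no comment delimiter configured
    case 1:
        return c == delimiters[0];       // fast path: plain char==char
    default:
        // General case: search the delimiter set, comparable to what
        // QString::contains has to do for every character.
        return delimiters.find(c) != std::string::npos;
    }
}
```

Since the delimiter set does not change while a file is read, the branch could also be chosen once up front (for example via a template parameter) instead of per character.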
>
>
>> But the speedup is only for the pure data loading.
>> The internalDataSourceUpdate is still very slow, counting the
>> rows and looking for comments is now slower than reading the data!
>> This makes no sense so we should also optimize internalDataSourceUpdate
>> before we release 2.0.3.
>>
>
> Yes. It should be far faster.
>
>
>> Do we support comments which are anywhere in the data, or is it
>> enough to only support complete lines as comments, i.e. lines which
>> start with the comment delimiter?
>>
>
> Well... we should probably support white space before a comment at the
> beginning of a line, but a correctly formed ascii file will have the same
> number of columns for every line, so if there is a comment later in a line
> other than the first line it will either be after the last column, or will
> be a syntax error in the file.
>
> So: check for comment characters anywhere in the first line when choosing
> the number of columns.
> After that, only check at the beginning of the line (up to the first
> non-white space character).
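A minimal sketch of this rule, again with std::string for self-containment. Both function names are hypothetical and not taken from the Kst sources; whitespace is assumed to mean space and tab only.

```cpp
#include <cstddef>
#include <string>

// Lines after the first: a line counts as a comment only if its first
// non-whitespace character is a comment delimiter.
inline bool isCommentLine(const std::string& line, const std::string& delimiters)
{
    const std::size_t pos = line.find_first_not_of(" \t");
    if (pos == std::string::npos)
        return false;                    // empty or blank line, not a comment
    return delimiters.find(line[pos]) != std::string::npos;
}

// First line only: cut off a comment that may start anywhere in the line,
// so that the column count is taken from the data part alone.
inline std::string stripTrailingComment(const std::string& line,
                                        const std::string& delimiters)
{
    const std::size_t pos = line.find_first_of(delimiters);
    return pos == std::string::npos ? line : line.substr(0, pos);
}
```

The per-line check stops at the first non-whitespace character, so its cost no longer depends on the line length, which is the point of restricting where comments may appear.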
>
>
>> Peter
>>
>> _______________________________________________
>> Kst mailing list
>> Kst at kde.org
>> https://mail.kde.org/mailman/listinfo/kst