[Kst] Re: AsciiSource: new defaults, Kst's atof

Tue Jan 25 17:51:40 CET 2011

On 24.01.2011 21:01, Barth Netterfield wrote:
> Great work.  Huge speedup!
> 
> On Mon, Jan 24, 2011 at 2:08 PM, Peter Kuemmel <syntheticpp at gmx.net> wrote:
> 
>> Attached are the benchmark results, now including values for Linux.
>> It was a 310 MB gyrodata file, and I always loaded
>> only column three.
>>
> 
> Can you explain what the various cases in the table are?  Are these all
> fixed width columns, or are they variable width?  Or did you show both
> cases?
> 

I tested with different column and comment delimiters:
No comment: the line edit for the comment delimiters is empty
#  comment: only # is given as the comment delimiter
one space : custom column delimiter selected and a single space entered in the line edit
whitespace: 'Whitespace' was selected

All cases used variable column width.

> 
>> I found that the atof function, which we already use by default on
>> Windows, is also faster on Linux. Therefore I think we should
>> also use it on Linux, especially since our numbers aren't that
>> complicated to parse.
>>
> 
> Good.  Can it parse scientific notation?  What does it do about '.' vs ','
> (I haven't looked...)

'.' vs ',' is correctly handled; scientific notation we still have to check.
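Since the code of Kst's atof is not shown here, the following is only a hypothetical sketch of how a hand-rolled parser can accept scientific notation such as "1.5e3" or "3E-2". `simpleAtof` is an illustrative name, not Kst's function; it assumes '.' is always the decimal separator and that fields contain plain decimal numbers only.

```cpp
#include <cassert>
#include <cctype>
#include <cmath>

// Hypothetical fast parser (not Kst's actual atof). Accepts
// [blanks][+|-]digits[.digits][(e|E)[+|-]digits] and always treats '.'
// as the decimal separator, independent of the current locale.
double simpleAtof(const char* p) {
    assert(p != nullptr);
    while (*p == ' ' || *p == '\t') ++p;  // skip leading blanks

    bool negative = false;
    if (*p == '+' || *p == '-') {
        negative = (*p == '-');
        ++p;
    }

    // Integer part.
    double value = 0.0;
    while (std::isdigit(static_cast<unsigned char>(*p))) {
        value = value * 10.0 + (*p - '0');
        ++p;
    }

    // Fractional part.
    if (*p == '.') {
        ++p;
        double scale = 0.1;
        while (std::isdigit(static_cast<unsigned char>(*p))) {
            value += (*p - '0') * scale;
            scale *= 0.1;
            ++p;
        }
    }

    // Optional exponent (scientific notation).
    if (*p == 'e' || *p == 'E') {
        ++p;
        bool negativeExponent = false;
        if (*p == '+' || *p == '-') {
            negativeExponent = (*p == '-');
            ++p;
        }
        int exponent = 0;
        while (std::isdigit(static_cast<unsigned char>(*p))) {
            exponent = exponent * 10 + (*p - '0');
            ++p;
        }
        value *= std::pow(10.0, negativeExponent ? -exponent : exponent);
    }

    return negative ? -value : value;
}
```

Accumulating digits in a double like this is fast but can differ from strtod() in the last bits, so any such change would need to be compared against the current parser on real data files.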

> 
>> Additionally we should change the default comment delimiter as
>> Nicolas already suggested. Then a normal user who mostly uses the
>> default settings would see a speedup of about a factor of 5 on Linux
>> simply by updating to Kst 2.0.3; on Windows it is about a factor of 2-3.
>>
> 
> Is the proposal to have '#' as the default comment delimiter?  The speedup
> is from only having one?

Yes, having four slows it down:

                  1 char   4 chars
internal update   2.3 s    12 s
loading data      3.6 s    10.1 s

With four characters QString::contains is called; when only one character
is given, just a char == char comparison is done, hence the speedup.

But we could introduce optimized functions for the cases n < 5 if necessary.
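A minimal sketch of such a specialization, assuming the delimiters are plain single-byte characters. `CommentMatcher` is a hypothetical name, not an existing Kst class, and std::string stands in for QString so the example compiles without Qt. The single-character case costs one comparison, up to four delimiters are checked with unrolled comparisons, and only longer lists fall back to a generic search.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: tests whether a character is one of the configured
// comment delimiters, avoiding a generic contains()-style lookup for n < 5.
class CommentMatcher {
public:
    explicit CommentMatcher(const std::string& delimiters)
        : delims_(delimiters) {}

    bool isComment(char c) const {
        switch (delims_.size()) {
        case 0:
            return false;  // no comment delimiter configured
        case 1:
            return c == delims_[0];  // single delimiter: one comparison
        case 2:
            return c == delims_[0] || c == delims_[1];
        case 3:
            return c == delims_[0] || c == delims_[1] || c == delims_[2];
        case 4:
            return c == delims_[0] || c == delims_[1] ||
                   c == delims_[2] || c == delims_[3];
        default:
            return delims_.find(c) != std::string::npos;  // generic fallback
        }
    }

private:
    std::string delims_;
};
```

The delimiter count is fixed when the matcher is built, so in a per-character loop the branch in the switch is predictable and the common single-delimiter case stays a single compare.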

> 
> 
>> But the speedup is only for the pure data loading.
>> The internalDataSourceUpdate is still very slow, counting the
>> rows and looking for comments is now slower than reading the data!
>> This makes no sense so we should also optimize internalDataSourceUpdate
>> before we release 2.0.3.
>>
> 
> Yes.  It should be far faster.
> 
> 
>> Do we support comments anywhere in the data, or is it
>> enough to support only complete lines as comments, i.e. lines which
>> start with the comment delimiter?
>>
> 
> Well... we should probably support white space before a comment at the
> beginning of a line, but a correctly formed ASCII file will have the same
> number of columns for every line, so if there is a comment later in a line
> other than the first line, it will either be after the last column or will
> be a syntax error in the file.
> 
> So: check for comment characters anywhere in the first line when choosing
> the number of columns.
> After that, only check at the beginning of the line (up to the first
> non-white space character).
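The rule above could look roughly like this. `countColumnsInFirstLine` and `isCommentLine` are hypothetical helper names, std::string is used instead of QString, and whitespace-separated columns are assumed; how blank lines are counted is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// First line: a comment may start anywhere, so cut the line at the first
// comment character before counting whitespace-separated columns.
std::size_t countColumnsInFirstLine(const std::string& line,
                                    const std::string& commentDelims) {
    const std::size_t cut = line.find_first_of(commentDelims);  // npos if none
    std::istringstream fields(line.substr(0, cut));
    std::size_t columns = 0;
    std::string field;
    while (fields >> field) ++columns;
    return columns;
}

// Later lines: only the first non-whitespace character is inspected.
bool isCommentLine(const std::string& line, const std::string& commentDelims) {
    const std::size_t first = line.find_first_not_of(" \t");
    if (first == std::string::npos) return false;  // blank line: not a comment
    return commentDelims.find(line[first]) != std::string::npos;
}
```

Because isCommentLine stops at the first non-whitespace character, a row-counting pass does not have to scan whole data lines for comment characters.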
> 
> 
>> Peter
>>
>> _______________________________________________
>> Kst mailing list
>> Kst at kde.org
>> https://mail.kde.org/mailman/listinfo/kst