[Kst] Re: Benefit vs risk of constant-width columns + GUI proposal

Peter Kümmel syntheticpp at gmx.net
Tue Jan 25 11:36:26 CET 2011


On 25.01.2011 08:47, Brisset, Nicolas wrote:
> Hi Peter,
> 
>  
> 
> I have been thinking a bit about the optimization for constant-width columns, where all columns don't necessarily have the same width. I have also discussed it with a colleague, and I have some questions related to it:
> 
> - how much performance benefit does it bring? (the benefit)

It depends on the position of the column. For the 310 MB gyrodata file and column 3, the time
drops from 4.0 s to 2.5 s. The further right the column is, the greater the speedup will be.
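
A minimal sketch of where the gain comes from, written in Python for brevity and not taken from Kst's actual reader: when a column sits at the same character offset in every row, its value can be sliced out directly, while the free-width case has to scan past every preceding field first. The function names and sample rows are hypothetical.

```python
def read_column_split(lines, col):
    """Free-width case: tokenize each row to reach the wanted column."""
    return [float(line.split()[col]) for line in lines]


def read_column_fixed(lines, start, width):
    """Constant-width case: slice the field at a known offset, no scanning."""
    return [float(line[start:start + width]) for line in lines]


# Hypothetical sample data: three columns, each 6 characters wide.
rows = [
    "0.0000 1.2500 9.8100",
    "0.0100 1.2600 9.8000",
    "0.0200 1.2700 9.7900",
]

# Column index 2 starts at character 14 in every row.
assert read_column_fixed(rows, 14, 6) == read_column_split(rows, 2)
```

The split-based version touches every character before the wanted column, which would fit the observation that columns further to the right benefit more.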


> 
> - what happens if the user looks at the first 10 lines of his file, decides each column has a constant width and in fact at line 100 the time which started at 0.0000 gets to 10.0000 (one more figure), thereby breaking the assumption? (the risk)

Then the data is not loaded correctly; when such an option is enabled we must trust the user.
To fix this we could add an expensive function which analyzes the data, tells the user
which optimizations could be used, and sets them for the data.
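
Such an analysis function could look roughly like the following Python sketch. It assumes whitespace-separated fields and checks only that every field starts at the same character offset in every row; the names are hypothetical and this is not existing Kst code. Scanning the whole file this way is what makes it expensive.

```python
def column_starts(line):
    """Return the character offset at which each whitespace-separated field begins."""
    starts = []
    in_field = False
    for i, ch in enumerate(line):
        if ch.isspace():
            in_field = False
        elif not in_field:
            starts.append(i)
            in_field = True
    return starts


def has_constant_width_columns(lines):
    """True only if every non-empty row has its fields at the same offsets as the first."""
    reference = None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        starts = column_starts(line)
        if reference is None:
            reference = starts
        elif starts != reference:
            return False
    return reference is not None


good = ["0.0000 1.0", "9.9999 2.0"]
bad = ["9.9999 1.0", "10.0000 2.0"]  # time gains a digit and shifts the next column

print(has_constant_width_columns(good))  # True
print(has_constant_width_columns(bad))   # False
```

The second sample reproduces the risk from the question: the first rows look constant-width, and a later row breaks the assumption.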

> 
>  
> 
> The thing is, I still haven't found the definitive GUI approach to accommodate this option, and it can be dangerous. So we have to weigh the benefit against the risk, and maybe present it differently in the GUI.
> 
> At the moment, for the UI I'm leaning toward something like the attached, but I'm not yet sure it is 100% foolproof. I do think it allows us to accommodate all the formats we can reasonably support and to optimize them at the same time. Feedback is appreciated.

Yes, better than the current GUI. I would name "Free width" a bit differently, maybe "Width of columns is not predictable".

> 
>  
> 
> On a related note, I think we should auto-detect the delimiter. It should be pretty easy as soon as the user tells us in which line to find data. But for that we'd need to connect the valueChanged() signal of the _startLine QSpinBox to a checkFormat() slot of the config widget, which would need to know the filename we're working on. But it seems that the filename for which we are configuring is not known in this code. Could we change that?

Auto-detecting the column delimiter is a nice idea (with all its consequences).
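
One possible heuristic, sketched in Python and only as an assumption about how detection might work: take a few lines starting at the data line the user specified and pick the candidate character that occurs the same non-zero number of times in each of them. The candidate list and the function name are hypothetical.

```python
def detect_delimiter(lines, candidates=(",", ";", "\t", "|", " ")):
    """Pick the candidate that occurs the same non-zero number of times in every line."""
    best, best_count = None, 0
    for delim in candidates:
        counts = {line.count(delim) for line in lines}
        if len(counts) == 1:  # same count in every sampled line
            count = counts.pop()
            if count > best_count:
                best, best_count = delim, count
    return best  # None when no candidate is consistent


sample = ["1.0;2.0;3.0", "4.0;5.0;6.0"]
print(repr(detect_delimiter(sample)))  # ';'
```

One of the consequences is visible here already: a file padded with varying amounts of whitespace gives no consistent count, so the function returns None and the user would still have to choose.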

> 
> The other option would be to go the whole way to more automation, and parse the first 10 to 20 lines to try and detect the format ourselves automatically. But I'd rather keep that for later; we've already spent so much time on ASCII for 2.0.3...

Postponing it and making a ticket sounds good. We could collect all the ideas there, and maybe
once the new features are out there, other new ideas will come up.

Peter

