[Kst] [Bug 124942] New: multi-file datawizard for datafile comparison

Wed Apr 5 00:37:07 CEST 2006

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

http://bugs.kde.org/show_bug.cgi?id=124942         
           Summary: multi-file datawizard for datafile comparison
           Product: kst
           Version: unspecified
          Platform: unspecified
        OS/Version: Linux
            Status: NEW
          Severity: wishlist
          Priority: NOR
         Component: general
        AssignedTo: kst kde org
        ReportedBy: nicolas.brisset eurocopter com

Version:           1.3.0_devel (using KDE 3.4.2, Mandrake Linux Cooker i586 - Cooker)
Compiler:          Target: i586-mandriva-linux-gnu
OS:                Linux (i686) release 2.6.12-12mdk

After being subscribed to the kst mailing-list for a few months, I have the feeling that the typical use cases of the original kst users sometimes differ significantly from ours. One of the things we do most frequently is compare two (or more) files corresponding to different configurations to track differences. Typically, the two data files will contain the same list of fields and during a session, the user will:
1) load a bunch of variables from conf1.dat with the datawizard
2) call Tools->Change data file with the "duplicate" option to create the corresponding curves from conf2.dat
3) notice discrepancies somewhere that need to be explained by looking at more vars
4) load new vars with the datawizard. *But* that's where the workflow breaks down: the wizard only lets you load curves from the last file it was used with (and you have to pay attention to the X vector that's going to be reused: it should be from the right file!), so you soon find yourself toggling back and forth between files, invoking the wizard an incredible number of times... until you can't stand it and switch back to using gaiw :-(

But I have good news: the situation is not hopeless :-) I have thought about this for a while, and I think I have come up with a very user-friendly way of improving it with only a minimal increase in the complexity of the data wizard (in terms of user interaction, but hopefully coding it will not require too many changes either!). So, here are the changes I'd like to see:
a) add to the first page of the wizard a listbox with checkable items, one entry for each cached datasource instance. When a datasource is chosen in the KFileDialog it gets added to the list. The type and configure buttons would either be duplicated for each entry or apply to the selected listbox item (listbox in single selection mode)
b) when the user checks more than one datasource, it means that subsequently created curves should be created for all checked datasources
c) the list of available vars is now the intersection of checked datasources field lists (and the "position" column can be removed if more than one source is checked)
d) for choosing X vectors, the same concept can be applied: the dropdown list could contain only vectors existing in both datasources (say, a "Time" field). Note that this will require a small change from the current implementation (see below for the gory details)
e) the last page can be kept as is, the only difference being that instead of creating one curve for each Y field selected by the user, the wizard will create as many as there are checked datasources.
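
To make points b), c) and e) concrete, here is a minimal Python sketch of the proposed logic. It is not kst code: the function names `common_fields` and `plan_curves`, the sample field names other than TIME, and the field lists themselves are all made up for illustration. It only shows that the field list offered by the wizard would be the intersection of the checked datasources' fields, and that each selected Y field would yield one curve per checked datasource.

```python
def common_fields(field_lists):
    """Fields present in every checked datasource, in the order of the first one."""
    if not field_lists:
        return []
    first, *rest = field_lists
    shared = set(first).intersection(*map(set, rest))
    return [f for f in first if f in shared]


def plan_curves(sources, x_field, y_fields):
    """One (source, x_field, y_field) triple per Y field and per checked source."""
    return [(src, x_field, y) for y in y_fields for src in sources]


# Hypothetical field lists for the two files of the example workflow.
fields = {
    "conf1.dat": ["TIME", "ALT", "SPEED", "TEMP"],
    "conf2.dat": ["TIME", "SPEED", "ALT", "PRESSURE"],
}
checked = ["conf1.dat", "conf2.dat"]

available = common_fields([fields[s] for s in checked])  # point c)
curves = plan_curves(checked, "TIME", ["ALT"])           # points b) and e)
```

With these made-up inputs, only TIME, ALT and SPEED would be offered, and selecting ALT would plan two curves, one from each file, each using the TIME field of its own file as X.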

I hope that's somewhat clear? Because with this, the workflow in point 4) above would be great: just check the 2 (or more) datasources you want to work with, and step through the rest as usual: at the end, you get the curves superimposed just like you wanted :-)

Now, I think I need to elaborate a bit on point d). Interestingly, thinking about it exposes a small issue in the current implementation. It would be better to provide only one list (vectors available in the datasource(s)), and a "reuse existing" checkbox (*not* a dropdown list) to try and reuse that vector, if it is already instantiated, regardless of its name. (Is that clear? The issue here is that vectors in kst can't have duplicate names, while existing fields in two comparable datasources will very likely bear exactly the same names.) To give an example, if you have TIME and TIME' loaded from the "TIME" field in conf1.dat and conf2.dat respectively, in the current implementation you are stuck if you want to reuse the TIME field for X: you can select only TIME or TIME', but there is no way to instruct the wizard to create the new curves as [selected Y fields] = f(TIME from the same datasource). It would be better to look in the current kst vectors for an instance of "TIME" from the given datasource. In other words, the current implementation does not scale to more than one datasource... But maybe I should open a separate report for that, as it is getting complicated?
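
The "reuse existing" lookup described above can be sketched as follows. This is an illustration in Python, not kst's actual data structures: the class `VectorRegistry` and its method `get_or_create` are hypothetical, and the rule of appending a prime to make names unique is only an assumption that mirrors the TIME / TIME' example. The point is that vectors are looked up by (datasource, field) rather than by their unique name.

```python
class VectorRegistry:
    """Hypothetical stand-in for the list of vectors currently loaded."""

    def __init__(self):
        self._by_name = {}    # unique vector name -> (source, field)
        self._by_origin = {}  # (source, field) -> name of first vector created

    def _unique_name(self, field):
        # Assumed naming rule: append primes until the name is unused.
        name = field
        while name in self._by_name:
            name += "'"
        return name

    def get_or_create(self, source, field, reuse_existing=True):
        """Return the name of a vector for `field` read from `source`."""
        key = (source, field)
        if reuse_existing and key in self._by_origin:
            return self._by_origin[key]
        name = self._unique_name(field)
        self._by_name[name] = key
        self._by_origin.setdefault(key, name)
        return name


reg = VectorRegistry()
x1 = reg.get_or_create("conf1.dat", "TIME")       # first load: "TIME"
x2 = reg.get_or_create("conf2.dat", "TIME")       # name clash: "TIME'"
x2_again = reg.get_or_create("conf2.dat", "TIME") # reused, no new vector
```

Under these assumptions, asking for "TIME" with the reuse box checked always returns the vector that came from the same datasource as the Y fields, whatever unique name it ended up with.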