[Kst] getData flexibility

Thu May 4 02:38:44 CEST 2006

Your changes are fine - iff they don't cost us performance - but I'm 
not entirely sure they are critical....

I was thinking less ambitiously: in the comments below, I indicate what I 
thought would happen...

On Wednesday 03 May 2006 14:50, Eli Fidler wrote:
> George and I have discussed this issue some more. I think that we need to
> change the internals of Kst, particularly RVector. I think that instead of
> containing a range of samples from frame 0 to n, we should be able to store
> a set of frame ranges.
>
> Some situations where this is useful:
> 1. The cache contains frames 10-20 and 100-500 of a field. The server is
> unreachable. Kst can use the stored frames from the cache and add the other
> frames once the server is reachable. Without frame ranges, I would have to
> either say the datasource is empty, allow the user to use the first range
> only or allow the user to retrieve all the frames but 0-9 and 21-99 will be
> filled with NaNs. None of these solutions are very good.

The users can ask for whatever they want.  If the requested samples are:
	in the cache: 
		return the data
	not in the cache, but the remote source is available:
		download to cache, then return the data
	not in the cache, and the system is offline: 
		silently return NaN for all non-cached samples

The cache daemon would have to figure out if the remote source is available....

To get frame 10, you have to ask for frame 10, not frame 0, even if frame 10 
was the first sample you had in the cache.
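
A minimal sketch of that lookup policy, assuming one sample per frame and a 
plain map as the cache.  FrameCache, FetchFn and readFrames are hypothetical 
names for illustration, not existing Kst or NAD interfaces:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <map>
#include <vector>

// Hypothetical stand-in for the local cache: absolute frame number -> value.
// One sample per frame is assumed to keep the sketch short.
using FrameCache = std::map<long, double>;

// Hypothetical fetch hook; returns false when the remote source is offline.
using FetchFn = bool (*)(long frame, double *value);

// Fill `out` with frames [first, first + count). Frames are addressed by
// their absolute number, so frame 10 is requested as 10 even if it is the
// first frame held in the cache. Returns the number of samples written,
// which is always `count`: a missing sample is NaN, not an error.
std::size_t readFrames(FrameCache &cache, FetchFn fetch, long first,
                       std::size_t count, std::vector<double> &out) {
    assert(first >= 0);
    out.assign(count, std::numeric_limits<double>::quiet_NaN());
    for (std::size_t i = 0; i < count; ++i) {
        const long frame = first + static_cast<long>(i);
        auto it = cache.find(frame);
        if (it != cache.end()) {
            out[i] = it->second;              // in the cache: return the data
        } else {
            double v = 0.0;
            if (fetch && fetch(frame, &v)) {  // remote source is available
                cache[frame] = v;             // download to cache first
                out[i] = v;
            }                                 // offline: sample stays NaN
        }
    }
    return count;
}
```

Passing a null fetch hook models the offline case.  A later reload is just 
the same call again once the hook succeeds.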

I had not imagined that you could load data that later became available, 
except for data at the end (where NF increases).  If the remote data source 
becomes available again, the user would just have to hit reload to grab 
data that was previously returned as NaN.

> 2. Progressive loading. The user could retrieve frames 0-end skip 10, then
> 1-end skip 10, then 2-end skip 10... The RVector would know which frames
> are missing and still need to be retrieved.

For disk-bound data access, this would be *way* slower than a straight read. 

I don't consider progressive load to be an important feature (though nice), 
but if we do implement it, then I am happy with:
	do a skip read of 100 samples per vector (max)
	display the curves
	do a full read
	redisplay everything

There would have to be some heuristic to decide if the loading was going to 
be slow enough that the two-step progressive read would be worthwhile.
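
For the first step, the skip can be chosen so that no vector gets more than 
100 samples.  A rough sketch; previewStride and previewCount are made-up 
names, and the "is it slow enough" heuristic is deliberately left out since 
nothing above specifies it:

```cpp
#include <cassert>
#include <cstddef>

// Cap taken from the text: at most 100 samples per vector in the preview.
constexpr std::size_t kMaxPreviewSamples = 100;

// Smallest stride such that reading every stride-th sample out of n
// returns at most kMaxPreviewSamples samples.
std::size_t previewStride(std::size_t n) {
    if (n <= kMaxPreviewSamples) return 1;
    return (n + kMaxPreviewSamples - 1) / kMaxPreviewSamples;  // ceil(n / max)
}

// Number of samples a skip read returns: indices 0, stride, 2*stride, ...
std::size_t previewCount(std::size_t n, std::size_t stride) {
    assert(stride > 0);
    return (n + stride - 1) / stride;  // ceil(n / stride)
}
```

The full read in the second step is then an ordinary read with stride 1.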

> 3. I think frame ranges is a better representation of holes in the data
> than NaNs. It could be useful to visualize missing frame ranges by shading
> in the gui. Frame ranges would work for masking as well.

No promise that INDEX is on the X axis, so there is no general way of shading.

As to using frame ranges rather than masks...  In the common case this could 
actually be faster. (The NaN-checks could come out of the plot code, and go 
into the vector update or something like that).  

BUT: the big problem is that NaNs are sample by sample, not frame by frame, 
so... the lists would have to be by sample not by frame.  Remember that 
frames are a datasource/RVectors idea.  Classes that use vectors (including 
RVectors) in kst don't know a thing about frames - they only know about 
samples.  So the vectors would want to keep sample lists, not frame lists.
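
To make the sample-versus-frame point concrete, here is a small sketch that 
derives a sample-range list from a NaN-filled vector and converts a frame 
range to a sample range.  SampleRange and both functions are illustrative 
names only, and a fixed number of samples per frame is assumed:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open range [first, second) of sample indices, not frame indices.
using SampleRange = std::pair<std::size_t, std::size_t>;

// Collect the runs of non-NaN samples in v as a list of sample ranges.
std::vector<SampleRange> validSampleRanges(const std::vector<double> &v) {
    std::vector<SampleRange> ranges;
    std::size_t i = 0;
    while (i < v.size()) {
        if (std::isnan(v[i])) { ++i; continue; }  // skip a hole
        const std::size_t start = i;
        while (i < v.size() && !std::isnan(v[i])) ++i;
        ranges.emplace_back(start, i);
    }
    return ranges;
}

// Convert the inclusive frame range [firstFrame, lastFrame] to a half-open
// sample range, assuming every frame holds samplesPerFrame samples.
SampleRange frameToSampleRange(std::size_t firstFrame, std::size_t lastFrame,
                               std::size_t samplesPerFrame) {
    assert(firstFrame <= lastFrame);
    return {firstFrame * samplesPerFrame, (lastFrame + 1) * samplesPerFrame};
}
```

A single NaN in the middle of a frame splits one frame into two sample 
ranges, which is why a frame list alone cannot replace the NaNs.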

> 4. We could remove the hack in the piolib datasource which moves all the
> vectors to frame 0. I don't know exactly what the dirfile datasource does
> with data that doesn't begin at frame 0 as I don't have such a dirfile, but
> a similar hack likely exists.

getdata takes care of offset frame 0 internally - a line in the format file 
says that the first sample of data in the file is at frame XXXX.  Asking for 
frame XXXX then returns the first frame in the file.
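
The arithmetic behind that offset is simple.  The sketch below is not 
getdata's actual code or format-file syntax; FieldLayout and fileSampleIndex 
are hypothetical and only show the idea:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of one field as read from a format file.
struct FieldLayout {
    long firstFrame;              // frame number of the first frame in the file
    std::size_t samplesPerFrame;  // samples stored per frame
};

// Index within the file of the first sample of `frame`, or -1 if the
// requested frame lies before the start of the file.
long fileSampleIndex(const FieldLayout &layout, long frame) {
    assert(layout.samplesPerFrame > 0);
    if (frame < layout.firstFrame) return -1;
    return (frame - layout.firstFrame) *
           static_cast<long>(layout.samplesPerFrame);
}
```

With the offset handled at this level, callers keep using absolute frame 
numbers and nothing has to be shifted to frame 0.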

> I have a class in NAD which stores sets of frame ranges (QPair<uint32_t,
> uint32_t>'s actually). It allows you to add ranges, intersect ranges, skip,
> etc. I think it could be used in Kst with some modifications. The header
> file is attached.
>
> I'm interested in people's thoughts on this idea.
>
> Eli
>
> On Tuesday 02 May 2006 18:08, Eli Fidler wrote:
> > I'm in the process of writing the new NAD KstDataSource which uses the
> > local cache. I've run into a case of getData() where the current
> > interface seems insufficient.
> >
> > When using the NAD cache, I may have frames 1 to 50 of a particular field
> > in the cache. If the NAD server connection is unavailable, what should I
> > do if the user requests frames 1 to 100? What about 1 to 50?
> >
> > I assume I should just give them the data in the second case and pretend
> > everything is fine. 

Everything is fine, so yes.

> > In the first case, should I return partial data? no 
> > data? something else?

Return it all, with NaNs where there is no data.

> > What if I have part of the data, but it's not the first part?

If the data is there, return it; if not, get it.  If you can't, give NaN.

> > The problem stems from the fact that I can only return -1 for any error,
> > so I can't signal a partial success very easily.

It's not really an error (?).

> > Eli