[Kstars-devel] Ideas for Adding a Spatial Index to KStars

Thu Jun 22 03:53:13 CEST 2006

I had written this before seeing Jason's post.  I decided to post this
as it is and compose a response to Jason's suggestion separately.

On Wednesday 21 June 2006 06:59, Thomas Kabelmann wrote: 
> > But that's okay, I think we can still implement your idea (2) above if we
> > are willing to keep the SkyIndex and SkyRegion classes separate.  Better
> > yet, we could get rid of the SkyRegion class entirely!  The idea here is
> > that the SkyIndex class would return lists of actual indices instead of
> > lists of SkyRegions.  Each SkyComponent would need to keep its own array
> > of "regions", which would just be an array of QLists.  I would imagine
> > that this array would be a static class member.
> Why should this be a static member? Every component should only know it's own 
> data. StarComponent should only know the stars, CometsComponent only the 
> comets and so on.

I meant a "static class" member.  Perhaps I'm using the terminology
incorrectly.  I thought it was standard C++ practice to keep a list
of all objects created by a class in a static class data member (if such
a list is required).  But don't worry, if you read on below, you will see
that I am willing to accept the component/composite model and move what I
suggested to be component static class methods and data into the composite
classes.

> Currently we have in most situations a QList for storing data. From memory 
> point of view there is no difference if you have 100 regions with 10 
> (component) lists (numbers are just samples) or you have 10 components with 
> 100 regions.
> 
> I can imagine that every component has a structure which contains an array 
> (QVector perhaps) which stores the regions in lists (QList). For an easy 
> implementation I suggest a templated class RegionList, so every component can 
> instantiate a list of the object type it needs:

I think you must either mean "every composite" or "every component class",
(like the static class methods and data in my second proposal) otherwise
I don't understand.

> template <typename T, int regions> class RegionList
> {
> 	public:
> 		// add the object in the right region list
> 		void add(T object, SkyIndex index);
> 		// return an iterator for the structure using the indices
> 		Iterator<T> iterator();
> 		...
> 	private:
> 		QVector <QList <T>> list(regions);
> 		skyIndex *indices;
> }
> 
> For traversing the RegionList we should implement an iterator. This iterator 
> needs to know, what type of objects are stored (templated class), the indices 
> of the regions and in special cases like the StarComponent the magnitude 
> level. So we can easily traverse in all components like this:
> 
> Iterator<StarObject> it = regionList.iterator();
> for (it.begin(); it.end(); it.nex())
> {
> 	// drawing code
> }

I think this is way too complicated and was already solved in a simpler
way in my previous response where the skyIndex returns a QList of integers
that contain the indices of the visible regions.  Every component/composite
has its own array of QLists.  I repeat this code below.  I think it is
a step in the wrong direction to create a whole new set of classes that
will merely let us replace two lines of code with one.  

> > I'll try to explain why I think we will no longer need the SkyComposites
> > when we use the index/region model in a separate message.
> I don't see the reason why we don't need composites anymore. The 
> composite/component modell allows us to delegate the messages. It's a 
> hierarchical organized structure, so we just need to know the top level 
> composite and send all messages to this object, without knowing the 
> underlying structure.
> 
> One simple optimization I see, is to update just the components which are 
> visible by option. If the component is disabled, the update() call can be 
> done in draw() using a dirty flag which the component stores. The update() 
> call just sets the dirty flag in this case.

I feel like we are close to converging here.  You have won me over to keeping
the component/composite model.  But I would dearly love to change those long
and fairly redundant names.  Perhaps XyzComponent and XyzComposite could
become either simply Xyz and XyzIndex, or a bit longer: XyzItem and XyzIndex.
Or, for constellation lines and the Milky Way, the components could become
ConstellationLinesPolygon and MilkyWayPolygon since is what they are. The
COMPonent and COMPosite names make my head swim a bit and requires me to
think twice every time I tab-complete a filename.  

I think it would be appropriate to change the class names one at a times
as we move components over to the indexing model

I agree that the component/composite (c/c) model simplifies message delegation
and this is a good thing.  My primary concern is the speed of drawing.  We seem
to agree that for each type of object to be drawn there needs to be basically
a single loop (like your iterator loop above) that loops over all of the things
of that type that need drawing.

My first inclination was to put these loops in static class methods in what
are now the component classes and make the data it needs in a single array
of SkyRegions.  I then modified this in response to your comments by putting
the data an array of QLists that was  a static data member in the component
classes.  Since from the code I've seen the only message getting passed down
via the c/c model was "draw", I figured the composite classes would no longer
be needed since they could be replaced by static class methods and data in the
component classes.

But the ideal solution would be fast draw loops within the c/c model.
I really don't think a third set of classes for each component type is
required for this.  I've already explicitly written out how the iteration
would work, it's just two lines instead of one and I don't think we need
yet another set of classes to wrap around this.  All we need to do is move
all of the things I proposed as static component class methods and data
into the composite classes, or as I would prefer to call them, the index
classes.

Here is the star drawing example once again.  This time the code and data
are in the StarIndex class.  The "regions" are still an array (or QList)
of QLists.  I am not certain of how to delcare this in C++.  If we knew
the number of regions at compile time it would be something like:

private:
    QList<StarObject *> region[NUMBER_OF_REGIONS];

The drawing and inserting code would be just like what I wrote out before,
but I repeat it here for completeness.  The XyzIndex classes would need to
be friends of the XyzItem classes so they can get to the data inside the
items directly.

Again, this code would now be in the StarIndex class, and StarIndex would
have a data member called skyIndex that is a pointer to the SkyIndex which
now returns integers and lists of integer indices instead of SkyRegions
(which are gone now).

    StarObject *o = new StarObject( ... );
    o->EquatorialToHorizontal( data()->lst(), data()->geo()->lat() );
    int index = skyIndex->indexPoint( o->ra(), o->dec() );
    region[index]->append( o );

The code for drawing the stars is the same as I proposed in my last message,
except it is now in the StarIndex class:

    QList<int> indices = skyIndex->getView();
    foreach ( int index, indices ) {
        foreach ( StarObject *curStar, region[index] ) {
            ...
        }
    }

My understanding is that the foreach() construct above is simply a C++
(or QT?) shorthand for explicit iteration.  I repeat the drawing loop here
with the iterations expanded out:

    QList<int> indices = skyIndex->getView();
    QList<int>::iterator index;
    for ( index = indices->first() ; index ; index = indices->next() ) {
        QList<StarObject *> thisRegion = region[*index];
        for ( StarObject *curStar = thisRegion->first(); curStar ; curStar = thisRegion->next() ) {
            ..
        }
    }

I really don't see what creating a new class to combine these nested iterators
into a single iterator will gain us.  It can't make the code any faster and
could slow it down.

Notes and Caveats:

I don't think my C++/QT syntax here is perfect. My main concern is the
data layout.

The code for iterating over extended objects that are in more than one
region is a little bit more complicated but doesn't change the overall
structure.  I suggest we continue discussing the simpler case of point
objects for now.

I really like your idea of only updating things that are visible on the
screen.  We will need to discuss this with Jason.  I'd like to postpone
discussing it further and talk about it when we talk about drawing
extended and moving objects because they have to deal with very similar
issues so perhaps we can come up with an integrated solution.

Although it is true that the memory footprint of 100 regions with 10 lists
each is similar to the footprint of 10 regions with 100 lists each, this
does not mean that the memory layout has no effect on the timing of loops.
The ability to fit all of the code and data used inside the inner loop 
into the CPU cache can have a dramatic effect on the speed of the loop.

Nonetheless, I think the scheme described here is a good solution since it
gives us the optimized draw loops together with the convenience of the c/c
model.  I'm not certain it will give us the speed of the 3.5 code but I'm
willing to give it a try.  

Some of our current component classes would become just data containers
with no code since the only code the contained, the draw() method, will
be moved to the index classes.  I think the "Index" suffix is an
appropriate naming convention for what are currently called the composite
classes since the instances of these classes will actually hold the spatial
index of their corresponding components.

The initialization of the XyZIndex classes will become more complex.  They
will need to be passed the skyIndex as in initialization parameter and they
will ask the skyIndex for the size of the arrays they need to create,
that I previously described schematically as:

private:
    QList<StarObject *> region[NUMBER_OF_REGIONS];

We currently have three different viable schemes for adding a spatial
index to KStars.  They differ primarily in where the index information
is stored and where the drawing routines are located.   All of the schemes
move the draw routines out of the component objects so we can have a single
loop over each type of thing to be drawn.

Scheme 1) SkyRegion index
  The index is stored in a single array of SkyRegion objects.  Each
  SkyRegion contains a QList (or other list) for every type of object
  drawn.  This scheme would probably require the least changes to the
  current code base.  The draw() methods would become static class
  methods in the component classes.  the composite classes go away.

Scheme 2) Static component class indices
  Each component stores its own index in a static class data member.
  The draw routines remain static class methods.  The composite classes
  go away.

Scheme 3) Composite instance (now index) indices
  The indices and draw methods move from the the component classes to the
  composite (now index) instances.  The benefit of the scheme is that we
  retain the message passing of the c/c model.

I must say I am a little concerned about all of the extra dereferencing
required in schemes (2) and (3).  I realize that processors have
instructions to do this quickly, my concern is that it is going to make
it impossible for the processors to cache our data.

This leads me to propose a fourth scheme where the index is returned to
an array of SkyRegions in the SkyIndex, but moves the draw routines to
the composites.  This would give us the message passing benefits of the
c/c model but give us the data coherency of the original scheme I proposed:

Scheme 4) SkyRegion Index, composite drawing

I'd be willing to start out with either scheme (3) or scheme (4).  They
are very similar and I cannot predict if there would be a substantial
speed difference between the two.  If there is a speed difference, I don't
know which one would be faster.  Both schemes retain the current c/c model
but I would dearly love to change the names of the files and the classes.

-- 
Peace, James