[Kstars-devel] Google SoC

Mon Mar 24 22:29:18 CET 2008

On Mon March 24 2008, Akarsh Simha wrote:
> The code that paints only stars in the field of view when zoomed in
> has already been implemented during the KDE4 port (i.e. spatial
> indexing). However, I don't think we load those stars dynamically
> into memory from the data files. This makes KStars use a whole lot of
> memory and makes the startup very slow. That's one of the objectives.
>
> Again, such dynamic loading, while it is implemented for the "global"
> limiting magnitude (set in the options dialog), could be implemented
> for the zoom-based magnitude limit.
>
> These should help KStars handle stars down to 12th magnitude.

I think you stated it well Akarsh.  There is one technical challenge 
that hasn't been solved yet regarding proper motion.  Stars slowly
move on the celestial sphere due to proper motion.  This means that
the spatial index of a few stars changes as time moves forward or
backward.  This proved to be a bit of a challenge even with all the
stars in memory. The problem is much worse when most stars are stored
on disk and hasn't been fully solved yet.

We currently have a layered approach for re-indexing stars.  These
lines of code from starcomponent.cpp initialize the reindexing system:

    m_highPMStars.append( new HighPMStarList( 840.0 ) );
    m_highPMStars.append( new HighPMStarList( 304.0 ) );
    m_reindexInterval = StarObject::reindexInterval( 304.0 );

When we select a portion of the sky to display, we add a small margin
of error (about one degree) extra so that stars that have drifted by
up to this amount will still get displayed even if all of the spatial
bin they are in is slightly off-screen. The code above causes two
extra lists of stars to be created, one with proper motion greater than
840 and a larger list of stars with proper motions less than 840 but
greater than 304.  You could easily create more lists by adding more
lines to the above code.  I'm confident more optimal choices of 
thresholds exist.

If you cd to the data/ subdirectory and run:

  $ ./sort-hip-by-pm.pl stars.dat 

You will get a list of all our stars sorted by proper motion on stdout.  
You can this list to figure out the sizes of the two lists above.

So the stars are segregated into three groups: really fast stars, medium 
fast stars, and then all the others.  Each group is updated pretty much 
independently.  Every time the time changes, a time threshold for each
group is checked to see if it is possible that one of the stars in that
group drifted out of the one degree safety margin.  If so, the entire
group is re-indexed. The fastest moving stars get re-indexed most often, 
the larger list of medium fast stars is updated less often and all of
the stars are re-indexed least frequently.

This algorithm will not work when most stars are on disk and not in 
memory!  We simply can not afford to ever re-index all of the stars.
I do not know of a simple algorithm that will work.  This is the
fundamental challenge.  There are many ideas that won't work and I
won't list them all here.  One possible solution would be to greatly
limit the earliest and latest dates for which KStars will display the
sky but I think this is entirely unacceptable.

I think the best hope lies in extending the ideas we already use one
step further.  The code I outlined above takes advantage of the fact
that very few stars have high proper motions so we can re-index all the 
very high PM stars without a performance hit.  The next step in this 
direction is the realization that when we re-index, there are usually 
only a very small percentage of stars that change index.  The vast 
majority keep the same index.

I haven't done any tests or run any simulations but I believe that
it would be possible to select a set of fixed times (over 1,000's of
years) and have lists on disk for all the stars that actually change
index for those times.  That is, we pre-compute all of the re-indexing
that needs to get done and save the deltas on disk.  If these lists
are small then the corrections should be fast.  For any time step or
zoom change or pan, we can display the vast majority of stars almost
instantly then a few stragglers may get filled in within a second or
two.

The second challenge is figuring out how to store all of this star
information on disk so it is easy to access.  The easiest solution
to implement would be to use a database backend and grab the data
with SQL calls.  Although we might want to make an SQL backend an
option the way Amarok does, we DO NOT want it to be a mandatory
dependency.  Also, SQLite is not going to work for us at all, at all.
Therefore we are going to need a clever flat file (or smallish library)
solution for storing all the star data on disk.  But I also think we
need to have all our stars in an SQL database for development purposes.
The simple flat file techniques such as sort-hip-by-pm.pl above will not
work (or will be very painful) when our data set is much, much larger.

Without the PM problem, the data on disk would need to be accessed by
two dimensions: spatial index and brightness.  One simple implementation
would be to have a file for each spatial index (currently 512, but that
could be changed trivially if another size was more optimal) with the
stars in each file ordered by brightness.  Unfortunately when we are 
zoomed out, this would require roughly 256 disk seeks (a few seconds)
to populate half the sky.  We could reduce this problem by storing
the brightest stars on a courser mesh or even in a single file like
we do now.  Then we need to figure out if we want to use redundant data
or not.  Should the finely indexed files contain the brightest stars as
well as the dimmest or should we only store each star in one place?  My
gut feeling tells me that a little redundancy will make the code easier
and faster and will only increase the storage requirements by less than
10%.  IMO this is a no-brainer in favor of redundancy.  We need to 
remember that the number of stars increases exponentially with 
magnitude.

The pre-compiled solution to the PM problem adds a third dimension, 
time, to the disk storage problem.  This may or may not be a problem
depending upon the number of stars that are in the re-index files.
For example, for every mesh area (currently there are 512) we could have
two extra files, one that tells us the stars that need to be added
as we move back in time and another that lists the stars that need to
be added as we move forward in time.  Unfortunately, unless we do
something clever, this simple solution would cause some high PM stars
to sometimes get displayed more than once.

I don't claim to have all the solutions.  The best solution may be
something I haven't even thought of.  But I hope that this gives you
an idea of the problems that need to be solved.

-- 
Peace, James