[Kstars-devel] strategies for expanding the size of catalogs

Tue Oct 16 21:15:56 CEST 2007

On Tue October 16 2007, Mike Rosseel wrote:
> > I don't think any of the dimmer stars will have names so fixed
> > fields should not be a problem.  We can easily create indexes for
> > each file telling us how far down certain magnitudes are.  I've
> > already done
>
> maybe that's what you meant already, but if the indexes are made at
> compile time there's no need for fixed field lengths: plenty of time
> to crawl through the files and create the index.

Good point.  But I'm currently thinking that the bytes on the disk
should simply be a copy of the bytes in an array of stars in RAM.
For this to work, all of the data for a star would have to be immediate,
no pointers allowed.  We might want to have two different star classes,
one for named stars and another for unnamed stars.  Unnamed stars 
already vastly outnumber named stars in our current data.  If we expand
the number of stars by a factor of 100 then the named stars will really
be a very special case.  So let's just ignore them for the discussion
below.
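Here is a minimal C++ sketch of the "no pointers allowed" requirement. The StarData field names, types and units are hypothetical, because the post does not fix a record layout. The useful part is the static_assert: it makes the compiler reject any star type that could not be copied to and from disk byte for byte. Named stars, the rare case, can keep pointer-based extras in a separate wrapper class.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical field layout for an unnamed star: every member is stored
// inline (no pointers, no std::string), so an array of these can be
// written to disk and read back byte for byte.
struct StarData {
    float   ra;         // right ascension (assumed: degrees)
    float   dec;        // declination (assumed: degrees)
    float   mag;        // visual magnitude, the sort key of the files
    float   bv;         // B-V color index
    int16_t pmRA;       // proper motion in RA (hypothetical fixed-point units)
    int16_t pmDec;      // proper motion in Dec (same hypothetical units)
    char    spType[4];  // spectral type code, fixed width
};

// The bytes-on-disk == bytes-in-RAM scheme only works for trivially
// copyable types, so check it at compile time.
static_assert(std::is_trivially_copyable<StarData>::value,
              "StarData must be safe to read() directly into memory");

// Named stars: the immediate data plus a heap-allocated name. Because of
// the std::string this type is NOT trivially copyable and would need its
// own (slower) loading path.
struct NamedStar {
    StarData    data;
    std::string name;
};

static_assert(!std::is_trivially_copyable<NamedStar>::value,
              "NamedStar owns heap memory and cannot be dumped to disk as-is");
```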

My current thinking is to have a simple general-purpose container
class/struct for holding an array of (unnamed) stars, something like:

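class StarBlock {
public:
   int  numStars;                 // number of valid stars in this block
   int  blockID;                  // which block of the file is held here
   Star stars[STAR_BLOCK_SIZE];   // fixed-size array, no pointers
};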

STAR_BLOCK_SIZE would be around 100 or 500, or maybe 1000.  We
would have a pool of StarBlocks allocated in RAM at start up.
The number of blocks in this pool is set by the user with a slider
in the config window.  We don't interact with the disk in units
smaller than a StarBlock.  When a user pans and zooms, we simply
overwrite data in some StarBlocks and something on the next level
up keeps track of which data is in which StarBlock.
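One possible shape for the pool and the "next level up" bookkeeping, as a C++ sketch. StarBlockPool, find(), acquire() and the least-recently-used replacement policy are hypothetical choices for illustration and were not decided in this thread. The pool is allocated once, at a size that would come from the config slider, and its blocks are then only relabeled and overwritten.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>
#include <vector>

// Minimal stand-in for the pointer-free star record (hypothetical fields).
struct Star { float ra, dec, mag; };

constexpr int STAR_BLOCK_SIZE = 500;   // tunable at compile time

struct StarBlock {
    int  numStars = 0;
    int  blockID  = -1;                // -1 means "holds no data yet"
    Star stars[STAR_BLOCK_SIZE];
};

// Hypothetical "next level up": a fixed pool of StarBlocks allocated once
// at startup, recycled least-recently-used first.
class StarBlockPool {
public:
    explicit StarBlockPool(std::size_t nBlocks) : blocks_(nBlocks) {
        assert(nBlocks > 0);
        for (std::size_t i = 0; i < nBlocks; ++i) lru_.push_back(i);
    }

    // Returns the block currently holding `id`, or nullptr if not resident.
    StarBlock* find(int id) {
        auto it = where_.find(id);
        if (it == where_.end()) return nullptr;
        touch(it->second);
        return &blocks_[it->second];
    }

    // Takes the least-recently-used block, forgets what it held and
    // relabels it as `id`; the caller then fills it from disk.
    StarBlock* acquire(int id) {
        std::size_t slot = lru_.front();
        StarBlock& b = blocks_[slot];
        if (b.blockID >= 0) where_.erase(b.blockID);
        b.blockID  = id;
        b.numStars = 0;
        where_[id] = slot;
        touch(slot);
        return &b;
    }

private:
    // Moves `slot` to the most-recently-used end of the list.
    void touch(std::size_t slot) {
        lru_.remove(slot);
        lru_.push_back(slot);
    }

    std::vector<StarBlock> blocks_;               // the preallocated pool
    std::list<std::size_t> lru_;                  // front = least recently used
    std::unordered_map<int, std::size_t> where_;  // blockID -> slot index
};
```

The list-based LRU costs O(pool size) per lookup, which is fine for a few hundred blocks; a larger pool might want a cheaper structure.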

Reading data from the disk is a simple non-blocking, non-buffered
read() system call (`man 2 read` for details), filling a StarBlock
directly from disk.  We can check the magnitude of the last star
in the block to see if another block needs to be read in.  If so,
we read the next block in with very little overhead (one mag check
instead of 500 or 1000) and if the file on disk is contiguous, no
seeking will be required.  The numStars and blockID fields don't
need to be stored on disk.  If the files are just large arrays of
stars then we can tune STAR_BLOCK_SIZE at compile time or even
runtime (instead of file creation time).
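The read path above can be sketched in C++ with the POSIX open()/read()/write() calls. The Star fields and the writeCatalog(), readBlock() and loadToMagnitude() names are hypothetical. STAR_BLOCK_SIZE is set to 4 only to keep the example small. The sketch assumes the file holds nothing but a magnitude-sorted Star array written on the same architecture, so no byte-order or padding conversion is done.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>

struct Star { float ra, dec, mag; };   // hypothetical pointer-free record
constexpr int STAR_BLOCK_SIZE = 4;     // tiny for illustration; 500-1000 in practice

struct StarBlock {
    int  numStars = 0;
    int  blockID  = -1;
    Star stars[STAR_BLOCK_SIZE];
};

// Build-time side (hypothetical): the catalog file is just the raw bytes
// of a magnitude-sorted Star array, with no header and no index.
bool writeCatalog(int fd, const Star* stars, std::size_t count) {
    const std::size_t bytes = count * sizeof(Star);
    return write(fd, stars, bytes) == static_cast<ssize_t>(bytes);
}

// Fills `block` straight from the file with a single read() call.
// Returns false at end of file or on error. Any trailing partial record
// is ignored; a robust version would also retry short reads.
bool readBlock(int fd, StarBlock& block, int id) {
    ssize_t n = read(fd, block.stars, sizeof block.stars);
    if (n <= 0) return false;
    block.numStars = static_cast<int>(static_cast<std::size_t>(n) / sizeof(Star));
    block.blockID  = id;
    return block.numStars > 0;
}

// Reads consecutive blocks until the last (faintest) star of a block is
// fainter than magLimit: one magnitude comparison per block.
// Returns the number of blocks filled.
int loadToMagnitude(int fd, StarBlock* pool, int poolSize, float magLimit) {
    assert(pool != nullptr && poolSize > 0);
    int used = 0;
    while (used < poolSize && readBlock(fd, pool[used], used)) {
        const StarBlock& b = pool[used++];
        if (b.stars[b.numStars - 1].mag > magLimit) break;
    }
    return used;
}
```

Since each block is read sequentially from where the previous one ended, a contiguous file needs no seeks between blocks.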

IMO, this scheme will optimize our disk interactions, which are normally
a big bottleneck.  I still haven't figured out the best way to deal
with proper motion, but maybe the ideas above will provide a framework
within which to think about it.   Delta files for different epochs
are my best guess so far.

I think we should avoid using MySQL (or any other database) if at all
possible (and I think it is possible).  First of all, it is one heck
of a dependency.  But more importantly, Jan's previous comment reminded
me that for MySQL to work properly, the table indexes need to be in
RAM.  With the correct file layout, those indexes are not required so
they are just wasting RAM that could better serve us by holding more
stars.
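This C++ sketch shows why a fixed-record, magnitude-sorted file needs no index in RAM. Record i sits at byte offset i * sizeof(Star), so even without precomputed magnitude offsets, a binary search with pread() can find where a given magnitude starts. The binary search is my illustration, not something proposed in the thread. The function names are hypothetical, and the same-architecture binary file assumption applies again.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

struct Star { float ra, dec, mag; };   // hypothetical pointer-free record

// Star i lives at byte offset i * sizeof(Star); no index is needed to find it.
bool readStarAt(int fd, long i, Star& out) {
    const off_t off = static_cast<off_t>(i) * static_cast<off_t>(sizeof(Star));
    return pread(fd, &out, sizeof out, off) == static_cast<ssize_t>(sizeof out);
}

// Number of whole records in the file, or -1 on error.
long starCount(int fd) {
    struct stat st;
    if (fstat(fd, &st) != 0) return -1;
    return static_cast<long>(st.st_size / static_cast<off_t>(sizeof(Star)));
}

// Because the file is sorted by magnitude, a binary search over record
// numbers finds the first star fainter than `mag` with O(log N) small
// reads and nothing held in RAM. Returns that record number (equal to the
// star count if no star is fainter), or -1 on I/O error.
long firstFainterThan(int fd, float mag) {
    assert(fd >= 0);
    long lo = 0, hi = starCount(fd);
    if (hi < 0) return -1;
    while (lo < hi) {
        long mid = lo + (hi - lo) / 2;
        Star s;
        if (!readStarAt(fd, mid, s)) return -1;
        if (s.mag > mag) hi = mid; else lo = mid + 1;
    }
    return lo;
}
```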

-- 
Peace, James

Don't let one cloud obliterate the whole sky. 
-- Anais Nin