[Kstars-devel] Binary star data loading accomplished in Branch!

James Bowlin bowlin at mindspring.com
Wed Jun 4 02:38:15 CEST 2008


On Tue June 3 2008, Akarsh Simha wrote:
> + Storing a pointer to the spectral type string in SkyObject::SpType,
>   instead of the string itself.

This is a very minor point but I don't understand why you did this.
Wouldn't it be better (faster, smaller) to store the string itself
instead of a pointer?  
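To make the size argument concrete, here is a minimal C++ sketch comparing a record that keeps a short spectral-type code inline with one that keeps only a pointer to a string stored elsewhere. The struct names and the two-character field width are hypothetical and chosen for illustration; they are not the actual SkyObject layout.

```cpp
#include <cassert>

// Hypothetical record that stores a short spectral type (e.g. "G2") inline.
struct InlineSpType {
    char spType[2];          // two characters, no terminator needed
};

// Hypothetical record that stores only a pointer to a string kept elsewhere.
struct PointerSpType {
    const char* spType;      // 4 or 8 bytes, plus the separately stored string
};

// Copy up to two characters of a spectral type code into the inline field.
inline void setSpType(InlineSpType& s, const char* code) {
    assert(code != nullptr);
    s.spType[0] = code[0];
    s.spType[1] = (code[0] != '\0') ? code[1] : '\0';   // avoid reading past ""
}
```

Reading the inline field needs no pointer dereference, and the spectral type sits in the same cache line as the rest of the record. The pointer version costs the pointer itself plus a separate allocation for the string.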

On Tue June 3 2008, Akarsh Simha wrote:
> Also replacing call to skyMesh::indexStar(...) by a simple method of
> obtaining the trixel ID. [Works only for level 3 HTM]

That's fine that it only works with one mesh size.  If we want to work 
with other mesh sizes, we will have to supply other data files.   
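Here is a sketch of how a loader can skip the per-star mesh lookup entirely. It assumes (hypothetically) that the data file is grouped by trixel and begins with a table of per-trixel star counts, so each star's trixel ID follows from its position in the file. This is an illustration, not necessarily what the branch code does, and the function names are made up. It also shows why such a file is tied to one mesh level: an HTM mesh of level n has 8 × 4^n trixels (512 at level 3), and the count table has exactly that many entries.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Trixels in an HTM mesh of a given level: 8 root triangles, each split into 4 per level.
constexpr std::uint32_t trixelCount(unsigned level) {
    return 8u << (2u * level);   // 8 * 4^level, e.g. 512 at level 3
}

// Hypothetical loader step: the file is grouped by trixel and starts with a table
// of per-trixel star counts, so each star's trixel ID follows from its file position.
std::vector<std::uint32_t> trixelIdsFromIndex(const std::vector<std::uint32_t>& countsPerTrixel,
                                              unsigned level) {
    // The count table only matches the mesh level the file was built for.
    assert(countsPerTrixel.size() == trixelCount(level));
    std::vector<std::uint32_t> ids;
    for (std::uint32_t trixel = 0; trixel < countsPerTrixel.size(); ++trixel)
        for (std::uint32_t i = 0; i < countsPerTrixel[trixel]; ++i)
            ids.push_back(trixel);   // no call into the mesh
    return ids;
}
```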

> This seems to have brought down the time to load each star by a
> factor of 6!! On my system, it now takes about 225 seconds on
> average to load 41560 stars, as against ~1745 seconds earlier. The
> timing code is currently left as it is, for testing this out.

Great!!  I think you meant milliseconds instead of seconds.  If I did
the arithmetic right, you've got the time down to about 5.4 usecs/star.  This
is wonderful.  This is roughly the same time it will take to do an 
update() according to Jason's earlier measurements.  This means we 
might be able to fully load a dynamic star in 10 usecs which meets our
design goal.  Good work!
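The per-star figure can be checked directly from the quoted totals, reading them as milliseconds. 225 ms spread over 41560 stars is roughly 5.4 µs per star, and the ratio of the old total to the new one (1745/225) is about 7.8. A trivial helper for redoing this check when new timings come in is shown below. The function names are illustrative only.

```cpp
#include <cassert>

// Average load time per star in microseconds, given a total time in milliseconds.
double usecPerStar(double totalMs, long stars) {
    assert(stars > 0);
    return totalMs * 1000.0 / static_cast<double>(stars);
}

// Ratio of the old total load time to the new one.
double speedup(double oldMs, double newMs) {
    assert(newMs > 0.0);
    return oldMs / newMs;
}
```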

> Some of the newly implemented code could be dangerous, as it uses
> pointers at a very low level. This code is almost surely going to
> result in a segmentation fault when we implement storage and
> retrieval of observing log data and user-added links!

H'mm.  I see nothing wrong with using pointers at a very low level.
IMO, this is a good thing.  It is true that if we screw up, we could
cause segfaults but we just need to be a little careful.  

On Tue June 3 2008, Akarsh Simha wrote:
> > a perfect solution for dealing with them.  I think we may end up
> > having to have the named stars kept in their own little list that
> > gets drawn separately from the unnamed stars.   This may cause a
> > much larger problem with nearly duplicated code, which might be
> > "solved" by replacing the StarComponent class with
> > PlainStarComponent and NamedStarComponent classes.
>
> Why will this be required? The if(...) statement in the
> StarComponent::readData() seems to be doing a good job, at least for
> now.

Fine.  I thought you were the one who was worried about it.  I'm all
for using the if(...).   The reason we might want two StarComponent
classes (in the future) is for the draw() routines.  Right now I don't
see how we can avoid having at least two draw() routines for the
stars.

> > One problem we face is that my idea of promoting dim named stars to
> > the global stars group was a bad idea because it breaks the simple
> > idea of ending the draw loop as soon as we get a star dimmer than a
> > threshold magnitude.
>
> How will it break that? We aren't sorting the list of names by
> magnitude, but we are sorting them first by HTM index and then by
> magnitude.

As we were recently reminded, the list of stars for any one trixel
must be sorted by magnitude.  If the global stars are all brighter than
the dynamic per-trixel stars then everything is fine.  The problem 
occurs when we "promote" a dim named star to the global star file.  
This breaks the strict ordering by magnitude.  If ALL named stars will 
naturally land in the global star file (because they are bright enough) 
then this issue does not exist.  Otherwise, I think the simplest 
solution is to keep the named stars in their own separate list.
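The early exit in the draw loop only works if each per-trixel list is in draw order, brightest (smallest magnitude) first. In the minimal sketch below, plain magnitudes stand in for star objects, and the function names are hypothetical. It shows how one dim "promoted" star placed ahead of brighter dynamic stars cuts the loop short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// True if a list is in draw order: brightest (smallest magnitude) first.
bool isMagSorted(const std::vector<float>& mags) {
    return std::is_sorted(mags.begin(), mags.end());
}

// Number of stars one trixel's loop would draw, stopping at the first star
// dimmer than maglim. Correct only if isMagSorted(mags) holds.
std::size_t countDrawn(const std::vector<float>& mags, float maglim) {
    std::size_t n = 0;
    for (float m : mags) {
        if (m > maglim) break;   // assume everything after this is dimmer too
        ++n;
    }
    return n;
}
```

For example, with a limit of 6.5, `{1.5, 3.0, 6.2, 7.1}` draws three stars. However, `{1.5, 9.0, 3.0, 6.2}` (global stars 1.5 and a promoted 9.0, followed by dynamic stars 3.0 and 6.2) draws only one, because the loop stops at 9.0 and never reaches the two brighter dynamic stars.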

> In any case, the number of dim stars with names is definitely going
> to be very small.
>
> > We also face a difficulty that my simple idea of duplicating some
> > high PM stars to account for proper motion probably won't work for
> > named stars. Maybe it would be okay but I foresee possible
> > difficulties.
>
> We could do the good old reindexing on global stars. It shouldn't
> take much. If we really want to save on that, then we should probably
> be putting HD numbers or (if they don't exist) some other unique
> named ID in our star name files.

I don't think adding the HD number will help solve this problem, but
I think it is a good idea to have the HD number in the starname file 
anyway for other reasons (error detection).

> We still have quite a few bits left over in our starData struct, both
> in the flags and unused bytes. We could, if required, have an
> additional bit indicating how the HD number should be interpreted.
> For stars with HD numbers, we could keep that bit off. For stars
> without HD numbers, we fill some out-of-range UID into the HD number
> field and turn the bit on. That way, we will be able to provide a UID
> for each named star, without causing any damage to the starData
> structure.
>
> But we could probably keep that change and the decision for later.
>
> > Right now, I'm leaning toward keeping the named stars in their own
> > list (StarIndex structure) that gets drawn separately from the rest
> > of the stars.  The named stars would get treated the way we are
> > currently treating all the stars.  They would even have
> > HighPMStarLists and if necessary, we could re-index all the named
> > stars.
>
> Why would this be necessary, again?

Proper motion.  For the vast number of unnamed stars, we are planning on
simply duplicating a few high PM stars to deal with proper motion.  I am
worried that duplicating named stars may cause unwanted interactions 
with the rest of KStars.  If duplicating named stars will not cause any 
problems with KStars then we can treat all the stars the same and 
things are easy and we don't have to worry about any of what I'm about 
to say below.
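Akarsh's flag-bit proposal above uses a spare bit to record whether the HD field holds a real HD number or a substitute UID. It could be handled roughly as sketched below. The StarRecord struct, the HD_IS_UID bit value and the UID base are all hypothetical, because the real starData layout and its spare bits are not shown in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of a star record; not the real starData layout.
struct StarRecord {
    std::int32_t HD;      // HD catalog number, or a substitute UID
    std::uint8_t flags;   // assorted per-star flag bits
};

// Hypothetical spare flag bit: set when HD holds a substitute UID.
constexpr std::uint8_t HD_IS_UID = 0x40;

// Hypothetical base for substitute UIDs, chosen to lie above real HD numbers.
constexpr std::int32_t kUidBase = 1000000;

// Store a genuine HD number and clear the UID bit (other flags are untouched).
void setHD(StarRecord& s, std::int32_t hd) {
    s.HD = hd;
    s.flags = static_cast<std::uint8_t>(s.flags & ~HD_IS_UID);
}

// Store a substitute UID for a named star that has no HD number.
void setUID(StarRecord& s, std::int32_t n) {
    assert(n >= 0);
    s.HD = kUidBase + n;
    s.flags = static_cast<std::uint8_t>(s.flags | HD_IS_UID);
}

// True if the HD field should be read as an HD catalog number.
bool hasHDNumber(const StarRecord& s) {
    return (s.flags & HD_IS_UID) == 0;
}
```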

> > global stars) if this makes the initialization code simpler for
> > you.
> >
> > For example, one way to move forward is to leave your existing
> > binary initialization code more or less as it is but only use it
> > for named stars.  Then start afresh with code for loading the
> > (unnamed) global stars into StarBlocks.  If you would prefer to
> > keep the named stars in same file as the unnamed global stars, that
> > is fine with me too.
>
> Would it be required to move from a constructor to a memcpy() even
> for named stars? I don't think it will be worth the effort. (We will
> have to make changes to SkyObject and make QString name etc into
> QString *name etc, so that they can be malloced when required)

No.  I'm saying let's use constructors for named stars and only use the
memcpy() trick for unnamed stars.
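Here is a minimal sketch of the memcpy() plus init() approach for unnamed stars. Each fixed-size record is copied straight from the file buffer into a preallocated block, and init() fills in the derived fields, with no per-star constructor or allocation. The record layout, scale factors and type names are hypothetical. The sketch also assumes the file was written with the same struct layout and byte order as the reader. Named stars would keep using ordinary constructors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical on-disk record (not the real starData layout).
struct StarRecordRaw {
    std::int32_t RA;      // right ascension as a scaled integer
    std::int32_t Dec;     // declination as a scaled integer
    std::int16_t mag;     // magnitude * 100
    char spType[2];       // spectral type code, e.g. "G2"
};

// Hypothetical in-memory star. It must stay trivially copyable for memcpy().
struct UnnamedStar {
    StarRecordRaw data;
    double ra, dec;       // derived fields, filled in by init()
    float mag;

    void init() {
        ra  = data.RA  / 1.0e6;   // hypothetical scale factors
        dec = data.Dec / 1.0e5;
        mag = data.mag / 100.0f;
    }
};
static_assert(std::is_trivially_copyable<UnnamedStar>::value,
              "memcpy() into UnnamedStar requires a trivially copyable type");

// Fill a block from a buffer of nRecords packed records:
// one memcpy() and one init() per star, no per-star constructors or allocations.
void loadBlock(std::vector<UnnamedStar>& block, const unsigned char* buf, std::size_t nRecords) {
    assert(buf != nullptr || nRecords == 0);
    block.resize(nRecords);
    for (std::size_t i = 0; i < nRecords; ++i) {
        std::memcpy(&block[i].data, buf + i * sizeof(StarRecordRaw), sizeof(StarRecordRaw));
        block[i].init();
    }
}
```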

If duplicating named stars causes a problem for KStars then here is my 
solution.  We keep the named stars in their own list (StarIndex data 
structure).  We use the current HighPMStarList code (and possibly use
the current re-indexing code) to deal with proper motion of the named 
stars by actually moving them from trixel to trixel as the time changes 
(like we are doing now).
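The named-star path, where each high-PM star is re-filed in whichever trixel it has moved into, could be sketched as follows. NamedStar, TrixelIndex and indexOf are hypothetical stand-ins, not the actual HighPMStarList or HTMesh API. Because each per-trixel list must stay sorted by magnitude, the moved star is inserted at its magnitude position rather than appended.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical named-star record.
struct NamedStar {
    double ra, dec;       // catalog position (hypothetical units)
    double pmRa, pmDec;   // proper motion per year, same units
    float mag;
    std::size_t trixel;   // trixel the star is currently filed under
};

using TrixelIndex = std::vector<std::vector<NamedStar*>>;  // trixel -> stars, brightest first

// Re-file each high-PM star in the trixel it occupies 'years' from the catalog epoch.
// indexOf stands in for the (expensive) mesh lookup, so only high-PM stars pay for it.
void reindexHighPM(TrixelIndex& index, const std::vector<NamedStar*>& highPM, double years,
                   const std::function<std::size_t(double, double)>& indexOf) {
    for (NamedStar* s : highPM) {
        std::size_t t = indexOf(s->ra + s->pmRa * years, s->dec + s->pmDec * years);
        assert(t < index.size());
        if (t == s->trixel) continue;

        std::vector<NamedStar*>& src = index[s->trixel];
        src.erase(std::remove(src.begin(), src.end(), s), src.end());

        // Insert at the star's magnitude position so the list stays sorted.
        std::vector<NamedStar*>& dst = index[t];
        auto pos = std::upper_bound(dst.begin(), dst.end(), s,
            [](const NamedStar* a, const NamedStar* b) { return a->mag < b->mag; });
        dst.insert(pos, s);
        s->trixel = t;
    }
}
```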

For all the unnamed stars (both global and dynamic), we deal with
proper motion by duplicating stars so they appear in more than one 
trixel.
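For the unnamed stars, duplication means working out every trixel a fast-moving star can reach within the supported date range and storing a copy of the star in each one. Presumably this happens when the data files are generated. The crude sketch below samples the linearly extrapolated path rather than intersecting it with trixel edges, so a star that clips a trixel corner between samples could be missed. indexOf is a stand-in for a real mesh lookup, and the time span and step count are hypothetical parameters.

```cpp
#include <cassert>
#include <functional>
#include <set>

// Collect every trixel a star visits between -years and +years from the catalog
// epoch by sampling its linearly extrapolated position 2*steps+1 times.
// The star would then be stored once in each returned trixel.
std::set<int> trixelsCovered(double ra, double dec, double pmRa, double pmDec,
                             double years, int steps,
                             const std::function<int(double, double)>& indexOf) {
    assert(steps > 0);
    std::set<int> ids;
    for (int i = -steps; i <= steps; ++i) {
        double t = years * i / steps;   // epoch offset for this sample
        ids.insert(indexOf(ra + pmRa * t, dec + pmDec * t));
    }
    return ids;
}
```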

Since the named stars are going to be in a different data structure than 
the unnamed stars, we will probably need to have two different draw() 
loops, one that goes over all the named stars and another loop that 
goes over all of the unnamed stars.  This actually makes sense 
architecturally because the named stars are different from the unnamed 
stars and in general, we have a different draw loop for each different 
kind of thing.

It would be possible to implement a very similar solution (which I was
advocating for before) where we put the named and global stars in the
existing data structure and use the new StarBlock structure only for the
dynamically loaded stars.  This solution is slightly inferior.  First, 
on general principles, it makes more sense to separate the named stars
from the unnamed stars instead of lumping the named stars with the 
unnamed global stars.  Second, the re-indexing and the HighPMStarList
mechanism are both expensive (they both require using the
HTMesh::index() function, which you've seen can slow things down).  So
I think it is a bit of a waste to re-index the global unnamed stars when
we don't have to.

My earlier plan of splitting the stars into two lists:

   1) All named and global stars
   2) All unnamed dynamic stars

will work but is slightly less efficient than splitting it as:

   1) All named stars 
   2) All unnamed stars (both global and dynamic)

This new scheme is possible if we allow for variable-sized StarBlocks.
I had previously been thinking of fixed size StarBlocks which would 
have made combining the global stars with the dynamic stars slightly
wasteful.
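To illustrate the block-size point: with fixed-size blocks, a trixel whose global stars do not fill a block leaves empty slots, while a block sized to that trixel's actual number of global stars leaves none. Star, StarBlock, makeGlobalBlock and the block size of 100 used below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Star { float mag; };   // placeholder for the real star type

// A block whose capacity is fixed when it is created, but may differ per block.
struct StarBlock {
    std::size_t capacity;
    std::vector<Star> stars;
    explicit StarBlock(std::size_t cap) : capacity(cap) { stars.reserve(cap); }
    bool full() const { return stars.size() >= capacity; }
};

// Variable-sized: one block holding exactly this trixel's global stars.
StarBlock makeGlobalBlock(const std::vector<Star>& globalStars) {
    StarBlock b(globalStars.size());
    b.stars.insert(b.stars.end(), globalStars.begin(), globalStars.end());
    return b;
}

// Fixed-sized: empty slots left when nStars are packed into blocks of blockSize.
std::size_t wastedSlotsFixed(std::size_t nStars, std::size_t blockSize) {
    assert(blockSize > 0);
    std::size_t blocks = (nStars + blockSize - 1) / blockSize;  // ceil(nStars / blockSize)
    return blocks * blockSize - nStars;
}
```

For instance, a trixel with 37 global stars fills a variable-sized block exactly, whereas packing them into a fixed block of 100 leaves 63 slots unused. One option is to size only the first (global) block of each trixel this way and keep fixed-size blocks for the dynamically loaded stars.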

I'm thinking ahead on how to make the drawing and re-indexing as 
efficient as possible, even if it causes slight inefficiencies in
the loading/initialization.

If you agree with all of the above then I think the next step is to 
implement the two lists.  The named stars get created with a 
constructor (or whatever) and go into a structure much like we have
right now.  The unnamed stars will go into a StarBlock and get 
initialized with the memcpy() and init() tricks.

This will require two different draw() loops for the stars.  For now
both these loops will reside inside of StarComponent.  There will be a 
loop just like what we have now for drawing the named stars and there 
will be a second, very similar loop that loops through the StarBlockLists
for drawing all of the unnamed stars.  Since we are only dealing with
global stars for now, each StarBlockList will have at most one
StarBlock.  So the draw() routine in StarComponent would call 
drawNamed() and then call drawUnnamed().

I suppose you could use polymorphism and some abstraction to cram them
both together into one loop but I don't think it would really gain us
very much.  We want to move decision making out of the inner loops 
whenever we can.
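The split draw() described above might look roughly like this. The named/unnamed decision is made once in draw(), and each inner loop does only one kind of work. Every type here (StarComponentSketch, Painter, the per-trixel containers) is a hypothetical simplification rather than the actual StarComponent interface. Painter just counts what would be drawn, and labeling every named star stands in for whatever extra per-star work named stars carry.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified types; not the real KStars classes.
struct NamedStar   { std::string name; float mag; };
struct UnnamedStar { float mag; };
struct StarBlock   { std::vector<UnnamedStar> stars; };
using StarBlockList = std::vector<StarBlock>;

// Stand-in for the real painter: it only counts what would be drawn.
struct Painter {
    std::size_t points = 0, labels = 0;
    void drawPoint(float /*mag*/) { ++points; }
    void drawLabel(const std::string& /*name*/) { ++labels; }
};

struct StarComponentSketch {
    std::vector<std::vector<NamedStar>> namedByTrixel;  // each list sorted brightest first
    std::vector<StarBlockList> unnamedByTrixel;         // blocks in magnitude order

    // The named/unnamed decision is made once here, not per star.
    void draw(Painter& p, const std::vector<std::size_t>& visible, float maglim) const {
        drawNamed(p, visible, maglim);
        drawUnnamed(p, visible, maglim);
    }

    void drawNamed(Painter& p, const std::vector<std::size_t>& visible, float maglim) const {
        for (std::size_t t : visible) {
            assert(t < namedByTrixel.size());
            for (const NamedStar& s : namedByTrixel[t]) {
                if (s.mag > maglim) break;   // rest of this trixel is dimmer
                p.drawPoint(s.mag);
                p.drawLabel(s.name);
            }
        }
    }

    void drawUnnamed(Painter& p, const std::vector<std::size_t>& visible, float maglim) const {
        for (std::size_t t : visible) {
            assert(t < unnamedByTrixel.size());
            for (const StarBlock& b : unnamedByTrixel[t]) {
                if (!drawBlock(p, b, maglim)) break;   // later blocks are dimmer still
            }
        }
    }

private:
    // Draws one block; returns false once a star dimmer than maglim is reached.
    static bool drawBlock(Painter& p, const StarBlock& b, float maglim) {
        for (const UnnamedStar& s : b.stars) {
            if (s.mag > maglim) return false;
            p.drawPoint(s.mag);
        }
        return true;
    }
};
```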

There might be a better way to do all of this but this is the best I 
could come up with for now.  If you disagree, or don't understand or
have an alternative idea, then let's discuss it further.  Maybe there
is a better way.


This is great work you've done so far.  I'm impressed that you've gotten 
stars to load so quickly so soon in the project.  We should really be 
thinking about going up to 10 million stars before the end of the 
summer.  I think our current (planned) data structures could handle 
that.   Does anyone know if there are 10 million stars that would be 
available for us to use?   I don't think our code would have to change 
at all.  I think our code (and file format) can be designed so someone
can just plop in the larger data file and it will all just work.

Things would probably have to change if we wanted to go to 100 million 
stars, but I think we can get to 10 million this summer if the data is
available.


-- 
Peace, James

