[Kstars-devel] profiling KStars

Tue May 23 20:27:19 CEST 2006

Hello,

I've begun to work on trying to speed KStars up a bit.  I've been trying 
to use callgrind to profile KStars, but I'm having trouble getting it to 
work.  When I run kstars through callgrind, it always seems to crash 
when it finishes the startup sequence and draws the map (which is 
exactly the part I want to profile!).  If anyone is interested in trying 
to help profile KStars, please reply.

Even without the profiling results, I have some ideas on how we can save 
CPU cycles.  One expensive procedure is that every time the skymap is 
recomputed, we need to determine whether each object is on-screen, based 
on its position in the sky relative to the current Focus point.

The function we have for this (SkyMap::checkVisibility()) is fairly 
well-optimized, but it still needs to be run on every object in the 
database, so any savings in this function should have a big payoff.

One idea is to use what's called a "hierarchical triangular mesh" (HTM). 
  The concept is described in detail here:
http://www.sdss.jhu.edu/htm/
but here is a basic description:
We are dividing the celestial sphere into a large number of nested 
triangular sections.  The toplevel division uses the Zenith, Nadir 
(anti-zenith), and the 4 cardinal points on the equator as vertices to 
divide the sky into 8 equilateral spherical triangles.  Then, each of 
these can be divided into 4 sub-triangles by connecting the midpoints of 
each triangle's sides.  Then each of *those* can be divided into 4 
pieces the same way.  You can continue this to as many levels as you 
like, dividing the sphere into smaller and smaller pieces.  At a given 
level, any triangle can be identified by a simple integer, indicating 
its location at each level in the hierarchy.  For example, you could 
have a triangle "+203102311201" (the leading "+/-" indicates northern 
vs. southern hemisphere, then you have digits 0-3 identifying which 
sub-triangle at the current level).  I think I would probably use a 
binary two-bit representation for each level, instead of the digits 0123 
(i.e., 0=="00", 1=="01", 2=="10", 3=="11"), so the above triangle ID 
would be "+100011010010110101100001", or +9,252,193 in decimal.

Ok, so how does this help us?  I think it can provide a way to tell very 
quickly whether a point is onscreen or not.  For example, consider a 
case where the four corners of the current visible section are in the 
following triangles:

+100011010010110101101001
+100011010010110101111000
+100011010010110101111011
+100011010010110101110001

We must be zoomed in pretty far, because these ID strings only differ 
near the end.  In other words, they all share a common "parent" 
triangle, namely:
+100011010010110101

Any point on the sky which is not also in that parent triangle is 
instantly rejected as being offscreen.  So our checkVisibility() 
function can be reduced to a few bitwise comparisons of integers, which 
should be very fast indeed.  However, we'd still need to run 
checkVisibility() on the entire object catalog every time.  Maybe we 
could also store objects in a hierarchical list structure that mimics 
the HTM, which would allow us to skip entire sections of objects based 
on their position within the tree.

I don't have all the details of exactly how this should be implemented, 
but I just wanted to share things that I'm thinking about here.

Another idea I'm playing with is one that's come up before on this list: 
  modifying SkyPoint to store the X,Y,Z coordinates of each point, 
instead of (or in addition to) the RA and Dec coordinates.  This could 
improve the speed of calculating each object's position on the screen. 
Right now, we do spherical trigonometry on each SkyPoint and the SkyMap 
FocusPoint to determine the current screen cordinates of a given object, 
and these calculations are pretty expensive.

If we stored the coordinates of each object as X,Y,Z, then we only need 
to do a simple multiplication to determine each object's screen 
coordinates (Xs, Ys):

Xs = a1*X + b1*Y + c1*Z
Ys = a2*X + b2*Y + c2*Z

The important thing to realize here is that the numbers 
(a1,b1,c1,a2,b2,c2) would only need to be determined *once* for a given 
calculation of the sky.  Then, we can loop through all objects and 
repeat the same multiplication on their (X,Y,Z) coordinates to determine 
their screen (Xs,Ys) coordinates!  The a,b,c values basically represent 
the coefficients of a rotation matrix, plus some other stuff to account 
for the curent zoom level and the size of the SkyMap widget.  I've done 
some preliminary testing, and it looks like this method may be about 3 
to 4 times faster than our current SkyMap::getXY() function.

Well, that's probably a lot to digest.  Let me know what you think, and 
if you're willing to help implement this stuff.

regards,
Jason