[Kstars-devel] Testing the use of quaternions in KStars

Mon Oct 2 19:16:46 CEST 2006

Sorry for my somewhat weird reply this morning.

I only skimmed James' mail shortly before I had to leave my flat in a 
hurry. So if you read my mail and didn't really understand everything 
I said: no, it was not your fault. I was assuming things from James' mail 
about Jason's implementation that weren't in there.

So here are some more thoughts, this time hopefully much clearer :-)

Before I cover the quaternion issue I'd like to say that the performance of 
Marble is not only due to using the combination of matrix multiplication and 
Quaternions: 

For the texturing it's mostly because of the organisation of the bitmaps in 
tiles and due to smart linear interpolation.

For the vectors it's mostly due to:
- calculating the bounding rectangle of each polygon beforehand and 
skipping everything whose bounding rectangle doesn't "touch" the 
screen coordinates. (I don't know whether KStars already organizes star 
coordinates in memory in rectangular "tiles". If it doesn't, that could be 
an approach to skip data that doesn't get displayed anyway.) 
- While zooming into the virtual globe, the visible range shrinks from the 
full interval 0 <= z <= 1 to some smaller interval 1 - epsilon <= z <= 1. 
It should be easy to calculate the smallest epsilon that still covers the 
whole screen. So by calculating the z value of the objects first you can 
decide whether it's worth processing the data further or better to skip it.
- clipping of polygons: I have my own clippainter class, which makes sure 
that nodes outside the screen don't get painted. Unfortunately the class is 
still a bit buggy at higher zoom levels.
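
To make the first two points a bit more concrete, here is a minimal sketch. 
It is not Marble's code: Rect, touches() and minVisibleZ() are names made up 
for this mail, and I assume an orthographic projection of a unit sphere that 
is centred on the screen, with the viewer on the +z side and radiusPx being 
the radius of the sphere in pixels.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical axis-aligned rectangle in screen coordinates.
struct Rect { double left, top, right, bottom; };

// True if the two rectangles share at least one point, i.e. the
// polygon's bounding rectangle "touches" the screen rectangle.
inline bool touches(const Rect& a, const Rect& b) {
    return a.left <= b.right && b.left <= a.right &&
           a.top <= b.bottom && b.top <= a.bottom;
}

// Smallest z on the unit sphere that can still land on the screen.
// Assumption: orthographic projection, sphere centred on the screen,
// radiusPx = radius of the sphere in pixels (the zoom level).
inline double minVisibleZ(double widthPx, double heightPx, double radiusPx) {
    assert(radiusPx > 0.0);
    // The screen corners are the points farthest away from the centre.
    const double halfDiagonal =
        0.5 * std::sqrt(widthPx * widthPx + heightPx * heightPx);
    // Clamp to 1: if the whole sphere fits on screen, every z >= 0 is visible.
    const double r = std::min(1.0, halfDiagonal / radiusPx);
    return std::sqrt(1.0 - r * r);   // epsilon is 1 - minVisibleZ(...)
}
```

Objects whose rotated z is below minVisibleZ() can be skipped before any 
further work is done on them. The threshold only changes when the zoom level 
or the window size changes, so it needs to be computed once per frame at most.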

For bumpmapping (yes, bumpmapping for the topographic map happens on the fly 
for each frame as well ... and yes, I know it's not of interest for kstars):

- I do the simplest bumpmapping that I was able to come up with by 
comparing the values of pixels that are horizontally 3 pixels away from each 
other. I compensate for perspective distortion of the bumpmapping with some 
cheap, very problem-specific approximations. I do that in the class 
that colorizes the grayscale textured sphere on the fly and combines the 
information of the grayscale map with the vector data.
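
Just to illustrate the "3 pixels apart" idea, a minimal sketch for a single 
row of the grayscale elevation map. The function name, the neutral value 128 
and the clamping at the row end are assumptions for this example, and the 
compensation for perspective distortion is left out completely.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper: derive a shading value per pixel from the
// difference between a pixel and the one `offset` pixels to its right.
inline std::vector<std::uint8_t> shadeRow(
        const std::vector<std::uint8_t>& elevation, int offset = 3) {
    const int n = static_cast<int>(elevation.size());
    std::vector<std::uint8_t> shade(elevation.size());
    for (int x = 0; x < n; ++x) {
        // Clamp at the end of the row instead of reading past it.
        const int neighbour = std::min(x + offset, n - 1);
        const int slope = int(elevation[x]) - int(elevation[neighbour]);
        // 128 means "flat"; rising terrain gets darker, falling gets lighter.
        shade[x] = static_cast<std::uint8_t>(
            std::max(0, std::min(255, 128 + slope)));
    }
    return shade;
}
```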

> toScreen():  22 ms
> toScreenQuaternion():  9 ms

That looks much more in line with my tests, which compared my earlier 
cosine / sine implementation with the quaternion/matrix one.

I'd even bet that on devices that use Qtopia the difference would be 
larger still, as they usually don't play well with sines and cosines (or 
with floating point calculations in general, as far as I've heard). Since 
I'd like Marble to work on those Greenphones and PDAs as well, that's a good 
reason for me to choose matrix multiplications over "trigonometric" 
calculations. 

Jason, did you do those measurements with -msse?

CFLAGS        = -pipe -O2 -msse -O2 -Wall -W -D_REENTRANT  $(DEFINES)
CXXFLAGS      = -pipe -O2 -msse -O2 -Wall -W -D_REENTRANT  $(DEFINES)

That might boost the advantage even more, as I guess it already lets the 
compiler optimize the code so that the matrix multiplication gets executed 
for all components at the same time.

I'm not sure whether gcc does this as well as possible. Someone would have 
to look at 

http://www.cortstratton.org/articles/OptimizingForSSE.php

I'd especially be interested in whether the "Batch Processing" suggested 
there could be used to significant advantage together with the 
cross-platform xmmintrin.h instead of real inline assembly code (but then 
again maybe gcc is really smart and already does that for us).

> This evening, I added experimental support for quaternions in KStars.  It
> was surprisingly easy to do.  [...] SkyPoint [...] (marble has a GeoPoint 
> [...] SkyMap (following marble's KAtlasGlobe),

Now those are nice similarities :-) 

> QPointF SkyMap::toScreenQuaternion( SkyPoint *o, double scale ) {
> 	QPointF p;
> 	Quaternion oq = o->quat(); 
> 	oq.rotateAroundAxis( m_rotAxis );
>
> 	p.setX( 0.5*width()  - scale*oq.v[Q_X] );
> 	p.setY( 0.5*height() - scale*oq.v[Q_Y] );
>
> 	return p;
> }

Yes, that looks familiar :-))) However, as mentioned before, it _might_ make 
more sense to check whether v[Q_Z] is within the interval that would get 
displayed on the screen before you do possibly useless calculations of p.x 
and p.y. Up to you to find out ...
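
What I mean is roughly the following. This is only a sketch without any Qt 
types: Vec3, ScreenPoint, projectIfVisible() and the zMin parameter are 
made up for this mail, and I assume that the visible side of the sphere is 
the one with large z. Whether the early return actually pays off would have 
to be measured.

```cpp
// Hypothetical stand-ins for the rotated quaternion's vector part
// and for QPointF.
struct Vec3 { double x, y, z; };
struct ScreenPoint { double x, y; };

// Projects an already rotated point to the screen, but only if its z
// value lies in the interval that can show up on screen at all.
// Returns false (and leaves `out` untouched) for points that get skipped.
inline bool projectIfVisible(const Vec3& rotated, double scale,
                             double width, double height,
                             double zMin, ScreenPoint& out) {
    if (rotated.z < zMin)
        return false;                        // skip the x/y calculation
    out.x = 0.5 * width  - scale * rotated.x;   // same mapping as in the
    out.y = 0.5 * height - scale * rotated.y;   // quoted toScreenQuaternion()
    return true;
}
```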

I agree with your replies to Luciano. Concerning Luciano's suggestions 
about parallelization, you might want to look at my quaternion class. At 
some point during development I used the quaternion representation of 
m_rotAxis to rotate the vector, so at that time I used:

void Quaternion::rotateAroundAxis(const Quaternion &q)

instead of:

void Quaternion::rotateAroundAxis(const matrix &m)

The latter needs fewer operations and is easier to parallelize, because the 
components in the former come with differing signs.
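
To show the difference, here is a small self-contained sketch of both 
variants. It is not the code from my quaternion class: Vec3, Quat, Mat3 and 
all function names are invented for this mail, and the quaternion is assumed 
to be normalized. The quaternion variant needs two Hamilton products per 
point (16 multiplications each when written out naively, with mixed signs), 
while the matrix variant needs 9 multiplications and 6 additions once the 
matrix has been built.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };
struct Quat { double w, x, y, z; };
struct Mat3 { double m[3][3]; };

// Unit quaternion for a rotation by angleRad about the z axis.
inline Quat aboutZ(double angleRad) {
    return { std::cos(0.5 * angleRad), 0.0, 0.0, std::sin(0.5 * angleRad) };
}

// Hamilton product a*b. Note the mixed signs in every component.
inline Quat mul(const Quat& a, const Quat& b) {
    return { a.w*b.w - a.x*b.x - a.y*b.y - a.z*b.z,
             a.w*b.x + a.x*b.w + a.y*b.z - a.z*b.y,
             a.w*b.y - a.x*b.z + a.y*b.w + a.z*b.x,
             a.w*b.z + a.x*b.y - a.y*b.x + a.z*b.w };
}

// Variant 1: rotate v as q * (0, v) * conjugate(q).
inline Vec3 rotateByQuat(const Quat& q, const Vec3& v) {
    const Quat conj = { q.w, -q.x, -q.y, -q.z };
    const Quat r = mul(mul(q, Quat{ 0.0, v.x, v.y, v.z }), conj);
    return { r.x, r.y, r.z };
}

// Build the equivalent rotation matrix from a unit quaternion.
// This has to be done only once per frame, not once per point.
inline Mat3 toMatrix(const Quat& q) {
    return {{
        { 1 - 2*(q.y*q.y + q.z*q.z), 2*(q.x*q.y - q.w*q.z), 2*(q.x*q.z + q.w*q.y) },
        { 2*(q.x*q.y + q.w*q.z), 1 - 2*(q.x*q.x + q.z*q.z), 2*(q.y*q.z - q.w*q.x) },
        { 2*(q.x*q.z - q.w*q.y), 2*(q.y*q.z + q.w*q.x), 1 - 2*(q.x*q.x + q.y*q.y) }
    }};
}

// Variant 2: plain matrix * vector, every term is added with the same sign.
inline Vec3 rotateByMatrix(const Mat3& r, const Vec3& v) {
    return { r.m[0][0]*v.x + r.m[0][1]*v.y + r.m[0][2]*v.z,
             r.m[1][0]*v.x + r.m[1][1]*v.y + r.m[1][2]*v.z,
             r.m[2][0]*v.x + r.m[2][1]*v.y + r.m[2][2]*v.z };
}
```

Both variants give the same result for a unit quaternion. The uniform 
"multiply and add" pattern of the second one is what makes it map so nicely 
onto SIMD instructions.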

As you can see, there is another method which tried to accomplish exactly 
what Luciano suggested:

void QuaternionSSE::rotateAroundAxis(const Quaternion &q)

(and it even worked, except that it was not faster -- maybe due to the 
inherent sign issue mentioned above)

Now somebody would have to create a similar method

void QuaternionSSE::rotateAroundAxis(const matrix &m)

based on the "OptimizingForSSE" document above, maybe using XMM intrinsics 
instead of inline assembly to keep it cross-platform. Everything needed 
should be well prepared already ;-)
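
As a starting point, such a method could look roughly like this. Take it as 
an untested-for-speed sketch only: PaddedMatrix and rotateWithMatrixSSE() 
are hypothetical names, the column-wise storage with one float of padding is 
my assumption and not how the matrix is stored in Marble, and it handles one 
vector at a time instead of the batches described in the article. On 
compilers without SSE support it falls back to plain scalar code.

```cpp
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#endif

// Hypothetical layout: three columns, each padded to four floats,
// so col[j][i] holds the matrix element in row i, column j.
struct PaddedMatrix { float col[3][4]; };

// out = M * v, computed as col0*v[0] + col1*v[1] + col2*v[2].
inline void rotateWithMatrixSSE(const PaddedMatrix& m,
                                const float v[3], float out[3]) {
#ifdef SKETCH_HAVE_SSE
    // Each instruction works on all components of a column at once.
    __m128 r = _mm_mul_ps(_mm_loadu_ps(m.col[0]), _mm_set1_ps(v[0]));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_loadu_ps(m.col[1]), _mm_set1_ps(v[1])));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_loadu_ps(m.col[2]), _mm_set1_ps(v[2])));
    float tmp[4];
    _mm_storeu_ps(tmp, r);          // unaligned store, tmp[3] is padding
    out[0] = tmp[0]; out[1] = tmp[1]; out[2] = tmp[2];
#else
    // Scalar fallback with the same result.
    for (int i = 0; i < 3; ++i)
        out[i] = m.col[0][i]*v[0] + m.col[1][i]*v[1] + m.col[2][i]*v[2];
#endif
}
```

Whether this beats what gcc generates from the plain loop with -msse is 
exactly the thing that somebody would have to measure.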

BTW: You'll find me (tackat) on IRC #kde-edu quite often ...

Torsten