[Kstars-devel] Testing the use of quaternions in KStars

Mon Oct 2 06:31:43 CEST 2006

On Sunday 01 October 2006 21:02, Jason Harris wrote:
> To test it, I added some timing code to time how long it takes to process 
> 126,000 stars through toScreen(), and through toScreenQuaternion().  The 
> results:
> 
> toScreen():  22 ms
> toScreenQuaternion():  31 ms
> 
> So either our spherical-trig method is almost 30% faster than the quaternion 
> method, or I've done something wrong.

I'm not surprised by this result.  I am familiar with quaternions from
my previous work in physics.  They are very beautiful.  For example,
they can be used to cast Maxwell's equations into an extremely simple
form. 

You may recall that I implemented your suggestion of using a rotation
matrix instead of the trig functions in the 3.5 branch.  I got a slight
speed improvement but much less than a factor of two.  I don't think
that the quaternion rotation would be any faster than straight 3-d matrix
multiplication (please correct me if you have evidence to the contrary).

I don't think you did anything wrong.  For example if the quaternion
rotation involved a 4x4 matrix instead of the 3x3 matrix of the 3-d
rotation then the quaternion time should be roughly 16/9ths of the
3-d rotation time.   There is also the possibility that calling yet
another class inside the critical loop caused more cache misses
inside the loop which could also cause a slowdown.

I think it's great that you are experimenting to find ways to make
KStars faster.  I'd be glad to send you the 3-d rotation code I used
if you want to compare it directly to the current trig way and the
quaternion way.  I think it was maybe 20% faster than the trig way
which would make it clock in at 18 ms on your setup.  If we do a
rough calculation:

   18 ms * 16/9 = 32 ms

which seems to check out but this is all pretty rough so I wouldn't
take this too seriously.  The more experiments we do like this, the
better we will understand the speed constraints of the critical loop.
I'd be interested in finding out exactly how many multiplications are
used in the quaternion rotation.  

I'm still surprised that the trig functions did not give us a significant
speed hit.  I did some independent tests to compare the speed of multiplying
with the speed of taking a Sine and although the Sine took more time it took
about 20% more time, certainly not a factor of two more.

Someone on the list made the suggestion that we write the 3-d
rotation code in a way that would allow the compiler to use the SSE2
instructions.  I've got no idea whether gcc is smart enough to do this
for us or if we would have to hand-code the SSE2 code in assembly (as
I've seen done elsewhere).  I'm pretty sure we would get a big speedup
if we were able to use SSE2 efficiently.  If nothing else, it would
give us more registers to work with.

The best introduction I know of to SIMD (single instruction, multiple
data) is in The Art of Assembly Language Programming.  The good news
is that it is available free on-line here:

http://webster.cs.ucr.edu/AoA/Linux/HTML/TheMMXInstructionSet.html

The bad news is that it only covers MMX, not SSE or SSE2 which is what
I think we should be using.  But once you have the concepts, it should
be pretty easy to apply them to the more advanced SIMD instruction sets.
Basically, all the stuff he complains about in the book was fixed with
SSE2.  Take a look at the Wikipedia SSE2 article for more details.

-- 
Peace, James