[Kstars-devel] Testing the use of quaternions in KStars

Mon Oct 2 08:10:20 CEST 2006

On Monday, 2. October 2006 06:31, James Bowlin wrote:
> On Sunday 01 October 2006 21:02, Jason Harris wrote:

Yes, I have fallen for a similar trap during development ;-)

> > toScreen():  22 ms
> > toScreenQuaternion():  31 ms
> > So either our spherical-trig method is almost 30% faster than the
> > quaternion method, or I've done something wrong.

I don't know the "spherical-trig" method, so I have no idea how the two 
compare. However, using a quaternion as the source from which the 3x3 
rotation matrix is calculated has several other advantages, and that's what 
I do in Marble (look into the file vectormap.cpp, e.g. line 55). 
Quaternions would actually be faster if we had to deal with subsequent 
rotations, as multiplying two rotation quaternions involves fewer operations 
than multiplying two rotation matrices.
However, that is not the problem we deal with here: in our case we rotate 
vectors using a rotation matrix / quaternion. In that case matrices are 
faster as they involve fewer operations, and unless you use batch operations 
they are even easier to convert to SSE code, as far as I have learned.
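
To illustrate the "quaternion in, matrix out" idea, here is a minimal sketch. 
It is not the code from vectormap.cpp; the names Quat, Vec3, Mat3, toMatrix 
and rotate are made up for this example, and it assumes a unit quaternion and 
the usual right-handed convention. The matrix is built once, and each vector 
then costs 9 multiplications and 6 additions.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical minimal types for illustration; not the Marble or KStars classes.
struct Quat { double w, x, y, z; };
using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Build the 3x3 rotation matrix for a unit quaternion (done once per view change).
Mat3 toMatrix(const Quat& q) {
    // The formula below only yields a rotation if q has unit length.
    assert(std::fabs(q.w*q.w + q.x*q.x + q.y*q.y + q.z*q.z - 1.0) < 1e-9);
    const double xx = q.x*q.x, yy = q.y*q.y, zz = q.z*q.z;
    const double xy = q.x*q.y, xz = q.x*q.z, yz = q.y*q.z;
    const double wx = q.w*q.x, wy = q.w*q.y, wz = q.w*q.z;
    return {{
        {1.0 - 2.0*(yy + zz), 2.0*(xy - wz),       2.0*(xz + wy)},
        {2.0*(xy + wz),       1.0 - 2.0*(xx + zz), 2.0*(yz - wx)},
        {2.0*(xz - wy),       2.0*(yz + wx),       1.0 - 2.0*(xx + yy)}
    }};
}

// Rotate one vector: 9 multiplications and 6 additions.
Vec3 rotate(const Mat3& m, const Vec3& v) {
    return {m[0][0]*v[0] + m[0][1]*v[1] + m[0][2]*v[2],
            m[1][0]*v[0] + m[1][1]*v[1] + m[1][2]*v[2],
            m[2][0]*v[0] + m[2][1]*v[1] + m[2][2]*v[2]};
}
```

Usage would be to call toMatrix() once whenever the view orientation changes 
and rotate() for every object that has to be drawn.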

> You may recall that I implemented your suggestion of using a rotation
> matrix instead of the trig functions in the 3.5 branch.  I got a slight
> speed improvement but much less than a factor of two.  I don't think
> that the quaternion rotation would be any faster than straight 3-d matrix
> multiplication (please correct me if you have evidence to the contrary).

Right. However, using quaternions to represent the rotation has some further 
advantages: for example, it's easier to calculate the track needed for an 
animation that shows the focus moving on a straight line from one star to 
another. 
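
One common way to compute such a track is spherical linear interpolation 
("slerp") between the start and end orientation. The sketch below is an 
assumption about how this could be done, not a description of existing Marble 
or KStars code; Quat and slerp are names invented for the example, and both 
inputs are assumed to be unit quaternions.

```cpp
#include <cmath>

// Hypothetical minimal quaternion type for illustration.
struct Quat { double w, x, y, z; };

// Spherical linear interpolation between unit quaternions a and b, t in [0, 1].
Quat slerp(const Quat& a, Quat b, double t) {
    double cosTheta = a.w*b.w + a.x*b.x + a.y*b.y + a.z*b.z;
    // q and -q describe the same rotation; flip b so we take the shorter arc.
    if (cosTheta < 0.0) {
        b = Quat{-b.w, -b.x, -b.y, -b.z};
        cosTheta = -cosTheta;
    }
    double wa, wb;
    if (cosTheta > 0.9995) {
        // Nearly identical orientations: use linear weights to avoid dividing by ~0.
        wa = 1.0 - t;
        wb = t;
    } else {
        const double theta = std::acos(cosTheta);
        const double s = std::sin(theta);
        wa = std::sin((1.0 - t) * theta) / s;
        wb = std::sin(t * theta) / s;
    }
    const Quat r{wa*a.w + wb*b.w, wa*a.x + wb*b.x, wa*a.y + wb*b.y, wa*a.z + wb*b.z};
    // Renormalize to absorb rounding error (and the linear fallback above).
    const double n = std::sqrt(r.w*r.w + r.x*r.x + r.y*r.y + r.z*r.z);
    return Quat{r.w/n, r.x/n, r.y/n, r.z/n};
}
```

Stepping t from 0 to 1 over the animation frames gives the intermediate 
orientations along the shortest path between the two.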

> I'd be interested in finding out exactly how many multiplications are
> used in the quaternion rotation.

In this particular case it's more operations. In the beginning I also 
expected it to be fewer operations, but that's not the case for the "virtual 
globe problem". That was one of the reasons why I started to mess with 
quaternions in the first place. However, they do have advantages, and that's 
why I kept them in my code.

> Someone on the list made the suggestion that we write the 3-d
> rotation code in a way that would allow the compiler to use the SSE2
> instructions.  I've got no idea whether gcc is smart enough to do this
> for us or if we would have to hand-code the SSE2 code in assembly (as
> I've seen done elsewhere).  

Judging from my experiments I'd say that gcc does a pretty good job at 
optimization if you compile with -msse. I originally rotated objects using 
quaternions as well and wrote some (beware: I'm a beginner at that) inline 
assembly code (as you can see in the commented-out sections of 
quaternion.cpp). However, that didn't result in any noticeable speed 
improvement (hey, I was happy that it wasn't actually noticeably slower, 
judging from other people's experience with writing such inline assembly 
code). Usually it's recommended to use the intrinsics from "xmmintrin.h" to 
create SSE code that is cross-platform anyway.
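
As a minimal sketch of what the intrinsics route could look like for a single 
matrix-times-vector rotation: the matrix layout (three columns, each padded 
to four floats) and the names Mat3Cols and rotateSse are assumptions made for 
this example only. The intrinsics themselves (_mm_loadu_ps, _mm_set1_ps, 
_mm_mul_ps, _mm_add_ps, _mm_storeu_ps) come from xmmintrin.h. A scalar 
fallback is used when the compiler does not define __SSE__.

```cpp
#ifdef __SSE__
#include <xmmintrin.h>  // SSE intrinsics (x86 / x86-64 only)
#endif

// Hypothetical layout: the three columns of a 3x3 matrix, each padded to 4 floats.
struct Mat3Cols { float c0[4], c1[4], c2[4]; };

// out[0..2] = M * (x, y, z); out[3] is padding.
void rotateSse(const Mat3Cols& m, float x, float y, float z, float out[4]) {
#ifdef __SSE__
    // result = c0 * x + c1 * y + c2 * z, four lanes at a time
    __m128 r = _mm_mul_ps(_mm_loadu_ps(m.c0), _mm_set1_ps(x));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_loadu_ps(m.c1), _mm_set1_ps(y)));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_loadu_ps(m.c2), _mm_set1_ps(z)));
    _mm_storeu_ps(out, r);
#else
    // Scalar fallback with the same result for non-SSE targets.
    for (int i = 0; i < 4; ++i)
        out[i] = m.c0[i]*x + m.c1[i]*y + m.c2[i]*z;
#endif
}
```

Storing the matrix by columns means each column is scaled by one vector 
component, so no shuffling across lanes is needed.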

The lack of speed improvement for quaternion multiplication with "vectors" is 
probably due to the fact that rotation of "vectors" via Quaternions isn't 
exactly something that can be done very efficiently for one single vector due 
to the different signs of each vector component.

If you batch-multiply many vectors with the very same rotation matrix, that 
could result in quite a performance increase. However, I don't know whether 
the "xmmintrin.h" intrinsics that I mentioned earlier already take that into 
account or whether they can be used to do that.
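
A sketch of what such a batch could look like in plain C++: all x, y and z 
coordinates are kept in separate arrays, so the loop body is the same 
arithmetic for every element. The Points struct and rotateAll are 
hypothetical names for this example, and the matrix is assumed to be stored 
row by row in nine floats. Whether gcc actually turns this loop into SSE 
code depends on the compiler version and flags; I have not measured it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays layout: one array per coordinate.
struct Points { std::vector<float> x, y, z; };

// Rotate every point by the same row-major 3x3 matrix m.
void rotateAll(const float m[9], const Points& in, Points& out) {
    const std::size_t n = in.x.size();
    out.x.resize(n);
    out.y.resize(n);
    out.z.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Read all inputs first so that in and out may be the same object.
        const float x = in.x[i], y = in.y[i], z = in.z[i];
        out.x[i] = m[0]*x + m[1]*y + m[2]*z;
        out.y[i] = m[3]*x + m[4]*y + m[5]*z;
        out.z[i] = m[6]*x + m[7]*y + m[8]*z;
    }
}
```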

> I'm pretty sure we would get a big speedup 
> if we were able to use SSE2 efficiently.  If nothing else, it would
> give us more registers to work with.
>
> The best introduction I know of to SIMD (single instruction, multiple
> data) is in The Art of Assembly Language Programming.  The good news
> is that it is available free on-line here:
>
> http://webster.cs.ucr.edu/AoA/Linux/HTML/TheMMXInstructionSet.html

For the "rotate vector by matrix" problem this URL might be helpful:

http://www.cortstratton.org/articles/OptimizingForSSE.php

However, it deals with inline assembly (not cross-platform) and does so in 
Intel notation.

Best regards,

Torsten