[Kstars-devel] Testing the use of quaternions in KStars

Luciano Montanaro mikelima at cirulla.net
Mon Oct 2 14:36:01 CEST 2006


On Monday 02 October 2006 06:31, James Bowlin wrote:

>
> Someone on the list made the suggestion that we write the 3-d
> rotation code in a way that would allow the compiler to use the SSE2
> instructions.  I've got no idea whether gcc is smart enough to do this
> for us or if we would have to hand-code the SSE2 code in assembly (as
> I've seen done elsewhere).  I'm pretty sure we would get a big speedup
> if we were able to use SSE2 efficiently.  If nothing else, it would
> give us more registers to work with.

Gcc 4.1 has some support for automatically detecting simple vectorizable 
loops, so, if the correct flags are given and the loop is suitable, the 
compiler can use vector instructions. On x86-64, for example, I think you 
can assume a vector unit is present, so the loop can be vectorized.

In practice, only very simple loops can be vectorized with currently 
released Gcc, and you'll likely need to lay them out very carefully.
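
To give an idea of what "very simple" means, here is a minimal sketch of 
the kind of loop the vectorizer has a chance with. The function name 
scale_add is made up for illustration, and I am assuming C99 so that 
restrict is available. With gcc the flag to try is -ftree-vectorize 
(plus -msse2 on 32 bit x86); whether a given release actually vectorizes 
it is something to check in the generated assembly.

```c
#include <stddef.h>

/* Hypothetical helper: y[i] += a * x[i] for i in [0, n).
 * 'restrict' promises the compiler that x and y do not overlap,
 * which removes one common obstacle to vectorization. */
void scale_add(double *restrict y, const double *restrict x,
               double a, size_t n)
{
    size_t i;

    /* Unit stride, no function calls, no branches in the body. */
    for (i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```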
 
I'm not sure everybody is aware of this, so I'll point you to the relevant 
page of the gcc documentation (copy the line below into the konqueror URL 
bar):

info:/gcc/Vector Extensions

A fallback should be provided for architectures or machines missing the 
vector instructions, and for compilers missing the extensions.

Anyway, you can declare a vector type and use ordinary operators on it:
 
typedef int v4si __attribute__ ((vector_size (16)));
v4si a, b, c;
c = a + b;
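
A slightly fuller sketch, with the fallback mentioned above, could look 
like the following. The helper name add4 is hypothetical, and I am 
assuming int is 32 bits so that four of them fill the 16 byte vector. The 
memcpy calls are just a safe way to move data in and out of the vector 
type without assuming anything about alignment.

```c
#include <string.h>

#if defined(__GNUC__)
/* GCC vector extension: four ints packed in one 16-byte value
 * (assumes a 32-bit int). */
typedef int v4si __attribute__ ((vector_size (16)));
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for four ints. */
void add4(const int *a, const int *b, int *out)
{
#if defined(__GNUC__)
    v4si va, vb, vc;

    memcpy(&va, a, sizeof va);   /* load without assuming alignment */
    memcpy(&vb, b, sizeof vb);
    vc = va + vb;                /* one element-wise vector add */
    memcpy(out, &vc, sizeof vc);
#else
    int i;                       /* plain scalar fallback */

    for (i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

As far as I know, gcc falls back to scalar code by itself when the target 
has no suitable vector instructions, so the #else branch is mostly there 
for other compilers.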

Check out also

info:/gcc/X86 Built-in Functions

for the list of built-in functions (MMX and SSE).
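
Rather than calling the built-ins directly, it is probably more portable 
to go through the intrinsics headers that gcc ships on top of them. Below 
is a small sketch using the SSE2 intrinsics from emmintrin.h, guarded by 
the __SSE2__ macro so that it still compiles elsewhere. The helper name 
add2d is hypothetical.

```c
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Hypothetical helper: out[0..1] = a[0..1] + b[0..1]. */
void add2d(const double *a, const double *b, double *out)
{
#if defined(__SSE2__)
    __m128d va = _mm_loadu_pd(a);      /* unaligned load of two doubles */
    __m128d vb = _mm_loadu_pd(b);

    _mm_storeu_pd(out, _mm_add_pd(va, vb));
#else
    out[0] = a[0] + b[0];              /* scalar fallback */
    out[1] = a[1] + b[1];
#endif
}
```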

Other things to check: 

-ffast-math can really speed up floating point calculations. 
Strict C math conformance is really expensive, and may be responsible for 
a large performance loss.
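
One place where this matters for vectorization is a sum over an array. 
In the sketch below (the function name dot is made up), every iteration 
depends on the previous value of sum, and strict conformance fixes the 
order of the additions. Only when the compiler is allowed to reorder 
them, as -ffast-math permits, can it keep several partial sums in one 
vector register. The price is that the result may differ in the last 
bits, and that -ffast-math also assumes there are no NaNs or infinities 
around.

```c
#include <stddef.h>

/* Hypothetical helper: dot product of two arrays of length n.
 * The additions into 'sum' form a dependency chain, so splitting the
 * loop into parallel partial sums requires reordering them. */
double dot(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```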

Basic operations are more likely to be vectorizable than trigonometric 
functions.

Data layout may make a huge difference.

If the star catalogue is going to be partitioned into sectors, it would 
likely be much faster to have separate arrays like

double x[];
double y[];
double z[];
double w[];
QString names[];

rather than structures like
class star {
	double x;
	double y;
	double z;
	double w;
	QString name;
};

because you avoid loading unneeded data into the cache and because vector 
instructions are made to, well, operate on vectors.
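
As a rough sketch of how a sector stored as separate arrays could be 
processed: the struct star_sector and the function rotate_sector_z below 
are hypothetical names, not anything that exists in KStars. The caller 
computes the cosine and sine of the angle once, so the loop body contains 
only multiplies and adds on contiguous arrays, and the z array and the 
names never enter the cache at all.

```c
#include <stddef.h>

/* Hypothetical sector layout: one array per coordinate.
 * Names would live in their own array, away from the numbers. */
struct star_sector {
    double *x;
    double *y;
    double *z;
    size_t  count;
};

/* Rotate every star in the sector about the z axis.
 * cos_a and sin_a are the cosine and sine of the rotation angle,
 * computed once by the caller rather than inside the loop. */
void rotate_sector_z(struct star_sector *s, double cos_a, double sin_a)
{
    double *restrict x = s->x;   /* x and y must be distinct arrays */
    double *restrict y = s->y;
    size_t i;

    for (i = 0; i < s->count; ++i) {
        const double xi = x[i];
        const double yi = y[i];

        x[i] = cos_a * xi - sin_a * yi;
        y[i] = sin_a * xi + cos_a * yi;
    }
}
```

With the structure layout, the same loop would step over z, w and the 
name of every star just to reach the next x and y.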

Luciano


-- 
./.. ../ /./. .. ./ /. ///   // /// /. / ./ /. ./ ./. /// ././. //
                                                            \\ //
                                             www.cirulla.net \x/

