[Kstars-devel] Testing the use of quaternions in KStars
Luciano Montanaro
mikelima at cirulla.net
Mon Oct 2 14:36:01 CEST 2006
On Monday 02 October 2006 06:31, James Bowlin wrote:
>
> Someone on the list made the suggestion that we write the 3-d
> rotation code in a way that would allow the compiler to use the SSE2
> instructions. I've got no idea whether gcc is smart enough to do this
> for us or if we would have to hand-code the SSE2 code in assembly (as
> I've seen done elsewhere). I'm pretty sure we would get a big speedup
> if we were able to use SSE2 efficiently. If nothing else, it would
> give us more registers to work with.
GCC 4.1 has some support for automatically detecting simple vectorizable
loops, so if the right flags are given (-ftree-vectorize, plus -msse2 on
32-bit x86) and the loop is suitable, the compiler can use vector
instructions. On x86-64, for example, SSE2 is part of the base architecture,
so you can assume a vector unit is present and the loop can be vectorized.
In practice, only very simple loops can be vectorized with the currently
released GCC, and you'll likely need to lay them out very carefully.
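
A minimal sketch of the kind of loop the vectorizer has a chance with: unit
stride, a trip count known on entry, no function calls in the body, and no
overlap between inputs and output. The name scale_add and the KS_RESTRICT
macro are hypothetical, not KStars code; __restrict__ is a GCC extension in
C++, so the macro expands to nothing on other compilers.

```cpp
#include <cstddef>

// Hypothetical macro: use GCC's __restrict__ where available.
#if defined(__GNUC__)
#  define KS_RESTRICT __restrict__
#else
#  define KS_RESTRICT
#endif

// Hypothetical helper: out[i] = a * x[i] + y[i] for i in [0, n).
// The restrict qualifiers promise the arrays do not overlap, which removes
// one common obstacle to automatic vectorization.
void scale_add(double a,
               const double *KS_RESTRICT x,
               const double *KS_RESTRICT y,
               double *KS_RESTRICT out,
               std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a * x[i] + y[i];
}
```

Whether a given GCC release actually vectorizes this depends on the version
and flags, so it is worth checking the generated assembly.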
I'm not sure everybody is aware of this, so I'll point you to the relevant
page of the GCC documentation (copy the line below into the Konqueror URL bar):
info:/gcc/Vector Extensions
A fallback should be provided for architectures or machines missing the
extension (or for different compilers missing the extensions).
Anyway, you can declare a vector type and use ordinary operators on it:
typedef int v4si __attribute__ ((vector_size (16)));  /* four 32-bit ints */
v4si a, b, c;
c = a + b;  /* element-wise add */
Also check out
info:/gcc/X86 Built-in Functions
for the list of built-in functions (MMX and SSE).
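
As a sketch of how the vector extension and a fallback could sit side by
side, assuming a hypothetical helper add4 that adds four floats (the name and
the memcpy-based loads are my choices for illustration, not KStars code):

```cpp
#include <cstring>

#if defined(__GNUC__)
// GCC vector extension: four packed floats in 16 bytes.
typedef float v4sf __attribute__ ((vector_size (16)));

// Hypothetical helper: out[0..3] = a[0..3] + b[0..3].
void add4(const float *a, const float *b, float *out)
{
    v4sf va, vb, vc;
    std::memcpy(&va, a, sizeof va);   // copy in; no alignment requirement
    std::memcpy(&vb, b, sizeof vb);
    vc = va + vb;                     // element-wise add on the vector type
    std::memcpy(out, &vc, sizeof vc);
}
#else
// Scalar fallback for compilers without the vector extension.
void add4(const float *a, const float *b, float *out)
{
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
}
#endif
```

Both branches have the same signature, so callers do not need to know which
one was compiled in.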
Other things to check:
-ffast-math can really speed up floating point calculations.
Strict C math conformance is really expensive, and may be responsible for
a large performance loss.
Basic operations are more likely to be vectorizable than trigonometric
functions.
Data layout may make a huge difference.
If the star catalogue is going to be partitioned into sectors, it would likely
be much faster to have separate arrays like
double x[];
double y[];
double z[];
double w[];
QString names[];
rather than structures like
class star {
    double x;
    double y;
    double z;
    double w;
    QString name;
};
because you avoid loading data you don't need (such as the names) into the
cache, and because vector instructions are made to, well, operate on vectors.
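
To make the separate-arrays layout concrete, here is a rough sketch of
rotating one sector's worth of stars. StarSector and rotate_sector are
hypothetical names, std::vector stands in for the plain arrays above, and a
row-major 3x3 matrix is used instead of a quaternion just to keep the loop
short; it assumes the three arrays have equal length and a C++11 compiler
(for std::vector::data).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sector layout: one contiguous array per coordinate.
struct StarSector {
    std::vector<double> x, y, z;
};

// Rotate every star in place by the row-major 3x3 matrix m.
// Assumes s.x, s.y and s.z all have the same length.
void rotate_sector(StarSector &s, const double m[9])
{
    const std::size_t n = s.x.size();
    double *x = s.x.data();
    double *y = s.y.data();
    double *z = s.z.data();

    // Each iteration reads and writes only coordinate data, with unit
    // stride through three arrays; names are never pulled into the cache.
    for (std::size_t i = 0; i < n; ++i) {
        const double xi = x[i], yi = y[i], zi = z[i];
        x[i] = m[0] * xi + m[1] * yi + m[2] * zi;
        y[i] = m[3] * xi + m[4] * yi + m[5] * zi;
        z[i] = m[6] * xi + m[7] * yi + m[8] * zi;
    }
}
```

With the class-per-star layout the same loop would step over every name
field on its way to the next set of coordinates.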
Luciano
--
./.. ../ /./. .. ./ /. /// // /// /. / ./ /. ./ ./. /// ././. //
\\ //
www.cirulla.net \x/