Autovectorization, SIMD, etc.

Cyrille Berger cberger at cberger.net
Sat Dec 23 18:13:13 CET 2006


Hi,

Thanks for the link, but I had already looked at it and discarded it, for
three reasons:
 - it is a C API. It might sound childish, but when you are used to
well-designed C++ APIs, most C APIs look scary and you don't want to touch
them
 - the first time I looked at it, I didn't understand how to use it...
 - and most importantly, you call each function with a
float*/char*/whatever* array, then the function loads the data into the
mmx/sse registers, does the operation, and stores the registers back to the
result array. That might be fine for a single small operation, but if you
want to sum pixels and store the result in a destination, you are doing too
many stores (and I don't want to trust gcc to optimize this and remove the
unnecessary loads/stores)
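
To make the third point concrete, here is a minimal sketch in plain C++ (no
intrinsics). The function add4 is a hypothetical stand-in for an array-style
API call, not a function from the library under discussion, and the layout of
4 float channels per pixel is an assumption. The first summing routine sends
the running total through memory on every call, the second keeps it in locals
and stores it once. A compiler may well optimize both to the same code; the
sketch only shows the shape of the concern.

```cpp
#include <cstddef>

// Hypothetical array-style call: loads both operands, adds, stores the result.
void add4(float* dst, const float* a, const float* b)
{
    for (std::size_t c = 0; c < 4; ++c)
        dst[c] = a[c] + b[c];
}

// Summing through the array-style call: 'total' is read from and written
// back to memory once per pixel.
void sumPixelsArrayApi(float* total, const float* pixels, std::size_t nPixels)
{
    for (std::size_t c = 0; c < 4; ++c)
        total[c] = 0.0f;
    for (std::size_t i = 0; i < nPixels; ++i)
        add4(total, total, pixels + 4 * i);
}

// Same sum with a local accumulator: the result array is written only once,
// after the loop.
void sumPixelsLocalAccumulator(float* total, const float* pixels, std::size_t nPixels)
{
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (std::size_t i = 0; i < nPixels; ++i)
        for (std::size_t c = 0; c < 4; ++c)
            acc[c] += pixels[4 * i + c];
    for (std::size_t c = 0; c < 4; ++c)
        total[c] = acc[c];
}
```

Both routines compute the same per-channel sums; they differ only in how
often the accumulator is written back to the result array.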

Basically, my requirements were to write code like this:

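void sumColors(char* resarr, char* srcarr, int nPixels)
{
	Vector result;           // accumulator for the per-channel sums
	Packet source(srcarr);   // view on the current pixel of the source
	for(int i = 0; i < nPixels; i++)
	{
		result += source;
		source.nextPixel();
	}
	result /= nPixels;       // turn the sums into averages
	result.copyTo(resarr);   // single store to the result array
}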

And then let the compiler build two sumColors functions for me, one with
SIMD and one without. I am close to being able to do this.
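
For illustration, here is a minimal scalar sketch of what Vector and Packet
could look like so that sumColors compiles as written. Everything in it is an
assumption rather than the actual design: 4 channels per pixel, 8-bit
unsigned channels (hence unsigned char instead of char), an int accumulator,
and the channel() accessor. A SIMD build would replace the bodies of these
classes and leave sumColors untouched.

```cpp
constexpr int kChannels = 4;  // assumption: 4 channels per pixel

// Hypothetical read-only view on one pixel of the source array.
class Packet {
public:
    explicit Packet(const unsigned char* src) : m_ptr(src) {}
    void nextPixel() { m_ptr += kChannels; }
    unsigned char channel(int c) const { return m_ptr[c]; }
private:
    const unsigned char* m_ptr;
};

// Hypothetical accumulator, wide enough that 8-bit sums do not overflow
// for realistic pixel counts.
class Vector {
public:
    Vector& operator+=(const Packet& p)
    {
        for (int c = 0; c < kChannels; ++c)
            m_v[c] += p.channel(c);
        return *this;
    }
    Vector& operator/=(int d)
    {
        for (int c = 0; c < kChannels; ++c)
            m_v[c] /= d;
        return *this;
    }
    void copyTo(unsigned char* dst) const
    {
        for (int c = 0; c < kChannels; ++c)
            dst[c] = static_cast<unsigned char>(m_v[c]);
    }
private:
    int m_v[kChannels] = {0, 0, 0, 0};
};

void sumColors(unsigned char* resarr, const unsigned char* srcarr, int nPixels)
{
    if (nPixels <= 0)
        return;  // guard added in this sketch to avoid dividing by zero
    Vector result;
    Packet source(srcarr);
    for (int i = 0; i < nPixels; i++)
    {
        result += source;
        source.nextPixel();
    }
    result /= nPixels;
    result.copyTo(resarr);
}
```

Despite its name, sumColors ends up storing the per-channel average, because
of the division by nPixels before the copy.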

macstl (http://www.pixelglow.com/macstl/, it also works on non-Mac
hardware ;) ) came very close to letting me do this, but it had a few
problems as well. First, it wasn't really possible to select at runtime which
instruction set to use. Second, the code above would have looked a little
more cluttered (plus another reason I have forgotten).
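
One common way to get runtime selection is to compile the same routine
against several backends and pick a function pointer once at startup. The
sketch below shows only that structure. ScalarBackend, WideBackend and
selectSumColors are hypothetical names, WideBackend is a placeholder where
real SIMD code would go, and CPU feature detection is platform-specific, so
the caller passes the flag in here.

```cpp
// Hypothetical backends: each one knows how to add one 4-channel pixel
// into an accumulator.
struct ScalarBackend {
    static void accumulate(int* acc, const unsigned char* px)
    {
        for (int c = 0; c < 4; ++c)
            acc[c] += px[c];
    }
};

struct WideBackend {
    // Placeholder: a real backend would use SIMD instructions here.
    static void accumulate(int* acc, const unsigned char* px)
    {
        acc[0] += px[0];
        acc[1] += px[1];
        acc[2] += px[2];
        acc[3] += px[3];
    }
};

// Written once, instantiated once per backend by the compiler.
template <typename Backend>
void sumColorsImpl(unsigned char* res, const unsigned char* src, int nPixels)
{
    if (nPixels <= 0)
        return;
    int acc[4] = {0, 0, 0, 0};
    for (int i = 0; i < nPixels; ++i)
        Backend::accumulate(acc, src + 4 * i);
    for (int c = 0; c < 4; ++c)
        res[c] = static_cast<unsigned char>(acc[c] / nPixels);
}

using SumColorsFn = void (*)(unsigned char*, const unsigned char*, int);

// Picks the implementation at runtime; how cpuHasSimd is determined is
// left to the caller.
SumColorsFn selectSumColors(bool cpuHasSimd)
{
    return cpuHasSimd ? &sumColorsImpl<WideBackend>
                      : &sumColorsImpl<ScalarBackend>;
}
```

The selection cost is paid once; after that every call goes through a single
indirect function call regardless of the backend chosen.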

(I even want the Packet class to be able to work on two pixels (or more) at
a time if there is enough room in the vector, and I want to write sumColors
only once.)
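
A rough sketch of the multi-pixel idea, again in plain C++ and under
assumptions of my own choosing: 4 channels per pixel, and a template
parameter Lanes standing in for the width of the hardware vector. The name
sumColorsWide is hypothetical. With Lanes = 8 each step consumes two pixels,
with Lanes = 4 it consumes one, and the routine itself is written only once.

```cpp
constexpr int kChannels = 4;  // assumption: 4 channels per pixel

template <int Lanes>
void sumColorsWide(unsigned char* res, const unsigned char* src, int nPixels)
{
    static_assert(Lanes >= kChannels && Lanes % kChannels == 0,
                  "Lanes must hold a whole number of pixels");
    constexpr int pixelsPerPacket = Lanes / kChannels;
    if (nPixels <= 0)
        return;

    // One accumulator per lane; each step adds pixelsPerPacket pixels.
    int lanes[Lanes] = {};
    int i = 0;
    for (; i + pixelsPerPacket <= nPixels; i += pixelsPerPacket)
        for (int l = 0; l < Lanes; ++l)
            lanes[l] += src[i * kChannels + l];

    // Fold the lanes back onto the channels they belong to.
    int total[kChannels] = {};
    for (int l = 0; l < Lanes; ++l)
        total[l % kChannels] += lanes[l];

    // Leftover pixels that did not fill a whole packet.
    for (; i < nPixels; ++i)
        for (int c = 0; c < kChannels; ++c)
            total[c] += src[i * kChannels + c];

    for (int c = 0; c < kChannels; ++c)
        res[c] = static_cast<unsigned char>(total[c] / nPixels);
}
```

The fold step and the leftover loop are the price of processing several
pixels per step; both run once per call, outside the main loop.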


-- 
--- Cyrille Berger ---

