Some mmx/sse2 benchmarking

Fri Dec 15 01:06:36 CET 2006

I would suggest replacing the memcpy calls for testSSE with the load
intrinsics from xmmintrin.h, like so:

v1m = _mm_loadu_ps(v1);
v2m = _mm_loadu_ps(v2);

If you make sure the vectors are 16-byte aligned, you can do even better:
just use _mm_load_ps instead.
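
For reference, here is a minimal sketch of the two variants in C. The helper
names add4_unaligned and add4_aligned are made up for illustration and are not
taken from the attached benchmark, and the addition is only a stand-in for
whatever testSSE actually computes. The assert is there because _mm_load_ps
may fault on an address that is not 16-byte aligned.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical helper: works for any float pointers, aligned or not. */
static void add4_unaligned(const float *v1, const float *v2, float *out)
{
    __m128 v1m = _mm_loadu_ps(v1);   /* unaligned load of 4 floats */
    __m128 v2m = _mm_loadu_ps(v2);
    _mm_storeu_ps(out, _mm_add_ps(v1m, v2m));
}

/* Hypothetical helper: all three pointers must be 16-byte aligned. */
static void add4_aligned(const float *v1, const float *v2, float *out)
{
    /* Catch misaligned input early instead of faulting in the load. */
    assert(((uintptr_t)v1 & 15) == 0);
    assert(((uintptr_t)v2 & 15) == 0);
    assert(((uintptr_t)out & 15) == 0);

    __m128 v1m = _mm_load_ps(v1);    /* aligned load of 4 floats */
    __m128 v2m = _mm_load_ps(v2);
    _mm_store_ps(out, _mm_add_ps(v1m, v2m));
}
```

With gcc, one way to get the alignment is to declare the arrays as
float v1[4] __attribute__((aligned(16))); that attribute is compiler-specific,
so other compilers need their own equivalent.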

Let me know if that improves the SSE timings!

-Tom

On Thursday 14 December 2006 16:42, Cyrille Berger wrote:
> Hi there,
>
> I have done some benchmarking of mmx and sse2. These are valgrind results,
> so they show cache misses. I don't have sysprof on that computer to do real
> measurements of CPU time. I have attached the source code so that you can
> check that I didn't make a mistake :)