Some mmx/sse2 benchmarking
Tom Burdick
tburdi1 at uic.edu
Fri Dec 15 01:06:36 CET 2006
I would suggest replacing the memcpy calls used for testSSE with the load
intrinsics from xmmintrin.h, for example:
v1m = _mm_loadu_ps(v1);
v2m = _mm_loadu_ps(v2);
If you make sure the vectors are 16-byte aligned, you can do even better:
just use _mm_load_ps instead of _mm_loadu_ps.
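To make the difference concrete, here is a minimal sketch of both variants.
I don't have the benchmark source in front of me, so add4_unaligned and
add4_aligned are hypothetical stand-ins for whatever testSSE actually
computes (here simply the sum of two 4-float vectors). Only the intrinsics
themselves are the real xmmintrin.h API.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_load_ps, _mm_add_ps */

/* Hypothetical kernel, unaligned variant: the pointers may have any
 * alignment, so the unaligned load/store intrinsics are used. */
static void add4_unaligned(const float *v1, const float *v2, float *out)
{
    __m128 v1m = _mm_loadu_ps(v1);
    __m128 v2m = _mm_loadu_ps(v2);
    _mm_storeu_ps(out, _mm_add_ps(v1m, v2m));
}

/* Hypothetical kernel, aligned variant: every pointer must be 16-byte
 * aligned, otherwise _mm_load_ps / _mm_store_ps will fault. */
static void add4_aligned(const float *v1, const float *v2, float *out)
{
    assert((uintptr_t)v1 % 16 == 0);
    assert((uintptr_t)v2 % 16 == 0);
    assert((uintptr_t)out % 16 == 0);

    __m128 v1m = _mm_load_ps(v1);
    __m128 v2m = _mm_load_ps(v2);
    _mm_store_ps(out, _mm_add_ps(v1m, v2m));
}
```

To get the alignment, the arrays can be declared with _Alignas(16) (C11) or
allocated with _mm_malloc(size, 16) and released with _mm_free.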
Let me know if that improves the SSE timings!
-Tom
On Thursday 14 December 2006 16:42, Cyrille Berger wrote:
> Hi there,
>
> I have done some benchmarking of mmx and sse2. These are valgrind results,
> so they measure cache misses. I don't have sysprof on that computer to do
> a real measurement of CPU time. I have attached the source code so that
> you can check that I didn't make a mistake :)