Some mmx/sse2 benchmarking

Cyrille Berger cberger at cberger.net
Tue Dec 19 23:05:35 CET 2006


On Friday 15 December 2006 01:06, Tom Burdick wrote:
> I would suggest changing the memcpy functions for testSSE to the load
> functions in xmmintrin, so
>
> v1m = _mm_loadu_ps(v1);
> v2m = _mm_loadu_ps(v2);
>
> if you make sure the vectors are 16 byte aligned you can do it an even
> better way, just use _mm_load_ps instead.
>
> Let me know if that improves the SSE timings!
Yes, it's a lot better! If you have any other tips, I will be glad to take
them! I also tried profiling with sysprof and, oddly, the memcpy and loadu
functions run at approximately the same speed (way faster than the x86
instructions).
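
For reference, the three ways of getting four floats into an SSE register
mentioned above (memcpy, _mm_loadu_ps and _mm_load_ps) could be sketched
roughly as below. This is only an illustration: the helper names
(load_memcpy, load_unaligned, load_aligned, add4) are made up for this
example and are not taken from testSSE or libfastpp.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>

/* Hypothetical helpers for illustration only; not from libfastpp. */

/* Copy four floats into an SSE register through memcpy. */
static __m128 load_memcpy(const float *src)
{
    __m128 v;
    memcpy(&v, src, sizeof v);
    return v;
}

/* Unaligned load: src may point to any address. */
static __m128 load_unaligned(const float *src)
{
    return _mm_loadu_ps(src);
}

/* Aligned load: src MUST be 16-byte aligned, otherwise this faults. */
static __m128 load_aligned(const float *src)
{
    assert(((uintptr_t)src & 15u) == 0);
    return _mm_load_ps(src);
}

/* Add two vectors of four floats and store the result in out. */
static void add4(const float *a, const float *b, float *out)
{
    __m128 sum = _mm_add_ps(load_unaligned(a), load_unaligned(b));
    _mm_storeu_ps(out, sum);
}
```

An optimizing compiler will often turn a fixed-size 16-byte memcpy like the
one in load_memcpy into the same unaligned move that _mm_loadu_ps produces,
which may be why the two look alike in the profile; that is a guess, and
checking the generated assembly would confirm it.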

I didn't progress on the library as fast as I would have wished, but a
half-written version is available here:
http://cyrille.diwi.org/tmp/krita/libfastpp.tar.bz2, including an updated
version of costs.ods.

-- 
--- Cyrille Berger ---

