Some mmx/sse2 benchmarking
Cyrille Berger
cberger at cberger.net
Tue Dec 19 23:05:35 CET 2006
On Friday 15 December 2006 01:06, Tom Burdick wrote:
> I would suggest changing the memcpy functions for testSSE to the load
> functions in xmmintrin, so
>
> v1m = _mm_loadu_ps(v1);
> v2m = _mm_loadu_ps(v2);
>
> if you make sure the vectors are 16-byte aligned you can do even
> better: just use _mm_load_ps instead.
>
> Let me know if that improves the sse timings!
Yes, it's a lot better! If you have any other tips, I will be glad to take
them. I also tried profiling with sysprof and, oddly, the memcpy and loadu
functions run at approximately the same speed (and way faster than the plain
x86 instructions).
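
For reference, here is a minimal sketch of the three ways of getting data
into an SSE register discussed above: memcpy, _mm_loadu_ps and _mm_load_ps.
The function names (add4_memcpy, add4_loadu, add4_load) and the choice of a
4-float addition as the workload are assumptions made for illustration; they
are not taken from testSSE itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>

/* Hypothetical helpers: each adds two 4-float vectors and writes the
 * result to out. Only the way data moves in and out of the SSE
 * registers differs between them. */

/* Variant 1: copy the floats into __m128 temporaries with memcpy. */
static void add4_memcpy(const float *v1, const float *v2, float *out)
{
    __m128 v1m, v2m, sum;
    memcpy(&v1m, v1, sizeof v1m);
    memcpy(&v2m, v2, sizeof v2m);
    sum = _mm_add_ps(v1m, v2m);
    memcpy(out, &sum, sizeof sum);
}

/* Variant 2: unaligned load/store, valid for any float pointer. */
static void add4_loadu(const float *v1, const float *v2, float *out)
{
    __m128 v1m = _mm_loadu_ps(v1);
    __m128 v2m = _mm_loadu_ps(v2);
    _mm_storeu_ps(out, _mm_add_ps(v1m, v2m));
}

/* Variant 3: aligned load/store. All three pointers must be 16-byte
 * aligned, which the assert checks in debug builds. */
static void add4_load(const float *v1, const float *v2, float *out)
{
    assert((uintptr_t)v1 % 16 == 0);
    assert((uintptr_t)v2 % 16 == 0);
    assert((uintptr_t)out % 16 == 0);
    __m128 v1m = _mm_load_ps(v1);
    __m128 v2m = _mm_load_ps(v2);
    _mm_store_ps(out, _mm_add_ps(v1m, v2m));
}
```

One possible explanation for memcpy and loadu showing similar timings is that
an optimizing compiler may turn a fixed 16-byte memcpy into a single
unaligned move, though that depends on the compiler and its flags. The
aligned variant only works if the buffers really are 16-byte aligned, for
example when declared with _Alignas(16) in C11.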
I haven't progressed on the library as fast as I would have wished, but a
half-written version is available here:
http://cyrille.diwi.org/tmp/krita/libfastpp.tar.bz2, including an updated
version of costs.ods.
--
--- Cyrille Berger ---