[Kstars-devel] Some very preliminary results with OpenCL

Tue Jul 16 13:17:37 UTC 2013

What is your setup? Which CPU and GPU are you using? I've seen benchmarks
claiming a 30x improvement (and I got similar results myself), but in
those cases the CPU code was single-threaded. Once I multithreaded it
(with OpenMP, which is quite simple to implement), the improvement
dropped to a still impressive 5-6x.
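
To illustrate how small the OpenMP change can be, here is a minimal
sketch. It is not the KStars or benchmark code: the Equatorial and
Horizontal structs and the toHorizontal and convertAll functions are
names invented for this example, and it only covers the textbook
hour-angle/declination to altitude/azimuth step, not precession or
nutation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical types for this sketch; all angles are in radians.
struct Equatorial { double ha, dec; };   // hour angle, declination
struct Horizontal { double alt, az; };   // altitude, azimuth (north through east)

const double kPi = 3.14159265358979323846;

// Textbook conversion for an observer at latitude lat.
Horizontal toHorizontal(const Equatorial &p, double lat)
{
    double sinAlt = std::sin(p.dec) * std::sin(lat)
                  + std::cos(p.dec) * std::cos(lat) * std::cos(p.ha);
    // Guard against rounding pushing the value just outside [-1, 1].
    sinAlt = std::max(-1.0, std::min(1.0, sinAlt));

    const double east  = -std::cos(p.dec) * std::sin(p.ha);
    const double north = std::sin(p.dec) * std::cos(lat)
                       - std::cos(p.dec) * std::sin(lat) * std::cos(p.ha);
    double az = std::atan2(east, north);
    if (az < 0.0)
        az += 2.0 * kPi;                 // normalise to [0, 2*pi)

    return {std::asin(sinAlt), az};
}

// Each point is independent, so a single pragma spreads the loop over
// the available cores. Without OpenMP enabled the pragma is ignored and
// the loop simply runs serially.
void convertAll(const std::vector<Equatorial> &in,
                std::vector<Horizontal> &out, double lat)
{
    assert(std::isfinite(lat));
    out.resize(in.size());
    const long n = static_cast<long>(in.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
        out[i] = toHorizontal(in[i], lat);
}
```

With GCC or Clang this would be built with -fopenmp. The loop needs no
locking because every iteration writes only to its own output element.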

Linearization of the problem is not necessary, as the GPU can handle
trig functions. 
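
As a rough illustration, and assuming "linearization" here means
rewriting the transform as a rotation matrix applied to unit vectors,
the sketch below computes the altitude both ways. The function names
are invented for this example and the code is plain C++ rather than a
GPU kernel; the point is only that the two forms agree, so the direct
trig form can be used wherever trig calls are cheap enough.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { double x, y, z; };

// Keep asin arguments inside [-1, 1] despite rounding.
static double clampUnit(double v)
{
    return std::max(-1.0, std::min(1.0, v));
}

// Direct trig form: altitude from hour angle ha, declination dec and
// observer latitude lat, all in radians.
double altitudeTrig(double ha, double dec, double lat)
{
    return std::asin(clampUnit(std::sin(dec) * std::sin(lat)
                             + std::cos(dec) * std::cos(lat) * std::cos(ha)));
}

// Matrix form: build a unit vector in the hour-angle/declination frame,
// rotate it about the east-west axis by the co-latitude, and read the
// altitude off the z component.
double altitudeMatrix(double ha, double dec, double lat)
{
    const Vec3 v{std::cos(dec) * std::cos(ha),
                 std::cos(dec) * std::sin(ha),
                 std::sin(dec)};
    const double s = std::sin(lat);
    const double c = std::cos(lat);
    const Vec3 r{s * v.x - c * v.z, v.y, c * v.x + s * v.z};
    return std::asin(clampUnit(r.z));
}
```

Note that the matrix form still needs trig calls to build the vector
and to recover the angle, so it only removes trig from the rotation
step itself.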

My setup was an i7 at 2.8 GHz (4 cores, hyperthreaded), 8 GB RAM, and an
NVIDIA GTX-320 GPU. I recently upgraded to a 650Ti, and preliminary tests
(using CUDA) show another 6x improvement compared to my old card.

Daniel

On Tue, 2013-07-16 at 00:23 -0700, Henry de Valence wrote:
> Hi all,
> 
> Yesterday I just got to the point where I could make some hacky
> comparison between new code and old code. I made an array of a million
> points and computed the precession, nutation, and
> equatorial-to-horizontal change using on the one hand the old
> implementation from SkyPoint and on the other an OpenCL kernel.
> 
> The result was 4282ms vs 39ms, which is a fairly dramatic improvement.
> However it's very preliminary for the following reasons:
> 
> 1. The OpenCL code is not optimised at all. Also none of the other
> code is optimised. So this is a very misleading comparison from the
> start.
> 
> 2. It does not compute aberration, and the aberration code is likely
> to be relatively slower because the problem is not as easy to
> linearize. There is a very elegant approach I read about that uses
> Möbius transformations to carry out the computation without using trig
> functions, but I need to work out some details before I can implement
> it. Once we add aberration and also refraction the speed gain should
> drop somewhat.
> 
> 3. The test is very synthetic since processing a million points is
> basically the best case in terms of overhead, and in practice our
> gains will depend on details about how much we can win by
> restructuring the expensive skycomponents so they fit this
> computational model (which is the second half of my project
> basically). In practice there will be more overhead than this example.
> 
> So for these reasons one should not place much faith in these results,
> but I thought I would share them anyways since they're quite hopeful.
> It seems that there is a good chance that we will be able to do better
> on the performance side. 
> 
> Henry
> 
> P.S.
> I wrote a blog post on some of my progress so far:
> 
> http://www.hdevalence.ca/blog/2013-07-09-kstars-gsoc-progress-update
> 
> but I remember now that I never sent it to the list. I am planning to
> write another once I finish up the parts that I'm working on now.