[Kstars-devel] Some very preliminary results with OpenCL

Henry de Valence hdevalence at hdevalence.ca
Tue Jul 16 14:57:19 UTC 2013


Sorry, I forgot to "Reply All" and didn't send to the list.
---------- Forwarded message ----------
From: "Henry de Valence" <hdevalence at hdevalence.ca>
Date: 16 Jul 2013 10:54
Subject: Re: [Kstars-devel] Some very preliminary results with OpenCL
To: "Daniel Baboiu" <daniel.baboiu at shaw.ca>
Cc:


On 16 Jul 2013 09:17, "Daniel Baboiu" <daniel.baboiu at shaw.ca> wrote:
>
> What is your setup, what CPU and GPU are you using? I've seen benchmarks
> claiming 30x improvement (and I got similar results myself), but in
> those cases, the CPU code was single-threaded. Once I multithreaded it
> (with OpenMP -- quite simple to implement), the improvement dropped to
> (still impressive) 5-6x.

I'm using an AMD Radeon 7850 and an Intel i5-3350P. In this case the CPU
code is single threaded, since KStars does no threading whatsoever.

I expect that we won't see such a dramatic differential between the CPU and
the GPU when all is said and done and we are comparing the same algorithms
on different hardware (the result above pits a faster algorithm on faster
hardware against a slower algorithm on slower hardware).

But the baseline isn't KStars with OpenMP and well-designed code; it's
KStars as it stands in the master branch: a mess of linked lists of arrays
of objects whose virtual methods do the computation by modifying the
internal state of those objects in complicated and unpredictable ways.

Most of the effort, in fact, is orthogonal to how the computation is
carried out (OpenMP, OpenCL, single-threaded CPU code, etc.) and relates
to somewhat bigger issues: which algorithms we use, how the data is
stored, and so on.
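
To make the data-storage point concrete, here is the rough shape of what
I mean (a sketch only; all the names are made up): keep positions as flat
arrays of unit-vector components rather than as per-object mutable state,
so a whole catalogue can be handed to any backend in one pass.

    #include <stddef.h>

    /* Sketch: coordinates as contiguous arrays instead of
       heap-allocated objects with virtual update methods. */
    typedef struct {
        double *x, *y, *z;   /* one contiguous array per component */
        size_t  count;       /* number of objects                  */
    } VectorBuffer;

    /* Any backend -- plain loop, OpenMP, or an OpenCL kernel -- can
       consume the same buffer; e.g. a trivial CPU fallback: */
    void scale_all(VectorBuffer *b, double k)
    {
        for (size_t i = 0; i < b->count; ++i) {
            b->x[i] *= k;
            b->y[i] *= k;
            b->z[i] *= k;
        }
    }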

To get real numbers on how the CPU compares to the GPU, we have to wait
until I've finished more of the work: this is just an encouraging first
proof of concept (a GPU is better at lots of little matrix operations,
quelle surprise).
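
For concreteness, the kernels involved are essentially of this shape (a
sketch, not the actual code; buffer names hypothetical): apply one 3x3
rotation matrix, e.g. a precession matrix, to every direction vector,
one matrix-vector product per work-item.

    #pragma OPENCL EXTENSION cl_khr_fp64 : enable

    __kernel void rotate_points(__global const double *x,
                                __global const double *y,
                                __global const double *z,
                                __global double *ox,
                                __global double *oy,
                                __global double *oz,
                                __constant double *m) /* row-major 3x3 */
    {
        size_t i = get_global_id(0);
        double px = x[i], py = y[i], pz = z[i];
        ox[i] = m[0]*px + m[1]*py + m[2]*pz;
        oy[i] = m[3]*px + m[4]*py + m[5]*pz;
        oz[i] = m[6]*px + m[7]*py + m[8]*pz;
    }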

> Linearization of the problem is not necessary, as the GPU can handle
> trig functions.

Yes, it is not necessary, but I think it is desirable. All of the other
transformations work on vectors, so doing this one with the existing
algorithm requires a set of trig calls to obtain the spherical
coordinates, another set to do the transformation, and then another to
get a vector back again. That seems less than optimal.
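
Concretely, the round trip looks something like this (a sketch in
OpenCL C, assuming unit vectors; the transformation itself would sit in
the middle with its own trig calls):

    #pragma OPENCL EXTENSION cl_khr_fp64 : enable

    double3 via_spherical(double3 v)
    {
        /* vector -> spherical: two inverse trig calls */
        double ra  = atan2(v.y, v.x);
        double dec = asin(v.z);      /* v assumed to be a unit vector */

        /* ... the transformation on (ra, dec) would go here,
           spending its own set of trig calls ... */

        /* spherical -> vector: four more trig calls */
        double cd = cos(dec);
        return (double3)(cd * cos(ra), cd * sin(ra), sin(dec));
    }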

Also, if we are concerned enough about accuracy to be using doubles at
such great expense (16x slower on my card, for instance, and I think
NVIDIA cripples their consumer cards just as badly), I don't know that we
should be using native_sin() and friends instead of the slower trig
functions whose precision is set by the standard.
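
To spell out the distinction (a sketch; buffer names made up):
native_sin() exists only for float and its accuracy is
implementation-defined, while sin() is held to a specified error bound
(4 ulp in the OpenCL 1.x spec), for doubles as well once cl_khr_fp64 is
enabled.

    #pragma OPENCL EXTENSION cl_khr_fp64 : enable

    __kernel void trig_variants(__global const float  *xf,
                                __global const double *xd,
                                __global float  *fast_out,
                                __global double *precise_out)
    {
        size_t i = get_global_id(0);
        fast_out[i]    = native_sin(xf[i]); /* fast; accuracy undefined */
        precise_out[i] = sin(xd[i]);        /* slower; spec-bounded     */
    }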

The alternative is stereographic projection + scaling + deprojection,
which needs only standard arithmetic plus possibly a square root. That
seems preferable, but I may implement the existing algorithm in the
meantime just to be able to run a complete pipeline.
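
In case it helps to see it, that whole pipeline is just this (a sketch,
projecting from the north pole; note the singularity at the pole
itself):

    #pragma OPENCL EXTENSION cl_khr_fp64 : enable

    double3 stereo_scale(double3 v, double k)
    {
        /* project from the north pole onto the plane z = 0
           (undefined at the pole itself, v.z == 1) */
        double X = k * v.x / (1.0 - v.z);   /* scale folded in */
        double Y = k * v.y / (1.0 - v.z);

        /* deproject back onto the unit sphere; the inverse map
           is rational, so no trig at all */
        double s = X * X + Y * Y;
        return (double3)(2.0 * X / (s + 1.0),
                         2.0 * Y / (s + 1.0),
                         (s - 1.0) / (s + 1.0));
    }

Written this way there is no square root at all; one would only appear
if we renormalize the result to guard against rounding drift.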

Cheers,
Henry