[Kstars-devel] RFC: KStars GSOC: data pipelining and OpenCL.

Henry de Valence hdevalence at gmail.com
Fri Apr 19 20:05:45 UTC 2013


Hi all, sorry for my absence; it's near the end of term and I've been quite
busy.

One thing I'd like to point out is that OpenCL isn't really about graphics
processors; it's a way to structure embarrassingly parallel problems like
the ones in KStars. You can run OpenCL on a CPU, a GPU, an APU, an FPGA,
some weird DSP thing... whatever.

Even for people who use no GPU at all, the OpenCL code lets you run across
multiple cores with no extra effort. Moving to OpenCL means moving away
from the inefficient OO data-processing approach we use now, towards a more
functional, parallelizable approach, so the data representation has to
change to match, and we should obviously change it to work with
quaternions.

I don't see the point of rewriting the KStars processing code completely
just so that we get to where everyone else is. We should rewrite it
properly, so that it works better now on CPU hardware, and beats everyone
else for the common case where the computer has a CL-enabled GPU. I think
that in the case where we have the most possible parallelism (displaying
lots and lots of stars) and we have a GPU, we should aim for 100x speedup,
not 10x.

My rough plan is to change the internal structure of the SkyPoint class to
use quaternions internally, but keep the existing API as wrappers (of
course, this initially slows everything down, since now you have to do trig
to access, not just calculate with, the coordinates). Then, move most of
the calculation functions for the SkyPoint out of the SkyPoint class and
rewrite them to operate on buffers of quaternion vectors, and finally
move through all of the sky components and rewrite them to use the new
calculation functions instead of the old, slow ones, processing all of the
objects for the particular component in a single pass, rather than doing
one calculation per object.

Ideally you would remove all references to ra/dec/eq/hor coordinates for
anything, but I think that changing the top 95% of the calls (by time)
would work well enough, especially since we will get a speed boost from
using multiple cores.
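
To make the cost of the compatibility wrappers concrete, here is a small
C++ sketch of converting between angles and a unit vector. UnitVec,
fromRaDec, raOf and decOf are hypothetical names, not the SkyPoint API, and
angles are assumed to be in radians. Every angle access costs trig calls,
which is why the hot paths should stop asking for ra/dec at all.

```cpp
#include <cmath>

// Hypothetical storage type for illustration only; not part of KStars.
struct UnitVec { double x, y, z; };

// Build the unit vector from right ascension and declination (radians).
inline UnitVec fromRaDec(double ra, double dec) {
    const double c = std::cos(dec);
    return { c * std::cos(ra), c * std::sin(ra), std::sin(dec) };
}

// Wrapper-style accessors: each call pays for trigonometry.
inline double raOf(const UnitVec& v) {
    const double twoPi = 2.0 * std::acos(-1.0);
    const double ra = std::atan2(v.y, v.x);
    return ra < 0.0 ? ra + twoPi : ra;  // normalise to [0, 2*pi)
}

inline double decOf(const UnitVec& v) {
    return std::atan2(v.z, std::hypot(v.x, v.y));
}
```

A round trip such as raOf(fromRaDec(ra, dec)) should reproduce ra up to
rounding, so old callers keep working while new code uses the vector
directly.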

Cheers, Henry

On Sat, Apr 13, 2013 at 6:40 PM, Aleksey Khudyakov <alexey.skladnoy at gmail.com> wrote:

> On 14 April 2013 02:04, Akarsh Simha <akarshsimha at gmail.com> wrote:
> >> AFAIR conversion of coordinates is not the worst bottleneck. Last time I
> >> checked (1 or 2 years ago) it was drawing of constellation lines and
> >> borders and the coordinate grid, very much to my surprise. Any proposals
> >> to improve performance must be backed up with profiling/benchmarks.
> >> Otherwise it's too easy to fall into the trap of optimizing the wrong
> >> thing.
> >
> > Even with the USNO NOMAD catalog? That is a bit hard to believe,
> > although it might be the case. With the USNO NOMAD catalog, KStars
> > crawls when zoomed in on Sagittarius.
> >
> Without. That's a valid point. Also, how frequently do we need to update
> horizontal coordinates? For every star in memory on each time step? If so,
> it's a huge time sink too.
>
> Another advantage of the quaternion approach is immutability. We do not
> need to modify the coordinates of a star except possibly to account for
> proper motion. The code should become simpler too.
>
>
> >> Furthermore we can get a ~10x performance boost (uneducated guess) by
> >> changing the representation of a sky point. Currently it's represented by
> >> two angles, and conversions between different coordinate systems are
> >> quite costly: 5 or 6 calls to trigonometry functions.
> >>
> >> A much more convenient scheme is to store points as 3D vectors with unit
> >> norm and some flag to distinguish between coordinate systems. In this
> >> case transformations between different coordinate systems could be done
> >> using quaternions and are cheap (15 multiplications). So there is no
> >> reason to cache horizontal coordinates; they could be recalculated on the
> >> fly if desired. Most of the projections also become cheaper since they
> >> don't involve trigonometry in this representation.
> >>
> >>
> >> This has been discussed on the mailing list before. You can search using
> >> the "quaternion" keyword.
> >
> > Yeah, quaternions are certainly a good idea. Not sure Henry can fit it
> > into his time-line?
> >
> In my opinion it absolutely must be fitted in there. If there isn't enough
> time, drop the OpenCL part. The reasons are simple:
>
>  1. We are going to change the representation of stars/deep-sky
>     objects/whatever anyway. Then we should change it to the most
>     efficient one.
>
>  2. It's possible to render the sky on the CPU with a LOT of stars. Other
>     people did just that. So we should try to get good CPU performance
>     first, in order to avoid penalizing people who can't use a GPU for
>     whatever reason.
>
>  3. It's not clear that processing on the GPU is a clear win. Sure, even
>     low-end GPUs are an order of magnitude faster. But only if the workload
>     maps nicely onto the GPU's execution scheme, if we don't saturate the
>     bus, and if no other unforeseen problem surfaces.
>
>  4. We could probably hope for a 10-20x speedup in the ideal case. If we
>     can get a similar speedup by using the right algorithms, we should do
>     this. If this isn't enough, then we need to get the big hammer (GPU in
>     this case).