[Kstars-devel] RFC: KStars GSOC: data pipelining and OpenCL.

Sun Apr 21 15:31:54 UTC 2013

Hey Harry

>    One thing I'd like to point out is that OpenCL isn't really about graphics
>    processors; it's a way to structure embarrassingly parallel problems like
>    the ones in KStars. You can run OpenCL on a CPU, a GPU, an APU, an FPGA,
>    some weird DSP thing... whatever.
>    Even for people who use no GPU at all, the OpenCL code lets you run across
>    multiple cores with no extra effort.
>
>    Moving to OpenCL means moving away from the inefficient OO data-processing
>    approach we use now towards a more functional, parallelizable approach, so
>    the data representation has to change to match, and we should obviously
>    change it to work with quaternions.
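To make the proposed representation concrete, here is a minimal C++ sketch. It is an illustration only, not code from KStars or from this proposal. A position is stored as a unit direction vector, and a coordinate transformation (here a rotation about the pole) is applied as a unit quaternion q via v' = q v q*. The names `Vec3`, `Quat` and `fromRaDec` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal types for illustration; not existing KStars classes.
struct Vec3 {
    double x, y, z;
};

struct Quat {
    double w, x, y, z;  // w is the scalar part

    // Unit quaternion for a rotation of `angle` radians about the unit axis (ax, ay, az).
    static Quat fromAxisAngle(double ax, double ay, double az, double angle) {
        const double s = std::sin(angle / 2.0);
        return {std::cos(angle / 2.0), ax * s, ay * s, az * s};
    }

    // Hamilton product: (w1, v1)(w2, v2) = (w1 w2 - v1.v2, w1 v2 + w2 v1 + v1 x v2).
    Quat operator*(const Quat &o) const {
        return {w * o.w - x * o.x - y * o.y - z * o.z,
                w * o.x + x * o.w + y * o.z - z * o.y,
                w * o.y - x * o.z + y * o.w + z * o.x,
                w * o.z + x * o.y - y * o.x + z * o.w};
    }

    Quat conjugate() const { return {w, -x, -y, -z}; }

    // Rotate v by this unit quaternion: v' = q * (0, v) * conj(q).
    Vec3 rotate(const Vec3 &v) const {
        assert(std::fabs(w * w + x * x + y * y + z * z - 1.0) < 1e-9);  // must be a unit quaternion
        const Quat p{0.0, v.x, v.y, v.z};
        const Quat r = (*this) * p * conjugate();
        return {r.x, r.y, r.z};
    }
};

// Convert equatorial coordinates (radians) to a unit direction vector.
Vec3 fromRaDec(double ra, double dec) {
    return {std::cos(dec) * std::cos(ra), std::cos(dec) * std::sin(ra), std::sin(dec)};
}
```

For example, rotating `fromRaDec(ra, dec)` by `Quat::fromAxisAngle(0, 0, 1, theta)` gives the direction `fromRaDec(ra + theta, dec)`, and two successive rotations q1 then q2 compose into the single quaternion `q2 * q1`, so a chain of transformations can be collapsed once and then applied to every star.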
>
>    I don't see the point of rewriting the KStars processing code completely
>    just so that we get to where everyone else is. We should rewrite it
>    properly, so that it works better now on CPU hardware, and beats everyone
>    else in the common case where the computer has a CL-enabled GPU. I think
>    that in the case where we have the most possible parallelism (displaying
>    lots and lots of stars) and we have a GPU, we should aim for a 100x
>    speedup, not 10x.
>
>    My rough plan is to change the internal structure of the SkyPoint class to
>    use quaternions internally, but keep the existing API as wrappers (of
>    course, this initially slows everything down, since now you have to do
>    trig to access, not just calculate with, the coordinates). Then, move most
>    of the calculation functions for the SkyPoint out of the SkyPoint class
>    and rewrite them to operate on buffers of quaternion vectors, and
>    finally move through all of the sky components and rewrite them to use the
>    new calculation functions instead of the old, slow ones, processing all of
>    the objects for a particular component in a single pass, rather than
>    doing one calculation per object.
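The SkyPoint plan above could look roughly like the following C++ sketch, assuming positions are stored as unit vectors (pure quaternions). `SkyPointQ`, `rotateAll` and the other names are hypothetical, not existing KStars API. The `ra()`/`dec()` wrappers show where the extra trig cost on access comes from, and `rotateAll` processes a whole buffer in one pass instead of one call per object.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: not the actual KStars SkyPoint API.
struct Vec3 { double x, y, z; };

// Unit quaternion (w, x, y, z) with scalar part w.
struct Quat { double w, x, y, z; };

inline Vec3 cross(const Vec3 &a, const Vec3 &b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Stores the position as a unit direction vector; the old RA/Dec API is
// kept as wrappers that do trig on every access.
class SkyPointQ {
public:
    SkyPointQ(double ra, double dec)
        : v_{std::cos(dec) * std::cos(ra), std::cos(dec) * std::sin(ra), std::sin(dec)} {}
    explicit SkyPointQ(const Vec3 &v) : v_(v) {}

    double ra() const {
        const double pi = std::acos(-1.0);
        const double a = std::atan2(v_.y, v_.x);
        return a < 0.0 ? a + 2.0 * pi : a;  // normalize to [0, 2*pi)
    }
    double dec() const {
        return std::asin(std::fmax(-1.0, std::fmin(1.0, v_.z)));  // clamp rounding error
    }
    const Vec3 &vec() const { return v_; }

private:
    Vec3 v_;
};

// Batch transform: rotate every vector in `in` by the unit quaternion q in a
// single pass, writing results to `out`. Uses v' = v + w*t + u x t, t = 2 (u x v).
void rotateAll(const Quat &q, const std::vector<Vec3> &in, std::vector<Vec3> &out) {
    assert(out.size() == in.size());
    const Vec3 u{q.x, q.y, q.z};
    for (std::size_t i = 0; i < in.size(); ++i) {
        const Vec3 &v = in[i];
        Vec3 t = cross(u, v);
        t = {2.0 * t.x, 2.0 * t.y, 2.0 * t.z};
        const Vec3 c = cross(u, t);
        out[i] = {v.x + q.w * t.x + c.x, v.y + q.w * t.y + c.y, v.z + q.w * t.z + c.z};
    }
}
```

The batch loop uses the identity v' = v + w t + u × t with t = 2(u × v), which equals q v q* for a unit quaternion q = (w, u) but is cheaper than two full quaternion products per vector. A flat array of vectors like this is also the kind of layout that maps naturally onto a single OpenCL buffer.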
>    Ideally you would remove all references to ra/dec/eq/hor coordinates for
>    anything, but I think that changing the top 95% of the calls (by time)
>    would work well enough, especially since we will get a speed boost from
>    using multiple cores.
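On the multi-core point: OpenCL distributes work across devices and cores itself, but as a CPU-only stand-in the following C++ sketch (plain `std::thread`, not OpenCL) shows the same idea of splitting one buffer into independent contiguous chunks, one per thread. `parallelChunks` is a hypothetical helper, not part of KStars.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: split the index range [0, n) into contiguous chunks and
// process each chunk on its own thread. `body(begin, end)` must only touch
// elements in its own range, so no locking is needed.
template <typename Body>
void parallelChunks(std::size_t n, unsigned nThreads, Body body) {
    assert(nThreads > 0);
    const std::size_t chunk = (n + nThreads - 1) / nThreads;  // ceiling division
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nThreads; ++t) {
        const std::size_t begin = t * chunk;
        if (begin >= n) break;  // fewer elements than threads: stop early
        const std::size_t end = std::min(n, begin + chunk);
        workers.emplace_back(body, begin, end);
    }
    for (std::thread &w : workers) w.join();  // wait for all chunks to finish
}
```

A caller would typically pass `std::thread::hardware_concurrency()` as the thread count, falling back to 1 when it returns 0 (it may do so when the value is not computable). The body could be a chunked version of a batch transform, such as a quaternion rotation over one range of a star buffer.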

This looks like a very good plan to me. I'm looking for a few more
details, as I've discussed with you on IRC.

Any comments from others?

Regards
Akarsh