[Kstars-devel] RFC: KStars GSOC: data pipelining and OpenCL.
Henry de Valence
hdevalence at gmail.com
Sat Apr 13 20:21:03 UTC 2013
Hi all,
I'd like to propose some architectural changes to the KStars data
processing pipeline. Generally speaking, this would involve rewriting
the portions of the code between the data storage (in catalogs) and the
painting interface so that the functions involved are OpenCL kernels
that can be executed in parallel on the CPU or with massive parallelism
(hundreds of "cores") on the GPU.
The steps that KStars must perform on the sky objects before they can be
displayed on the screen are roughly as follows:
1. Precession/Nutation;
2. Equatorial -> Horizontal conversion;
3. Projection to screen coordinates.
Currently, both 1 and 2 are done on the CPU with code that is neither
thread- nor SIMD-parallel. 3 is also done on the CPU, even in the case
of the GL backend I implemented three years ago, due to the decision to
use the legacy immediate mode instead of using OpenGL vertex shaders (as I
recall, this was motivated by compatibility concerns, but in retrospect
it looks like a rather poor choice).
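As a rough illustration of step 2, here is a minimal C++ sketch of the per-object equatorial -> horizontal conversion, written as a pure function of the kind that could later become the body of a CL kernel. The type and function names are hypothetical and are not KStars classes; angles are assumed to be in radians, and azimuth is assumed to be measured from north through east. Precession and nutation (step 1) are assumed to have been applied already.

```cpp
#include <cmath>

// Hypothetical plain-data types for illustration; not KStars classes.
struct Equatorial { double ra; double dec; };  // right ascension, declination (radians)
struct Horizontal { double az; double alt; };  // azimuth from north through east, altitude (radians)

// Convert one object from equatorial to horizontal coordinates.
// lst = local sidereal time, lat = observer latitude, both in radians.
// The function reads only its arguments, so each object can be
// converted independently of every other object.
Horizontal equatorial_to_horizontal(Equatorial p, double lst, double lat) {
    const double kTwoPi = 6.283185307179586;
    const double ha = lst - p.ra;  // hour angle

    const double sin_alt = std::sin(p.dec) * std::sin(lat)
                         + std::cos(p.dec) * std::cos(lat) * std::cos(ha);
    // Clamp against rounding so asin never sees a value outside [-1, 1].
    const double alt = std::asin(std::fmax(-1.0, std::fmin(1.0, sin_alt)));

    double az = std::atan2(-std::cos(p.dec) * std::sin(ha),
                           std::sin(p.dec) * std::cos(lat)
                           - std::cos(p.dec) * std::sin(lat) * std::cos(ha));
    if (az < 0.0) az += kTwoPi;  // normalise to [0, 2*pi)
    return {az, alt};
}
```

For example, an object whose declination equals the observer's latitude and whose hour angle is zero comes out at an altitude of pi/2, i.e. at the zenith.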
The advantages of moving these tasks into CL kernels are these:
1. Most importantly, we gain the ability to execute code on the GPU.
General-purpose GPU computing is already here, and it's going to be
even more important in the future than it is now: today, a low-end
$50 AMD CPU has a graphics processor on-chip with 128 processing
elements, while a higher-end graphics card may have over 1024. The
benefit grows even more when we talk about low-power embedded
devices, since they usually have weak processors, but capable GPUs
[1]. Using all the available hardware gives really dramatic
improvements [2], and I think that the workload for KStars would be
well-suited for it.
In the event that the user has hardware that only supports execution on
the CPU, we still gain:
2. It's very, very rare to see a single-core machine, but KStars uses
only a single thread for all of the processing. OpenCL automatically
runs code in parallel across all available cores. The three steps
above are obviously parallel between stars, and should be run in two,
four, ... threads as appropriate, with OpenCL doing the work of
determining workgroup sizes.
3. KStars currently has rather poor memory-access patterns due to
putting an OO code structure on a problem that is really more of a
functional, data-processing problem. Using CL forces us to
structure the code so that instead of calling functions many times
on different bits of data at different locations in memory, we
essentially call functions few times on very large contiguous arrays
of memory containing all of the data, resulting in better
performance. (See, for example, Drepper's matrix multiplication
example [3], where doing the extra work of malloc'ing 8MB of memory
and filling it with a matrix transpose gives a nearly
two-order-of-magnitude speed increase.)
Note that #3 is something I'd like to test and get hard numbers on
before writing any applications, and I have a student version of
VTune that is supposedly able to profile these things, but it
doesn't want to work properly.
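To make points 2 and 3 concrete, here is a minimal C++ sketch of the "few calls on large contiguous arrays" structure, using step 3 (projection) as the per-object work. Everything in it is an assumption made for illustration: the struct names, the struct-of-arrays layout, and the choice of a stereographic projection about a view centre (az0, alt0). It is not how KStars currently stores or projects objects, and the screen-axis sign conventions are arbitrary. The chunks are processed one after another here; because no element depends on any other, each chunk could equally be handed to a separate thread or CL work-group.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical struct-of-arrays batches: one contiguous array per field.
struct HorizontalBatch {
    std::vector<double> az;   // radians
    std::vector<double> alt;  // radians
};

struct ScreenBatch {
    std::vector<float> x;
    std::vector<float> y;
};

// Stereographic projection of elements [begin, end) about (az0, alt0).
// Each iteration touches only index i, so ranges never interfere.
// The point exactly opposite the view centre is a singularity
// (division by zero) and is not handled in this sketch.
void project_range(const HorizontalBatch& in, ScreenBatch& out,
                   std::size_t begin, std::size_t end,
                   double az0, double alt0, double scale) {
    const double s0 = std::sin(alt0), c0 = std::cos(alt0);
    for (std::size_t i = begin; i < end; ++i) {
        const double s = std::sin(in.alt[i]), c = std::cos(in.alt[i]);
        const double cd = std::cos(in.az[i] - az0);
        const double k = 2.0 / (1.0 + s0 * s + c0 * c * cd);
        out.x[i] = static_cast<float>(scale * k * c * std::sin(in.az[i] - az0));
        out.y[i] = static_cast<float>(scale * k * (c0 * s - s0 * c * cd));
    }
}

// One call for the whole batch, split into independent chunks.
void project_all(const HorizontalBatch& in, ScreenBatch& out,
                 std::size_t chunks, double az0, double alt0, double scale) {
    const std::size_t n = in.az.size();
    out.x.resize(n);
    out.y.resize(n);
    if (chunks == 0) chunks = 1;
    for (std::size_t c = 0; c < chunks; ++c) {
        project_range(in, out, n * c / chunks, n * (c + 1) / chunks,
                      az0, alt0, scale);
    }
}
```

Splitting the batch into one chunk or many gives identical output, which is the property that lets a runtime pick the chunk (workgroup) size freely.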
Finally, we also have this:
4. The main bottleneck of the OpenGL mode is sending stars to the
graphics card. The next is projecting the stars. Here, the stars are
already on the graphics card, and are sent there less frequently and
from a position where we know how many we'll be sending (e.g. if we
load a trixel at a time), so the first problem goes away, and the
second also goes away because we use a vertex shader to do
projection.
This email is somewhat light on technical details from the KStars point
of view; I'll be writing another one soon with details about how I'd
like to do this, but I want to be very, very, very careful with it,
specifically in the area of how to prevent scope creep.
The reason is that the last time I did a summer of code project I didn't
do a very good job of planning on how to avoid scope creep, and as a
result, I got really bogged down in trying to fix everything in KStars
and burned myself out. So, I think that it's really important to make
sure that the plan of what I'll be doing in a project proposal has not
just really clear scheduling, but also some stuff about what not to
do, and how to make sure that we make minimal changes elsewhere.
I'm hoping to have that ready by midweek; however, I'd appreciate any
comments that people have on the generalities in the meantime.
Cheers,
Henry de Valence
[1]: See also the list of companies here: http://hsafoundation.com/ to get
an idea of where OpenCL is headed in the embedded realm.
[2]: Compare GIMP performance with and without OpenCL here:
http://www.tomshardware.com/reviews/photoshop-cs6-gimp-aftershot-pro,3208-10.html
[3]: http://www.akkadia.org/drepper/cpumemory.pdf page 50