<div dir="ltr">Hi all, sorry for my absence; it's near the end of term and I've been quite busy.<div><br></div><div>One thing I'd like to point out is that OpenCL isn't really about graphics processors; it's a way to structure embarrassingly parallel problems like the ones in KStars. You can run OpenCL on a CPU, a GPU, an APU, an FPGA, some weird DSP thing, whatever.</div>
<div><br></div><div>Even for people who use no GPU at all, the OpenCL code lets you run across multiple cores with no extra effort. Moving to OpenCL means moving away from the inefficient OO data-processing approach we use now, towards a more functional, parallelizable approach. The data representation has to change to match, and we should obviously change it to work with quaternions.</div>
<div><br></div><div>I don't see the point of rewriting the KStars processing code completely just so that we get to where everyone else is. We should rewrite it properly, so that it works better now on CPU hardware and beats everyone else in the common case where the computer has a CL-enabled GPU. In the case where we have the most possible parallelism (displaying lots and lots of stars) and we have a GPU, I think we should aim for a 100x speedup, not 10x.</div>
<div><br></div><div>My rough plan has three steps. First, change the SkyPoint class to use quaternions internally, keeping the existing API as wrappers (of course, this initially slows everything down, since you now have to do trig just to access the coordinates, not only to calculate with them). Second, move most of the calculation functions out of the SkyPoint class and rewrite them to operate on buffers of quaternion vectors. Finally, go through all of the sky components and rewrite them to use the new calculation functions instead of the old, slow ones, processing all of a component's objects in a single pass rather than doing one calculation per object.</div>
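<div><br></div><div>A minimal C++ sketch of that plan, under stated assumptions: the names Vec3, Quat, fromRaDec, toRaDec, fromAxisAngle, rotate and rotateAll are hypothetical and are not existing KStars API. It assumes points are stored as unit vectors (as suggested in the quoted thread), that the ra/dec wrappers do the trig, and that a coordinate transformation is a single unit quaternion applied to a whole buffer in one pass.</div>

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration only; not the existing KStars API.
struct Vec3 { double x, y, z; };     // unit vector pointing at a sky position
struct Quat { double w, x, y, z; };  // unit quaternion representing a rotation

// Wrapper: angles in radians -> unit vector. This is where the trig cost lives.
Vec3 fromRaDec(double ra, double dec) {
    return { std::cos(dec) * std::cos(ra),
             std::cos(dec) * std::sin(ra),
             std::sin(dec) };
}

// Wrapper: unit vector -> angles in radians. Again trig, but only on access.
void toRaDec(const Vec3& v, double& ra, double& dec) {
    ra  = std::atan2(v.y, v.x);
    dec = std::asin(v.z);
}

// Build the quaternion for a rotation of `angle` radians about a unit axis.
Quat fromAxisAngle(const Vec3& axis, double angle) {
    assert(std::abs(axis.x * axis.x + axis.y * axis.y + axis.z * axis.z - 1.0) < 1e-9);
    const double s = std::sin(angle / 2.0);
    return { std::cos(angle / 2.0), axis.x * s, axis.y * s, axis.z * s };
}

// Rotate one vector by q using v' = v + w*t + u x t, with u = (q.x, q.y, q.z)
// and t = 2 * (u x v). Only multiplications and additions, no trig.
inline Vec3 rotate(const Quat& q, const Vec3& v) {
    const double tx = 2.0 * (q.y * v.z - q.z * v.y);
    const double ty = 2.0 * (q.z * v.x - q.x * v.z);
    const double tz = 2.0 * (q.x * v.y - q.y * v.x);
    return { v.x + q.w * tx + (q.y * tz - q.z * ty),
             v.y + q.w * ty + (q.z * tx - q.x * tz),
             v.z + q.w * tz + (q.x * ty - q.y * tx) };
}

// Process a whole component's buffer in a single pass. Each element is
// independent of the others, so the loop body is the part that could later
// be split across cores or moved into a kernel.
void rotateAll(const Quat& q, const std::vector<Vec3>& in, std::vector<Vec3>& out) {
    out.resize(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = rotate(q, in[i]);
}
```

<div>A caller would build one Quat per frame for the transformation it needs, call rotateAll once per sky component, and only go through toRaDec where angles are actually required, for example for display in the UI.</div>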
<div><br></div><div>Ideally you would remove all references to <span class="" style>ra</span>/<span class="" style>dec</span>/<span class="" style>eq</span>/<span class="" style>hor</span> coordinates for anything, but I think that changing the top 95% of the calls (by time) would work well enough, especially since we will get a speed boost from using multiple cores.</div>
<div><br></div><div>Cheers, Henry</div><div><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Sat, Apr 13, 2013 at 6:40 PM, Aleksey Khudyakov <span dir="ltr">&lt;<a href="mailto:alexey.skladnoy@gmail.com" target="_blank">alexey.skladnoy@gmail.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On 14 April 2013 02:04, Akarsh Simha <<a href="mailto:akarshsimha@gmail.com">akarshsimha@gmail.com</a>> wrote:<br>
>> AFAIR, conversion of coordinates is not the worst bottleneck. Last time I checked (1<br>
>> or 2 years ago) it was the drawing of constellation lines, borders, and the coordinate grid,<br>
>> very much to my surprise. Any proposal to improve performance must be backed up<br>
>> with profiling/benchmarks. Otherwise it's too easy to fall into the trap of<br>
>> optimizing the wrong thing.<br>
><br>
> Even with the USNO NOMAD catalog? That is a bit hard to believe,<br>
> although it might be the case. With the USNO NOMAD catalog, KStars<br>
> crawls when zoomed in on Sagittarius.<br>
><br>
</div>Without. That's a valid point. Also, how frequently do we need to update horizontal<br>
coordinates? For every star in memory on each time step? If so, it's a huge time sink too.<br>
<br>
Another advantage of the quaternion approach is immutability. We do not need to modify<br>
the coordinates of a star, except possibly to account for proper motion. The code<br>
should become simpler too.<br>
<div class="im"><br>
<br>
>> Furthermore, we can get a ~10x performance boost (uneducated guess) by changing the<br>
>> representation of a sky point. Currently it's represented by two angles, and<br>
>> conversions between different coordinate systems are quite costly: 5 or 6 calls<br>
>> to trigonometry functions.<br>
>><br>
>> A much more convenient scheme is to store points as 3D vectors with unit norm and<br>
>> some flag to distinguish between coordinate systems. In this case<br>
>> transformations between different coordinate systems can be done using<br>
>> quaternions and are cheap (15 multiplications). So there is no reason to cache<br>
>> horizontal coordinates; they can be recalculated on the fly if desired. Most<br>
>> of the projections also become cheaper, since they don't involve trigonometry in<br>
>> this representation.<br>
>><br>
>><br>
>> This has been discussed on the mailing list before. You can search using the "quaternion"<br>
>> keyword.<br>
><br>
> Yeah, quaternions are certainly a good idea. Not sure Henry can fit it<br>
> into his time-line?<br>
><br>
</div>In my opinion it absolutely must be fitted in. If there isn't enough time,<br>
drop the OpenCL part. The reasons are simple:<br>
<br>
1. We are going to change the representation of stars/deep-sky objects/whatever<br>
anyway. Then we should change it to the most efficient one.<br>
<br>
2. It's possible to render the sky on the CPU with a LOT of stars. Other people have done<br>
just that. So we should try to get good CPU performance first, in order<br>
to avoid penalizing people who can't use a GPU for whatever<br>
reason.<br>
<br>
3. It's not clear that processing on the GPU is a clear win. Sure, even low-end<br>
GPUs are an order of magnitude faster. But that holds only if the workload maps nicely<br>
onto the GPU's execution scheme, if we don't saturate the bus, and if no other unforeseen<br>
problem surfaces.<br>
<br>
4. We could probably hope for a 10-20x speedup in the ideal case. If we can<br>
get a similar speedup by using the right algorithms, we should do that. If<br>
that isn't enough, then we need to get the big hammer (the GPU in this case).<br>
_______________________________________________<br>
Kstars-devel mailing list<br>
<a href="mailto:Kstars-devel@kde.org">Kstars-devel@kde.org</a><br>
<a href="https://mail.kde.org/mailman/listinfo/kstars-devel" target="_blank">https://mail.kde.org/mailman/listinfo/kstars-devel</a><br>
</blockquote></div><br></div>