Pre-computing and saving trigonometric values reduced ~45% overhead in star position computation

Akarsh Simha akarshsimha at gmail.com
Thu Sep 29 04:50:06 UTC 2016


Hi

Here's a brief explanation of what the recent commits in trunk have effected:

We now have a new dms subclass called CachingDms. CachingDms extends
dms by reimplementing the setters (dms::setD etc.) so that they
calculate cos() and sin() of the angle every time the angle is
changed. Why dos this help? Because profiling showed that:
1. Trigonometric computations are the main bottleneck in the core
computations of KStars
2. Under some conditions, as much as 60% of the trigonometric calls
made were redundant, i.e. they were recomputing sin() or cos() of the
same angle that had already been computed.
I appreciate that CachingDms might not be the best name for this
class, because it's not really caching as much as precomputing;
anyway...

Here are some details about how CachingDms works:

1. If we set an angle, compute its trigonometric values immediately.

2. sin(), cos() and SinCos() are merely inline getters of the
pre-computed values

3. If we are going to compute the angle from the sine, cosine or
tangent using asin, acos, atan2, instead call
CachingDms::setUsing_asin, setUsing_acos and setUsing_atan2. These
methods use the fact that we already know one trigonometric function
to find the other(s) using simpler algebraic operations. For example,
the code of setUsing_asin() does the following:
1) Save the given sine in the sine cache
2) Compute the angle using std::asin()
3) Compute the cosine by a simple square root as = sqrt(1 - sine^2)
This is more efficient than doing
angle = asin(x), cosAngle = cos(angle) because the square root is more
efficient. Moreover, the sine value is remember and saved for future
computations for "free".

4. CachingDms provides binary +, binary - and unary - operators. They
use trigonometric identities
cos(A±B) = cos A cos B ± ( - sin A sin B ) and sin(A±B) = sin A cos B
± cos A sin B
sin(-A) = - sin A, cos(-A) = cos A
to save the trigonometric values for A ± B and -A without calling
triogonometric functions.

5. Reducing an angle to a range [0, 2 pi) or (-pi, pi] is effected by
calling dms::reduceToRange(). CachingDms plainly inherits this method,
so the angles are not recomputed.

CachingDms caveats:

1. There aren't sufficient safeguards to throw off the angle and its
trigonometric values out of sync. The setters are virtual, and
hopefully, that should suffice, but there could be problems not known
to me yet. If you find any, please write test cases for them in the
TestCachingDms class.
2. CachingDms can degrade performance if we never use the cached
trigonometric values. With the current code, this happens in MANY
places outside the most expensive computation loops, but I have not
bothered to trace them down and fix them. For example, the planet code
probably has degraded from these commits, but there are only few
planets, and most asteroids are too faint to be shown. Refactoring
code keeping in mind the 5 points above should fix most of these "bad
cache uses". To find out more, use the profiling code that is enabled
by a #define in dms.h -- the bad cache uses are counted under
CachingDms::cachingdms_bad_uses static member.

Other improvements:

Apart from the improvements offered by CachingDms, few other minor
improvements were made in SkyPoint methods to shave off a few more CPU
cycles. These included things like refactoring some formulas to reduce
the number of multiplications, storing intermediate computations for
re-use...

Results of profiling:

Anywhere between 2 ~ 6 redundant trigonometric computations were
removed from EquatorialToHorizontal, precess, aberrate, nutate
methods.

The most expensive methods called from DeepStarComponent::draw() which
draws our 100 million-star catalog are StarObject::JITupdate() which
updates the coordinates of the star, and SkyQPainter::drawPointSource
which does the projection and draws the star. I have not touched the
latter part (there is some scope for optimization there as well).

StarObject::JITupdate() has gone down from average (under certain
configuration) 3784 cycles (as estimated by callgrind) to 1840 cycles,
which is about a 50% reduction in computation.
SkyPoint::EquatorialToHorizontal has gone down from 992 CPU cycles
(estimated by callgrind) to 272 cycles (70% reduction). This is the
primary bottleneck when the simulation clock is running at 1
simulation second/second and the map is not being panned / zoomed in.
SkyPoint::updateCoords(), which handles precession, nutation,
aberration and gravitational lensing around the sun is down from 2527
cycles to 1415 cycles. The particular subroutines of precession,
aberration and nutation are down from 790, 609 and 1053 cycles to 407,
471, 462 cycles.

Under my settings, there were only 10 bad cache uses per call of
DeepStarComponent::draw(), although there are more in other places in
KStars.

Scope for future optimizations:

The following things might help, but they are somewhat extensive
undertakings that I might consider in the future. Otherwise, they are
up for anyone to work on:

1. Making the internal representation of dms to use radians instead of
degrees might provide some speed by reducing the overhead of
back-and-forth conversion.

2. Making the azimuth caching -- this will help reduce the overhead
when projection, since we already know some trigonometric values of
the azimuth.

My estimation is that these might help to shave off some 150 ~ 200 CPU
cycles off from JITupdate() and drawPointSource() put together.

Regards
Akarsh


More information about the Kstars-devel mailing list