<div class="gmail_quote">On Thu, Sep 6, 2012 at 5:16 AM, Sven Langkamp <span dir="ltr">&lt;<a href="mailto:sven.langkamp@gmail.com" target="_blank">sven.langkamp@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi,<div><br></div><div><br></div><div>I have pushed the krita-vc-langkamp branch. It does implement the brush mask computation with vc. </div><div>To build the branch you need a cpu with SSE2 and Vc 0.61 or git: <a href="http://code.compeng.uni-frankfurt.de/projects/vc" target="_blank">http://code.compeng.uni-frankfurt.de/projects/vc</a></div>


<div>AVX should work too, but requires a change in the Krita cmake file.</div><div><br></div><div>With the branch there is around 15x speedup in the mask benchmark and about 2-3x speedup in the stroke benchmark (for brushes without random and density).</div>


<div>No idea how much performance improvement that gives during real painting, so that needs testing.</div></blockquote><div><br></div><div>Small status update:</div><div><br></div><div>The branch has now been tested on half a dozen systems. Results ranged from twice as fast to a very slight or no noticeable improvement. I'm not sure why the systems differ so much. Dual-core systems seem to benefit more; it might be that mask processing was already quite fast on quad-core CPUs before.</div>

<div><br></div><div>The branch is almost feature complete; only the detection of the cmake files still needs some improvement. It will also need some ifdefs if Vc is to stay an optional dependency.</div><div><br></div><div>I did some further profiling with callgrind, using a 1000px brush at 0.04 spacing. The callgrind file can be found here: <a href="http://depot.tu-dortmund.de/get/ybukq" style="font-family:Arial,Helvetica,sans-serif;font-size:14px;background-color:rgb(255,255,255)">http://depot.tu-dortmund.de/get/ybukq</a><span style="font-family:Arial,Helvetica,sans-serif;font-size:14px;background-color:rgb(255,255,255)"> </span></div>

<div><font face="Arial, Helvetica, sans-serif"><span style="font-size:14px"><br></span></font></div><div><font face="Arial, Helvetica, sans-serif"><span style="font-size:14px">It shows that the composite op is now the most expensive operation in the KisStrokeBenchmark, which is probably also why we don't see bigger improvements from the mask processing. Pentalis wants to look at the composite ops and see what can be done there. I'm considering parallelizing the fixedBlt with QtConcurrent, as we already do for the brush mask.</span></font></div>
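<div><br></div><div>To illustrate the idea of parallelizing the blit, here is a minimal sketch in standard C++ of the partitioning step only: the target is split into disjoint row bands that could each be handed to a worker (for example through QtConcurrent). All names here (splitRows, blitRows, blitInBands) are hypothetical, the per-band work is a plain row copy standing in for the real composite op, and the bands are processed serially to keep the sketch dependency-free. This is not the actual fixedBlt code.</div>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Split [0, height) into at most `bands` contiguous, non-overlapping row ranges.
std::vector<std::pair<std::size_t, std::size_t>> splitRows(std::size_t height, std::size_t bands)
{
    assert(bands > 0);
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    const std::size_t step = (height + bands - 1) / bands; // ceiling division
    for (std::size_t begin = 0; begin < height; begin += step) {
        ranges.emplace_back(begin, std::min(begin + step, height));
    }
    return ranges;
}

// Stand-in for the per-band work: a plain row copy, not a real composite op.
void blitRows(const std::vector<std::uint8_t>& src, std::vector<std::uint8_t>& dst,
              std::size_t rowBytes, std::size_t beginRow, std::size_t endRow)
{
    assert(src.size() == dst.size());
    assert(beginRow <= endRow && endRow * rowBytes <= src.size());
    std::copy(src.begin() + beginRow * rowBytes,
              src.begin() + endRow * rowBytes,
              dst.begin() + beginRow * rowBytes);
}

// Processes the bands one after another. Because the bands touch disjoint
// rows, each blitRows call could instead be dispatched to a worker thread.
void blitInBands(const std::vector<std::uint8_t>& src, std::vector<std::uint8_t>& dst,
                 std::size_t rowBytes, std::size_t height, std::size_t bands)
{
    for (const auto& range : splitRows(height, bands)) {
        blitRows(src, dst, rowBytes, range.first, range.second);
    }
}
```

<div>Since no two bands write to the same rows, no locking would be needed between workers in this scheme; whether that pays off depends on the cost of the composite op per band compared with the threading overhead.</div>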

<div><font face="Arial, Helvetica, sans-serif"><span style="font-size:14px"><br></span></font></div><div><font face="Arial, Helvetica, sans-serif"><span style="font-size:14px">Besides that, callgrind shows some other smaller bottlenecks. One is QVector::fill, which is used when initializing the fixed paint device. It might be possible to avoid that by just resizing and leaving the values uninitialized.</span></font></div>
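<div><br></div><div>A rough sketch of the "skip the fill" idea, using a plain standard C++ buffer as a stand-in rather than QVector or the real fixed paint device. DabBuffer, allocateWithoutFill and writeMask are hypothetical names. The assumption that makes this safe is that every byte is overwritten before it is read.</div>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for the fixed paint device's pixel storage.
struct DabBuffer {
    std::unique_ptr<std::uint8_t[]> bytes;
    std::size_t size = 0;
};

// Allocates without a fill pass: new[] default-initializes uint8_t,
// so the contents are indeterminate until they are written.
DabBuffer allocateWithoutFill(std::size_t size)
{
    DabBuffer buffer;
    buffer.bytes.reset(new std::uint8_t[size]);
    buffer.size = size;
    return buffer;
}

// Overwrites every byte, which is what makes skipping the fill safe.
void writeMask(DabBuffer& buffer, const std::vector<std::uint8_t>& mask)
{
    assert(mask.size() == buffer.size);
    for (std::size_t i = 0; i < buffer.size; ++i) {
        buffer.bytes[i] = mask[i];
    }
}
```

<div>If some code path reads the buffer before it is fully written, the fill cannot be dropped, so this would only apply where the dab is completely overwritten anyway.</div>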

<div><font face="Arial, Helvetica, sans-serif"><span style="font-size:14px"><br></span></font></div><div><font face="Arial, Helvetica, sans-serif"><span style="font-size:14px">Another smaller bottleneck appears to be the memcpy we do to set the color of the dab. When we use a plain color that never changes, it might be possible to avoid that, as we only change the alpha values.</span></font></div>
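<div><br></div><div>The alpha-only idea could look roughly like the sketch below. It assumes a hypothetical 8-bit RGBA layout with alpha as the last channel; the real dab layout depends on the color space, and fillDabWithColor and updateDabAlpha are illustrative names, not existing Krita functions.</div>

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical pixel layout: 4 bytes per pixel, alpha in the last byte.
constexpr std::size_t kPixelSize = 4;
constexpr std::size_t kAlphaOffset = 3;

// Full path: copy the color into every pixel, then set alpha from the mask.
void fillDabWithColor(std::vector<std::uint8_t>& dab,
                      const std::array<std::uint8_t, kPixelSize>& color,
                      const std::vector<std::uint8_t>& mask)
{
    assert(dab.size() == mask.size() * kPixelSize);
    for (std::size_t i = 0; i < mask.size(); ++i) {
        std::memcpy(&dab[i * kPixelSize], color.data(), kPixelSize);
        dab[i * kPixelSize + kAlphaOffset] = mask[i];
    }
}

// Shortcut: the color channels already hold the plain color from an earlier
// fill, so only the alpha byte of each pixel is rewritten.
void updateDabAlpha(std::vector<std::uint8_t>& dab,
                    const std::vector<std::uint8_t>& mask)
{
    assert(dab.size() == mask.size() * kPixelSize);
    for (std::size_t i = 0; i < mask.size(); ++i) {
        dab[i * kPixelSize + kAlphaOffset] = mask[i];
    }
}
```

<div>The shortcut is only valid while the dab keeps the same size and the same plain color; as soon as either changes, the full fill has to run again.</div>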

<div><font face="Arial, Helvetica, sans-serif"><span style="font-size:14px"><br></span></font></div><div><font face="Arial, Helvetica, sans-serif"><span style="font-size:14px">Unfortunately the benchmarks don't show the other operations done while painting in Krita, so I can't say how much effect e.g. the update of the projection/canvas has.</span></font></div>

</div>