<div class="gmail_quote">On Fri, Sep 28, 2012 at 3:39 PM, Dmitry Kazakov <span dir="ltr">&lt;<a href="mailto:dimula73@gmail.com" target="_blank">dimula73@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


Hi!<br><br>After that long discussion about grayscale selections, I decided to check whether we really need planar channels to implement vectorization in Krita. It turned out that we do *not*. SIMD instructions cannot work with bytes directly (we won't be able to multiply anything), so in both cases, with planar bytes or without, we have to convert the pixel data into some other format, single-precision float or single-word integer, doing some inevitable permutations and wasting time on them. Planar channels give us no help with that.<br>


</blockquote><div><br></div><div>Really interesting solution. My idea was to shuffle the alpha out of the loaded pixel (that would require fewer conversions, but more of other instructions), but this looks better. Unfortunately I don't have a CPU with AVX, so I can't test it. It would be interesting to see how this performs with SSE and integers instead of floats.</div>
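<div><br></div><div>To make the conversion step concrete, here is a minimal portable C++ sketch of the idea as I understand it: unpack the alpha bytes of a batch of 8 interleaved RGBA pixels into floats, blend in float, and convert back. It is not the benchmark code and not Krita's API. The function name <code>compositeOverFloat</code>, the constant <code>kLanes</code>, and the choice of a Porter-Duff "over" on non-premultiplied 8-bit RGBA are assumptions for illustration. It uses plain loops rather than intrinsics, so the batch of 8 only mirrors the width of one AVX register.</div>

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Assumption: 8 floats fit in one AVX register, so pixels are handled in
// batches of 8. This is a scalar sketch; no SIMD intrinsics are used.
constexpr std::size_t kLanes = 8;

// Hypothetical helper: Porter-Duff "over" of src onto dst for interleaved,
// non-premultiplied RGBA8 pixels. Bytes are converted to float per batch.
void compositeOverFloat(const std::uint8_t* src, std::uint8_t* dst,
                        std::size_t pixelCount)
{
    for (std::size_t base = 0; base < pixelCount; base += kLanes) {
        const std::size_t n = std::min(kLanes, pixelCount - base);
        float srcA[kLanes], dstA[kLanes], outA[kLanes];

        // Step 1: pull the alpha bytes out of the RGBA pixels and
        // normalize them to floats in [0, 1].
        for (std::size_t i = 0; i < n; ++i) {
            srcA[i] = src[4 * (base + i) + 3] / 255.0f;
            dstA[i] = dst[4 * (base + i) + 3] / 255.0f;
            outA[i] = srcA[i] + dstA[i] * (1.0f - srcA[i]);
        }

        // Step 2: blend each color channel in float, then round back to a byte.
        for (std::size_t c = 0; c < 3; ++c) {
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t p = 4 * (base + i) + c;
                const float num = src[p] * srcA[i]
                                + dst[p] * dstA[i] * (1.0f - srcA[i]);
                const float blended = outA[i] > 0.0f ? num / outA[i] : 0.0f;
                dst[p] = static_cast<std::uint8_t>(std::lround(blended));
            }
        }

        // Step 3: write the resulting alpha back as a byte.
        for (std::size_t i = 0; i < n; ++i) {
            dst[4 * (base + i) + 3] =
                static_cast<std::uint8_t>(std::lround(outA[i] * 255.0f));
        }
    }
}
```

<div>The inner loops over <code>i</code> are the parts that a SIMD version would presumably replace with one register-wide operation each. The interleaved layout stays untouched and only the alpha values are gathered per batch.</div>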


<div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">What we really need to do is simply use the advantages of the RGBA pixel layout (better data locality and good alignment) and optimize our code. As a proof of concept, I've written a small benchmark that compares our standard integer COMPOSITE_OVER algorithm against its SIMD (AVX) implementation. The streamed implementation was 3.3 times faster than the algorithm we use right now. Moreover, this sketch was written in just a day, so there is plenty of room for optimization (it could be modified to process 16 pixels at a time instead of 8, for example).<br>


<br>The actual results of compositing 32 MPixels:<br><br><font><span style="font-family:courier new,monospace">TestAvxCompositeOverTest::testPerPixelComposition():  370 msecs</span><br style="font-family:courier new,monospace">


<span style="font-family:courier new,monospace">TestAvxCompositeOverTest::testAVXComposition():      147 msecs</span><br style="font-family:courier new,monospace"><span style="font-family:courier new,monospace">TestAvxCompositeOverTest::testAVXCompositionx2():    113 msecs</span></font><br style="font-family:courier new,monospace">


<br>What I want to say with this mail:<br>1) There is no need to port the whole of Krita to another channel layout. Even the current layout gives us lots of possibilities to optimize our code.<br></blockquote><div><br>


</div><div>Maybe it would be a good idea to set aside some time in the action plan for this. </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">2) We still need to decide what to do with grayscale selections.<br>

</blockquote><div><br></div><div>My favorite is still the composite op solution. </div></div><br>