<br><br><div class="gmail_quote">On Fri, Oct 23, 2009 at 4:45 PM, Boudewijn Rempt <span dir="ltr">&lt;<a href="mailto:boud@valdyas.org">boud@valdyas.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div class="im">On Friday 23 October 2009, Dmitry Kazakov wrote:<br>

<br>

&gt; 1) I wouldn&#39;t say it&#39;s a tile engine internal fault that Krita is slow. I&#39;d<br>

&gt; even say that it&#39;s not tile engine&#39;s fault at all (mostly). =)<br>

<br>

</div>Depends a bit on which tile engine you&#39;re talking about -- the currently<br>

active one has that nasty lock that kicks in whenever a tile is accessed. This<br>

effectively serializes everything in Krita </blockquote><div><br>I wouldn&#39;t say so =)<br><br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

-- see the mutrace output I posted<br>

a while ago. The second one doesn&#39;t support swapping, so it&#39;s not comparable.<br>

<br>

And I am sure there is a lot left to improve for both tile engines, if even<br>

the Gimp people found some serious issues in their decades old tile engine<br>

this year.<br></blockquote><div><br>Of course we should optimize it! I mean that we should design an &quot;optimal&quot; API first! ;)<br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


Right now, the easiest win we can make in the tile engine is to make the<br>

iterators cache the tiles they are accessing in a row or column. That&#39;s<br>

something that has been borne out by proper profiling.<br></blockquote><div><br>Yes, it could be done.<br><br> </div><div> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div class="im">

&gt; Most problems come from *misusing* the engine. And the biggest problem here<br>

&gt; is bitBlt&#39;ing at a higher level of abstraction (colorspaces). We discussed<br>

&gt; it on IRC, as I remember. The tile engine (at least the new one) supports<br>

&gt; implicit tile sharing and lazy copying, BUT higher layers of abstraction<br>

&gt;  do not use it at all! They must use it when source and destination<br>

&gt;  colorspaces coincide, it&#39;ll be much faster than direct copying of data<br>

&gt;  stride by stride.<br>

&gt;<br>

&gt; But to solve this issue we need to design a good programming interface for<br>

&gt; it.<br>

&gt; The implementation will not be that difficult, I can do this as soon as we<br>

&gt; discuss the interface. Could we do that at the sprint?<br>

&gt;<br>

&gt;<br>

&gt; 2) To make bitBlt&#39;ing faster we should solve one more issue:<br>

&gt; compositeOps constants in pigment.<br>

&gt;<br>

&gt; I don&#39;t know how it happened, but all the compositeOp constants (like<br>

&gt; COMPOSITE_OVER) are strings! And for every stride of the image (there might<br>

&gt; be more than 1000 strides per image) we do a string comparison! More than<br>

&gt; that, we do many string comparisons on every composition! See [1].<br>

<br>

</div>No, we don&#39;t. We get a KoCompositeOp subclass instance and apply that. It&#39;s<br>

easy enough to see that we don&#39;t do all those string comparisons by running<br>

valgrind.<br>

<div class="im"><br>

&gt; I don&#39;t know where COMPOSITE_OVER and COMPOSITE_ALPHA_DARKEN are<br>

&gt; implemented, maybe their application is optimized a bit, but most of the<br>

&gt; others should suffer from this comparison.<br>

<br>

</div>For rgbu8, ADD, ALPHA_DARKEN, BURN, DIVIDE, DODGE, ERASE, MULTIPLY, OVER,<br>

OVERLAY, SCREEN, SUBTRACT are implemented as separate, templated classes.<br>

<br>

COMPOSITE_DARKEN, COMPOSITE_LIGHTEN, COMPOSITE_HUE,<br>

COMPOSITE_SATURATION, COMPOSITE_VALUE, COMPOSITE_COLOR, COMPOSITE_IN,<br>

COMPOSITE_OUT, COMPOSITE_ADD, COMPOSITE_DIFF, COMPOSITE_BUMPMAP,<br>

COMPOSITE_CLEAR, COMPOSITE_DISSOLVE are implemented in this legacy code.<br>

<br>

Other colorspaces do not have the legacy code path.<br>

<div class="im"><br>

&gt; Of course, Qt may smooth this comparison procedure a bit with shared<br>

&gt; internal data, but nevertheless integer switch should be much faster than<br>

&gt; that.<br>

<br>

</div>It is optimized a lot in Qt -- and don&#39;t forget that these are const&#39;s. But,<br>

as Cyrille has said, it&#39;s old code.<br>

<div class="im"><br>

&gt;<br>

&gt; What do I suggest?<br>

&gt; Change all these constants&#39; type to qint32 and create a separate function<br>

&gt;  that would return their names from the integers. This chain of if()<br>

&gt;  constructs in [1] could be replaced with a simple &#39;switch&#39;<br>

&gt;  construction. Theoretically, a compiler can optimize this switch with a<br>

&gt;  jump table at asm level.<br>

<br>

</div>That&#39;s not necessary because we&#39;ve already implemented something like Thomas<br>

Zander suggested three years ago. We just haven&#39;t had the time to port all<br>

code.<br>

<div class="im"><br>

&gt; 3) The slowness on dual-core processors comes from the fact that there is<br>

&gt; almost no threading in Krita at the moment. =) Yes, we have KisImageUpdater<br>

&gt; but it does almost nothing for speed, because while it works other parts of<br>

&gt; Krita (e.g. KisView) are waiting for it to finish.<br>

<br>

</div>KisView is not waiting in the sense that it blocks until the image is<br>

refreshed. The gui is kept responsive at all times because the recomputation<br>

isn&#39;t done in the gui thread. Note that actually painting, i.e., making<br>

strokes, also involves running a thread, which means there are 3 threads<br>

running already while painting.<br>

<div class="im"><br>

&gt; It means that Krita<br>

&gt; is almost single-threaded: while one thread is working, the others are<br>

&gt; waiting. I saw it when I ran a system monitor alongside Krita:<br>

&gt;  it showed that only one core was working at any given moment (I tested it<br>

&gt;  about a month ago).<br>

<br>

</div>When painting with a big brush (~500 pixels), ksysguard shows that both cores<br>

are actually saturated. One is recompositing, the other is computing the 500<br>

pixel mask for every dab (which is the biggest measurable performance issue we<br>

have).<br>

<div class="im"><br>

&gt; I&#39;m working on parallelizing layer merging right now. The same<br>

&gt; algorithm can (and should) be used in KisView for pre-scaling.<br>

<br>

</div>That sounds cool! But please -- no enormous code drops, make sure we have a<br>

manageable set of small patches for review!<br>

<div class="im"><br>

&gt; 4) OpenGL is good, but I think we could use the processor&#39;s capabilities like<br>

&gt; SSE and friends first.<br>

<br>

</div>I&#39;m vacillating on this... SSE etc. are nice, but most distributors do not<br>

enable them by default, except on 64-bit systems. </blockquote><div><br>But not all systems support hardware OpenGL, do they?<br><br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

OpenGL + glsl has<br>

the potential to give much more performance, but I haven&#39;t got a system where<br>

I can use it. And if we can move more and more work to opengtl, we can profit<br>

from the automatic optimization it gives us.<br>

<div class="im"><br>

<br>

--<br>

Boudewijn Rempt | <a href="http://www.valdyas.org" target="_blank">http://www.valdyas.org</a><br>

_______________________________________________<br>

</div><div><div></div><div class="h5">kimageshop mailing list<br>

<a href="mailto:kimageshop@kde.org">kimageshop@kde.org</a><br>

<a href="https://mail.kde.org/mailman/listinfo/kimageshop" target="_blank">https://mail.kde.org/mailman/listinfo/kimageshop</a><br>

</div></div></blockquote></div><br><br clear="all"><br>-- <br>Dmitry Kazakov<br>