<br><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
So I'm starting this discussion to point out what can be done, what is slow,<br>
how you see the problem?<br></blockquote><div><br>How I see it:<br><br>1) I wouldn't say Krita is slow because of a fault inside the tile engine. I'd even say it's (mostly) not the tile engine's fault at all. =)<br>
<br>Most problems come from <i>misusing</i> the engine, and the biggest one is bitBlt'ing at a higher level of abstraction (colorspaces). We discussed this on IRC, as I remember. The tile engine (at least the new one) supports implicit tile sharing and lazy copying (copy-on-write), BUT the higher layers of abstraction do not use it at all! They should use it whenever the source and destination colorspaces coincide: sharing tiles is much faster than copying the data stride by stride.<br>
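<br>A minimal sketch of what tile sharing could look like at the higher level, assuming a simplified device that stores tiles as std::shared_ptr and deep-copies a tile only on its first write (copy-on-write). TiledDevice, TileData, bitBltTiles and the integer colorspace id are hypothetical names, not Krita's actual tile engine API, and the blit is assumed to be tile-aligned.<br>

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified model; NOT Krita's real tile engine API.
constexpr int kTileSize = 64;  // assumed tile edge length in pixels

struct TileData {
    std::vector<std::uint8_t> bytes;
};

class TiledDevice {
public:
    TiledDevice(int colorSpaceId, int pixelSize)
        : m_colorSpaceId(colorSpaceId), m_pixelSize(pixelSize) {}

    // Read access never copies anything.
    const TileData* constTile(int col, int row) const {
        auto it = m_tiles.find({col, row});
        return it == m_tiles.end() ? nullptr : it->second.get();
    }

    // Write access: allocate a missing tile, or deep-copy ("detach") a tile
    // that is still shared with another device (copy-on-write).
    // Note: use_count() is only a hint under concurrency; a real engine
    // would need proper locking around this check.
    TileData& writableTile(int col, int row) {
        std::shared_ptr<TileData>& tile = m_tiles[{col, row}];
        if (!tile) {
            tile = std::make_shared<TileData>();
            tile->bytes.assign(static_cast<std::size_t>(kTileSize) * kTileSize * m_pixelSize, 0);
        } else if (tile.use_count() > 1) {
            tile = std::make_shared<TileData>(*tile);  // the lazy copy happens here
        }
        return *tile;
    }

    // Tile-aligned blit. With equal colorspaces each tile is shared by bumping
    // a reference count instead of copying its pixels.
    bool bitBltTiles(const TiledDevice& src) {
        if (src.m_colorSpaceId != m_colorSpaceId)
            return false;  // caller must fall back to a converting, per-stride copy
        for (const auto& entry : src.m_tiles)
            m_tiles[entry.first] = entry.second;
        return true;
    }

private:
    int m_colorSpaceId;
    int m_pixelSize;
    std::map<std::pair<int, int>, std::shared_ptr<TileData>> m_tiles;
};
```

In this sketch a same-colorspace blit costs one reference-count increment per tile; the pixel data is only copied if and when one of the devices writes to a shared tile.<br>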
<br>But to solve this issue we need to design a good programming interface for it. <br>The implementation won't be that difficult; I can do it as soon as we agree on the interface. Could we discuss it at the sprint?<br><br><br>
2) To make bitBlt'ing even faster, we should solve one more issue: the compositeOp constants in pigment.<br><br>I don't know how it happened, but all the compositeOp constants (like COMPOSITE_OVER) are strings! And for every stride of the image (there may be more than 1000 strides per image) we do a string comparison! Worse, we do many string comparisons on every composition! See [1].<br>
<br>I don't know where COMPOSITE_OVER and COMPOSITE_ALPHA_DARKEN are implemented; maybe applying them is optimized a bit, but most of the other ops must suffer from these comparisons.<br><br>Of course, Qt may soften the cost of the comparison a bit with implicitly shared internal data, but an integer switch should still be much faster.<br>
<br>What do I suggest?<br>Change the type of all these constants to qint32 and create a separate function that returns the name for a given integer constant. The chain of if() constructions in [1] could then be replaced with a simple switch statement, which a compiler can optimize into a jump table at the assembly level.<br>
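<br>A minimal sketch of this suggestion, assuming 8-bit channels and ignoring per-pixel alpha. The enum values, the names returned by compositeOpName, and compositeRow itself are hypothetical, not pigment's actual API; std::int32_t stands in for qint32 to keep the example free of Qt. The op is resolved once per stride, so the inner loops contain no op-selection branches at all.<br>

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical integer ids replacing string constants such as COMPOSITE_OVER.
enum CompositeOpId : std::int32_t {
    COMPOSITE_OVER_ID = 0,
    COMPOSITE_MULTIPLY_ID,
    COMPOSITE_DARKEN_ID
};

// Separate function mapping ids to names (for UI, serialization, debugging).
// The returned strings are illustrative only.
const char* compositeOpName(std::int32_t id) {
    switch (id) {
    case COMPOSITE_OVER_ID:     return "over";
    case COMPOSITE_MULTIPLY_ID: return "multiply";
    case COMPOSITE_DARKEN_ID:   return "darken";
    default:                    return "unknown";
    }
}

// Composite one stride of 8-bit channels. One integer switch per stride
// replaces the chain of string comparisons.
void compositeRow(std::int32_t op, const std::uint8_t* src, std::uint8_t* dst, std::size_t n) {
    switch (op) {
    case COMPOSITE_OVER_ID:
        std::copy(src, src + n, dst);  // opaque "over" degenerates to a copy
        break;
    case COMPOSITE_MULTIPLY_ID:
        for (std::size_t i = 0; i < n; ++i)  // src * dst / 255, rounded
            dst[i] = static_cast<std::uint8_t>((src[i] * dst[i] + 127) / 255);
        break;
    case COMPOSITE_DARKEN_ID:
        for (std::size_t i = 0; i < n; ++i)
            dst[i] = std::min(src[i], dst[i]);
        break;
    default:
        break;  // unknown op: leave dst untouched
    }
}
```

Whether the compiler actually emits a jump table depends on how many cases there are and how densely their values are packed; small contiguous enums like this one are the favourable case.<br>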
<br>3) Krita is slow on dual-core processors because there is almost no threading in Krita at the moment. =) Yes, we have KisImageUpdater, but it does almost nothing for speed, because while it works, other parts of Krita (e.g. KisView) wait for it to finish. So Krita is effectively single-threaded: while one thread works, the others wait. I saw this when I ran a system monitor alongside Krita: it showed only one core busy at any given moment (I tested this about a month ago).<br>
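<br>One way to keep every core busy, sketched under assumptions: the image is split into horizontal bands and each band is merged by its own std::thread. multiplyRow and mergeParallel are hypothetical names, and multiply is only a stand-in for whatever composite op a layer uses. Because the bands never overlap, the workers need no locking.<br>

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical per-stride op standing in for a real composite op.
void multiplyRow(const std::uint8_t* src, std::uint8_t* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = static_cast<std::uint8_t>((src[i] * dst[i] + 127) / 255);
}

// Merge src into dst by giving each worker thread its own band of rows.
void mergeParallel(const std::uint8_t* src, std::uint8_t* dst,
                   std::size_t rowBytes, std::size_t rows, unsigned threadCount) {
    if (threadCount == 0)
        threadCount = 1;
    const std::size_t band = (rows + threadCount - 1) / threadCount;  // rows per band
    std::vector<std::thread> workers;
    for (std::size_t first = 0; first < rows; first += band) {
        const std::size_t last = std::min(rows, first + band);
        workers.emplace_back([=] {
            for (std::size_t r = first; r < last; ++r)
                multiplyRow(src + r * rowBytes, dst + r * rowBytes, rowBytes);
        });
    }
    for (auto& t : workers)
        t.join();  // the merge is complete only when every band is done
}
```

In practice threadCount could come from std::thread::hardware_concurrency(), and a pool of long-lived threads would avoid creating new threads on every merge.<br>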
<br>I'm working on parallelizing layer merging right now. The same algorithm can (and should) be used for pre-scaling in KisView.<br><br><br>4) OpenGL is good, but I think we should first use the processor's own capabilities, like SSE and friends.<br>
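<br>To illustrate what SSE can buy, here is a sketch of a darken op (per-channel minimum) written with SSE2 intrinsics, which processes 16 8-bit channels per instruction. It assumes an x86/x86-64 build and ignores alpha; darkenRowSse2 is a hypothetical name. A scalar loop handles the bytes left over when the stride length is not a multiple of 16.<br>

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (x86/x86-64 only)
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Darken on 8-bit channels: dst = min(src, dst), 16 bytes at a time.
void darkenRowSse2(const std::uint8_t* src, std::uint8_t* dst, std::size_t n) {
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        // Unaligned loads/stores, so the rows need no special alignment.
        __m128i s = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        __m128i d = _mm_loadu_si128(reinterpret_cast<const __m128i*>(dst + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), _mm_min_epu8(s, d));
    }
    for (; i < n; ++i)  // scalar tail for the last n % 16 bytes
        dst[i] = std::min(src[i], dst[i]);
}
```
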
<br><br>[1] - libs/pigment/colorspaces/KoRgbU8CompositeOp.cpp:65<br><br></div></div><br><br>PS:<br>I don't mean that the tile engine itself shouldn't be optimized; I mean that its API should be optimized first.<br>
<br>-- <br>Dmitry Kazakov<br>