Krita useable for Blender movies

Fri Oct 23 14:45:32 CEST 2009

On Friday 23 October 2009, Dmitry Kazakov wrote:

> 1) I wouldn't say it's a tile engine internal fault that Krita is slow. I'd
> even say that it's not tile engine's fault at all (mostly). =)

Depends a bit on which tile engine you're talking about -- the currently 
active one has that nasty lock that kicks in whenever a tile is accessed. This 
effectively serializes everything in Krita -- see the mutrace output I posted 
a while ago. The second one doesn't support swapping, so it's not comparable.

And I am sure there is a lot left to improve in both tile engines, given that 
even the GIMP people found some serious issues in their decades-old tile 
engine this year.

Right now, the easiest win we can make in the tile engine is to make the 
iterators cache the tiles they are accessing in a row or column. That's 
something that has been borne out by proper profiling.
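
A minimal sketch of the caching idea, assuming a square tile grid and 
non-negative coordinates. Tile, TileStore and CachingRowIterator are 
hypothetical stand-ins, not the real Krita classes; the point is only that 
the locked lookup happens once per tile instead of once per pixel.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <utility>

// Hypothetical stand-ins, not the real Krita tile classes.
constexpr int kTileSize = 64;

struct Tile {
    std::array<std::uint8_t, kTileSize * kTileSize> pixels{};
};

class TileStore {
public:
    // Every lookup takes the store-wide lock, like the currently active engine.
    Tile* tile(int col, int row) {
        std::lock_guard<std::mutex> guard(m_lock);
        ++m_lookups;
        return &m_tiles[{col, row}];  // std::map keeps element addresses stable
    }
    int lookups() const { return m_lookups; }

private:
    std::mutex m_lock;
    std::map<std::pair<int, int>, Tile> m_tiles;
    int m_lookups = 0;
};

// Row iterator that keeps the current tile until x crosses a tile boundary.
class CachingRowIterator {
public:
    CachingRowIterator(TileStore& store, int y) : m_store(store), m_y(y) {
        assert(y >= 0);
    }

    std::uint8_t& pixel(int x) {
        assert(x >= 0);
        const int col = x / kTileSize;
        if (m_tile == nullptr || col != m_col) {  // only hit the store on a tile change
            m_tile = m_store.tile(col, m_y / kTileSize);
            m_col = col;
        }
        return m_tile->pixels[(m_y % kTileSize) * kTileSize + (x % kTileSize)];
    }

private:
    TileStore& m_store;
    int m_y;
    int m_col = -1;
    Tile* m_tile = nullptr;
};
```

With 64-pixel tiles, walking 256 pixels of one row goes through the lock 4 
times rather than 256 times.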

> Most problems come from *misusing* the engine. And the biggest problem here
> is bitBlt'ing at higher level of abstraction (colorspaces). We used to
> discuss it on irc, as i remember. Tile engine (at least new one) supports
> implicit tile sharing and lazy copying, BUT higher layers of abstractions
>  do not use it at all! They must use it when source and destination
>  colorspaces coincide, it'll be much faster than direct copying of data
>  stride by stride.
> 
> But to solve this issue we need to design a good programming interface for
> it.
> The implementation will not be that difficult, i can do this as soon we
> discuss the interface. We could discuss it on sprint?
> 
> 
> 2) To make bitBlt'ing faster we should solve another issue:
> compositeOps constants in pigment.
> 
> I don't know how it happened, but all the compositeOp constants (like
> COMPOSITE_OVER) are strings! And for every stride of the image (there might
> be more than 1000 strides per image) we do a string comparison! More than
> that, we do many string comparisons on every composition! See [1].

No, we don't. We get a KoCompositeOp subclass instance and apply that. It's 
easy enough to see that we don't do all those string comparisons by running 
valgrind.
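
Roughly the shape of that pattern, as a simplified sketch: the string id is 
resolved to an object once per bitBlt, and the per-row work is a plain 
virtual call. CompositeOp, AddOp, ColorSpace and this bitBlt signature are 
hypothetical simplifications, not the actual pigment API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>

// Hypothetical, simplified stand-in for a composite op base class.
class CompositeOp {
public:
    virtual ~CompositeOp() = default;
    virtual void composite(std::uint8_t* dst, const std::uint8_t* src,
                           std::size_t n) const = 0;
};

// One op per class: saturating add on 8-bit channels.
class AddOp : public CompositeOp {
public:
    void composite(std::uint8_t* dst, const std::uint8_t* src,
                   std::size_t n) const override {
        for (std::size_t i = 0; i < n; ++i) {
            const int sum = dst[i] + src[i];
            dst[i] = static_cast<std::uint8_t>(sum > 255 ? 255 : sum);
        }
    }
};

// Hypothetical colorspace that owns its ops, keyed by string id.
class ColorSpace {
public:
    ColorSpace() { m_ops["add"] = std::make_unique<AddOp>(); }

    // The string is only looked at here.
    const CompositeOp* compositeOp(const std::string& id) const {
        const auto it = m_ops.find(id);
        return it == m_ops.end() ? nullptr : it->second.get();
    }

private:
    std::map<std::string, std::unique_ptr<CompositeOp>> m_ops;
};

void bitBlt(const ColorSpace& cs, const std::string& opId,
            std::uint8_t* dst, const std::uint8_t* src,
            std::size_t rows, std::size_t rowBytes) {
    const CompositeOp* op = cs.compositeOp(opId);  // one string lookup per call
    assert(op != nullptr);
    for (std::size_t r = 0; r < rows; ++r) {       // no strings inside the loop
        op->composite(dst + r * rowBytes, src + r * rowBytes, rowBytes);
    }
}
```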

> I don't know where COMPOSITE_OVER and COMPOSITE_ALPHA_DARKEN are
> implemented, maybe their application is optimized a bit, but most of the
> others should suffer from this comparison.

For rgbu8, ADD, ALPHA_DARKEN, BURN, DIVIDE, DODGE, ERASE, MULTIPLY, OVER, 
OVERLAY, SCREEN, SUBTRACT are implemented as separate, templated classes.

COMPOSITE_DARKEN, COMPOSITE_LIGHTEN, COMPOSITE_HUE, COMPOSITE_SATURATION, 
COMPOSITE_VALUE, COMPOSITE_COLOR, COMPOSITE_IN, COMPOSITE_OUT, COMPOSITE_ADD, 
COMPOSITE_DIFF, COMPOSITE_BUMPMAP, COMPOSITE_CLEAR, COMPOSITE_DISSOLVE are 
implemented in this legacy code.

Other colorspaces do not have the legacy code path.

> Of course, Qt may smooth this comparison procedure a bit with shared
> internal data, but nevertheless integer switch should be much faster than
> that.

It is optimized a lot in Qt -- and don't forget that these are consts. But, 
as Cyrille has said, it's old code.

> 
> What do i suggest?
> Change all these constants' type to qint32 and create separate function
>  that would show their name using integers. This chain of if()
>  constructions in [1] could be replaced with a simple 'switch'
>  construction. Theoretically, a compiler can optimize this switch with a
>  jump table at asm level.

That's not necessary because we've already implemented something like what 
Thomas Zander suggested three years ago. We just haven't had the time to port 
all the code.

> 3) Slow work on dual-core processors is connected with the fact that there
> is almost no threading in Krita at the moment. =) Yes, we have
> KisImageUpdater but it does almost nothing for speed, because while it works
> other parts of Krita (e.g. KisView) are waiting for its work to finish.

KisView is not waiting in the sense that it blocks until the image is 
refreshed. The GUI is kept responsive at all times because the recomputation 
isn't done in the GUI thread. Note that actually painting, i.e., making 
strokes, also involves running a thread, which means there are 3 threads 
running already while painting.

> It means that Krita
> is almost single-threaded: while one thread is working, others - are
> waiting. I saw it when i started a system monitor in parallel with Krita,
>  it showed that only one core is working at the same moment (i tested it
>  about a month ago).

When painting with a big brush (~500 pixels), ksysguard shows that both cores 
are actually saturated. One is recompositing, the other is computing the 500 
pixel mask for every dab (which is the biggest measurable performance issue we 
have).
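
To give a sense of scale: a mask recomputed from scratch for a 500 pixel 
brush means 500 x 500 = 250,000 per-pixel evaluations for every dab. A 
minimal sketch, assuming a round brush with linear falloff; computeDabMask is 
a hypothetical function, not the actual Krita brush code.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical brush mask: a round dab with linear falloff from the center,
// recomputed in full on every call.
std::vector<std::uint8_t> computeDabMask(int diameter) {
    std::vector<std::uint8_t> mask(static_cast<std::size_t>(diameter) * diameter);
    const double radius = diameter / 2.0;
    for (int y = 0; y < diameter; ++y) {
        for (int x = 0; x < diameter; ++x) {
            // Distance of the pixel center from the dab center, 0 at the
            // center and 1 at the rim.
            const double dx = x + 0.5 - radius;
            const double dy = y + 0.5 - radius;
            const double d = std::sqrt(dx * dx + dy * dy) / radius;
            const double v = d >= 1.0 ? 0.0 : 1.0 - d;
            mask[static_cast<std::size_t>(y) * diameter + x] =
                static_cast<std::uint8_t>(v * 255.0);
        }
    }
    return mask;
}
```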

> I'm working on a layers merging parallelization right now. The same
> algorithm can (and should) be used in KisView for pre-scaling.

That sounds cool! But please -- no enormous code drops, make sure we have a 
manageable set of small patches for review!

> 4) openGL is good, but i think we could use processor's capabilities like
> SSE and friends first.

I'm vacillating on this... SSE etc. are nice, but most distributors do not 
enable them by default, except on 64-bit systems. OpenGL + GLSL has the 
potential to give much more performance, but I haven't got a system where I 
can use it. And if we can move more and more work to OpenGTL, we can profit 
from the automatic optimization it gives us.

-- 
Boudewijn Rempt | http://www.valdyas.org