<div dir="ltr">On Tue, May 21, 2013 at 9:14 PM, Boudewijn Rempt <span dir="ltr"><<a href="mailto:boud@valdyas.org" target="_blank">boud@valdyas.org</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On Tuesday 21 May 2013 May 21:08:19 Sven Langkamp wrote:<br>

<br>

><br>

> I think we need to do something significantly different to get to this<br>

> performance level.<br>

<br>

</div>Yes, that is my fear as well.<br>

<div class="im"><br>

> I don't think the current approach can<br>

> be optimized enough to do it. We discussed that on the last sprint and we<br>

> had no idea what Photoshop is doing. In the meantime they are going for GPU<br>

> support like Mari.<br>

<br>

</div>There must also be some kind of deferred processing going on. There's no way one is going to fit a 14032x9632 image in 32f into GPU memory, not even on a quadro 400, and still have space for anything else...<br>

</blockquote><div><br></div><div style>Newer nvidia cards have up to 6 GB of graphics memory, so it should fit. </div><div> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


I've been experimenting a bit, and an ordinary layer stack (A4, 300 dpi) with a dozen layers works fine on my Intel GPU with only a little bit of graphics memory, though.<br>

<br>

I want to experiment some more here -- but the trick is arriving at<br>

<br>

a) a good, extensible technology choice (glsl? cuda? opencl? whatever?)<br>

b) a good, extensible design<br>

c) something that krita can grow into, because we don't want to do full rewrites (I hope...)</blockquote><div><br></div><div style>a) I would go with OpenCL. CUDA is NVIDIA-only and GLSL is a bit ugly for this purpose.</div>

<div style><br></div><div style>b) and c) are very tricky. The simplest way is to write the content of the layer into a buffer, push that to the graphics card, process it and read it back. This can be done without many rewrites, but you get a memory transfer bottleneck. It would give us a speedup for very computationally intensive calculations, but little or even a negative one for memory-bound ones.</div>
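<div><br></div><div>A rough sketch of the numbers behind that round trip, using the 14032x9632 32f image mentioned earlier in the thread. The 4-channel (RGBA) layout and the 6 GB/s effective bus bandwidth are assumptions for illustration only, and the function names are hypothetical; real figures depend on the colorspace and the hardware.</div>

```python
def buffer_bytes(width, height, channels=4, bytes_per_channel=4):
    """Size in bytes of one uncompressed layer buffer."""
    return width * height * channels * bytes_per_channel


def round_trip_seconds(num_bytes, bandwidth_bytes_per_s):
    """Time to upload a buffer and read it back at a given bus bandwidth."""
    return 2 * num_bytes / bandwidth_bytes_per_s


GIB = 1024 ** 3

# Assumption: 4 channels (RGBA), 32-bit float each.
layer = buffer_bytes(14032, 9632)
print(f"one layer: {layer} bytes ({layer / GIB:.2f} GiB)")

# Assumption: 6 GB/s effective host-to-device bandwidth (hypothetical value).
print(f"upload + readback: {round_trip_seconds(layer, 6e9):.2f} s")
```

<div>Under these assumptions a single layer is about 2 GiB, so it fits on a 6 GB card, but moving it to the card and back for every operation costs a noticeable fraction of a second before any processing happens.</div>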

<div style><br></div><div style>The other option is to keep all the tiles on the GPU, but that is rather difficult to integrate with the current codebase.</div></div></div></div>