Krita Composition and CUDA

Lukast dev lukast.dev at gmail.com
Thu Apr 25 06:45:53 UTC 2013


Hello,

this page has some interesting information; it might also be relevant
for us in the Krita context:
https://wiki.postgresql.org/wiki/PGStrom

Lukas

2012/11/4 Dmitry Kazakov <dimula73 at gmail.com>:
> Hi!
>
> Yesterday I played with CUDA a bit (just for fun, not for the sponsored
> work) and I wanted to share my ideas about it.
>
> First Impression
>
> The first thing to say: "CUDA is a really nice thing!" I have a very
> low-end GPU (a GT 610 with only 48 cores), yet for pure calculations
> (not counting data transfers) it is almost twice as fast as a formerly
> top-of-the-line Intel Core i7 CPU using its AVX extension. I hardly dare
> to imagine how fast these operations would run on high-end GPUs with 500+ cores.
>
> The Test
>
> The attached table compares the speed of various implementations of
> Composite Over. The test composited a single buffer of 32 million random
> pixels (about 122 MiB) into a similar buffer through a mask.
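>
> For illustration, the per-pixel work of such a masked Composite Over kernel
> might look roughly like this (a simplified sketch assuming 8-bit RGBA pixels
> and an 8-bit mask, which matches the 122 MiB figure; the names and the exact
> blend math are illustrative, not the actual test code):
>
> __global__ void compositeOverMasked(uchar4 *dst, const uchar4 *src,
>                                     const unsigned char *mask, int numPixels)
> {
>     int i = blockIdx.x * blockDim.x + threadIdx.x;
>     if (i >= numPixels)
>         return;
>
>     uchar4 s = src[i];
>     uchar4 d = dst[i];
>
>     // Source alpha scaled by the selection mask
>     float sa = (s.w / 255.0f) * (mask[i] / 255.0f);
>
>     // Simplified "over": out = src * sa + dst * (1 - sa) per channel
>     // (ignores destination-alpha weighting of the colour channels)
>     d.x = (unsigned char)(s.x * sa + d.x * (1.0f - sa));
>     d.y = (unsigned char)(s.y * sa + d.y * (1.0f - sa));
>     d.z = (unsigned char)(s.z * sa + d.z * (1.0f - sa));
>     d.w = (unsigned char)(255.0f * sa + d.w * (1.0f - sa));
>
>     dst[i] = d;
> }
>
> // One thread per pixel:
> // compositeOverMasked<<<(numPixels + 255) / 256, 256>>>(dDst, dSrc, dMask, numPixels);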
>
> Results
>
> It's a pity, but in real life we don't gain much by using the GPU this
> way. Although the calculations run almost twice as fast as on the CPU,
> the benefit is cancelled out by the delays caused by data transfers
> between the CPU and the GPU. According to the tests, these transfers can
> take up to 50% of the total time of the Composite Over.
>
> It is quite convenient to express the results in units of memcpy time
> (the time of one RAM-to-RAM copy of the buffer). In our case the data
> transfers take 5.36 memcpy units. We transfer about 396 MiB in total,
> which means that a single copy to/from the GPU is roughly 1.5 times
> slower than a plain RAM-to-RAM copy.
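>
> The measurement itself is simple: time one plain RAM-to-RAM memcpy of the
> buffer, then time the host-to-device copy with CUDA events and divide.
> Something along these lines (an illustrative sketch, not the exact
> benchmark code used for the table):
>
> #include <cstdio>
> #include <cstdlib>
> #include <cstring>
> #include <chrono>
> #include <cuda_runtime.h>
>
> int main()
> {
>     const size_t bufferBytes = 32000000ull * 4;   // 32M RGBA pixels, ~122 MiB
>     char *hostSrc  = (char *)malloc(bufferBytes);
>     char *hostCopy = (char *)malloc(bufferBytes);
>     memset(hostSrc, 0x7f, bufferBytes);
>
>     char *devBuf = 0;
>     cudaMalloc((void **)&devBuf, bufferBytes);
>
>     // One "memcpy unit" = a plain RAM-to-RAM copy of the whole buffer
>     auto t0 = std::chrono::high_resolution_clock::now();
>     memcpy(hostCopy, hostSrc, bufferBytes);
>     auto t1 = std::chrono::high_resolution_clock::now();
>     double memcpyUnit = std::chrono::duration<double>(t1 - t0).count();
>
>     // Host-to-device upload timed with CUDA events
>     cudaEvent_t start, stop;
>     cudaEventCreate(&start);
>     cudaEventCreate(&stop);
>     cudaEventRecord(start);
>     cudaMemcpy(devBuf, hostSrc, bufferBytes, cudaMemcpyHostToDevice);
>     cudaEventRecord(stop);
>     cudaEventSynchronize(stop);
>     float ms = 0.0f;
>     cudaEventElapsedTime(&ms, start, stop);
>
>     printf("upload took %.2f memcpy units\n", (ms / 1000.0) / memcpyUnit);
>     return 0;
> }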
>
> Taking into account that some fast paths of the Vc implementation of this
> op take about 1-3 memcpy units, this approach (with at least 5.36 memcpy
> units of transfer overhead alone) will not work for us.
>
> Idea
>
> Well, it's obvious that the bottleneck of this approach is the data
> transfers, so we should avoid them somehow. What if we moved the storage
> of our layers from CPU to GPU memory? Not completely, of course: all the
> layers would still be stored in CPU RAM, but some of them (say, the
> active one) would keep a full copy in GPU RAM. The paintops and the
> composition could then run entirely on the GPU without any data
> transfers. This would give a 2x performance gain on low-end GPUs (like
> mine with 48 cores), and I can't even say how fast it would run on
> high-end GPUs with 500+ cores.
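>
> In code, such a device-resident layer cache could be as simple as the
> following (a rough sketch with made-up names, reusing the
> compositeOverMasked kernel sketched above, nothing implemented yet):
>
> struct GpuLayerCache {
>     uchar4 *devPixels;      // device-resident copy of the active layer
>     size_t  numPixels;
>     bool    cpuDirty;       // device has changes the CPU copy lacks
> };
>
> // Upload once, when the layer becomes active
> void activate(GpuLayerCache *c, const uchar4 *cpuPixels, size_t numPixels)
> {
>     cudaMalloc((void **)&c->devPixels, numPixels * sizeof(uchar4));
>     cudaMemcpy(c->devPixels, cpuPixels, numPixels * sizeof(uchar4),
>                cudaMemcpyHostToDevice);
>     c->numPixels = numPixels;
>     c->cpuDirty = false;
> }
>
> // Paintops and composition run entirely on the device, no transfers
> void compositeOnGpu(GpuLayerCache *c, const uchar4 *devSrc,
>                     const unsigned char *devMask)
> {
>     int threads = 256;
>     int blocks = (int)((c->numPixels + threads - 1) / threads);
>     compositeOverMasked<<<blocks, threads>>>(c->devPixels, devSrc, devMask,
>                                              (int)c->numPixels);
>     c->cpuDirty = true;
> }
>
> // Download only when the CPU copy is actually needed
> void syncToCpu(GpuLayerCache *c, uchar4 *cpuPixels)
> {
>     if (!c->cpuDirty)
>         return;
>     cudaMemcpy(cpuPixels, c->devPixels, c->numPixels * sizeof(uchar4),
>                cudaMemcpyDeviceToHost);
>     c->cpuDirty = false;
> }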
>
> What is more, if the projection of the image were stored in GPU memory,
> we would avoid the KisPaintDevice -> QImage -> OpenGL texture transfer
> entirely. The point is that OpenGL textures can be linked directly to
> CUDA buffers, so the projection could be written straight into the
> texture. Just so you know: according to my last profiling, we currently
> spend 13.8% of the time on this transfer.
>
> Of course, this idea sounds like a dream, and of course there are lots of
> hidden complications. But I think we should at least consider it. It could
> make quite an interesting, though huge and difficult, GSoC project, for
> example...
>
>
> --
> Dmitry Kazakov

