Evaluation of DPC++ compiler for Krita to support GPU computations

Wed Jan 4 13:55:42 GMT 2023

Hi, all!

I spent the last weeks before the New Year trying to build Intel's DPC++
compiler. As far as I understand, this beast is something like our 'xsimd'
library, but for offloading work to GPUs. Here I would like to share what I
learned about it :)

tl;dr: my opinion of DPC++ is very positive, though we would have to
invest a lot of time into it; on the positive side, we would be able to
share some code with our XSIMD implementation.

Here is what I learned in the process:

1) Intel DPC++ is a flavour of normal C++ that allows automatic
offloading of code to the GPU. Basically, you write normal C++ code,
then pass it to a special 'queue' class as a lambda or a function object,
and the rest is done by the compiler automatically. The compiler can either
compile it into an intermediate representation (SPIR-V), which is later
compiled into a GPU binary on the user's PC by the GPU driver, or precompile
it directly into binary code for the target GPUs. This approach looks very
nice, because we can reuse our existing composition/brush code written in
C++ inside these GPU routines, which will reduce the maintenance burden a lot.

2) There is also a set of libraries called oneAPI. It is built on top of
the DPC++ compiler. We can use it to optimize the Gaussian Blur and other
filters, but I don't think we can use it for brushes and composition.

3) Since DPC++ is an extension of C++, we would need a custom compiler for
it. Basically, we would have to switch to an unstable branch of Clang spiced
with Intel's patches. It sounds a little bit scary :)

4) As far as I can tell, DPC++ is supported only on Linux and Windows. I'm
not sure we can use it on Android or macOS.

5) Not only will we have to switch to an unstable branch of Clang, we will
also have to build the compiler ourselves (at least on Windows). Official
builds support only MSVC, but we need a MinGW environment.

6) I have managed to compile and run this compiler with MinGW, but the
process is extremely manual and flaky right now. More work will have to be
done on that. Most probably, we will actually have to cross-compile from
Linux :)

7) The whole idea of DPC++ is really good. We write code in C++ and the
compiler automatically builds it for all the available GPU architectures
(with a limited C runtime). It means that we can simply reuse our brush and
composition code (including the XSIMD one) inside these DPC++ blocks
without duplicating it. When I tested CUDA in 201x, my main concern was that
we would have to write a second copy of all our rendering code to use it.
DPC++ somewhat solves this issue.
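
To make points 1 and 7 a bit more concrete, here is a minimal sketch of
what "write it once, run it on CPU or GPU" could look like. Everything in
it is an assumption for illustration: blendOver() and compositeOver() are
hypothetical names, not Krita code, and USE_SYCL is a hypothetical macro
that one would define when building with a SYCL-capable compiler (for
example DPC++ with -fsycl). The GPU branch uses the standard SYCL 2020
queue/buffer/accessor API; without USE_SYCL the same per-pixel function
runs in a plain CPU loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#ifdef USE_SYCL              // hypothetical opt-in macro, see the text above
#include <sycl/sycl.hpp>
#endif

// Hypothetical per-pixel routine, written once as ordinary C++.
// "Over" blend of one premultiplied channel value.
inline float blendOver(float src, float srcAlpha, float dst)
{
    return src + dst * (1.0f - srcAlpha);
}

// Blends src over dst in place, on the GPU if built with USE_SYCL,
// otherwise on the CPU. Both paths call the same blendOver().
void compositeOver(const std::vector<float> &src, float srcAlpha,
                   std::vector<float> &dst)
{
    assert(src.size() == dst.size());
    const std::size_t n = dst.size();
    if (n == 0) return;

#ifdef USE_SYCL
    sycl::queue queue;       // picks a default device
    {
        sycl::buffer<float, 1> srcBuf(src.data(), sycl::range<1>(n));
        sycl::buffer<float, 1> dstBuf(dst.data(), sycl::range<1>(n));

        queue.submit([&](sycl::handler &cgh) {
            sycl::accessor s(srcBuf, cgh, sycl::read_only);
            sycl::accessor d(dstBuf, cgh, sycl::read_write);

            // The lambda is the kernel; it reuses the plain C++ function.
            cgh.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
                d[i] = blendOver(s[i], srcAlpha, d[i]);
            });
        });
    } // buffer destructors wait for the kernel and copy results back
#else
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = blendOver(src[i], srcAlpha, dst[i]);
    }
#endif
}
```

Calling compositeOver(src, 0.5f, dst) looks the same to the caller in both
builds; only the body of the function decides where the work runs. Real
brush code would of course need more than this (pixel formats, tiles,
masks), so take it only as an illustration of the code-sharing idea.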

-- 
Dmitry Kazakov