Evaluation of DPC++ compiler for Krita to support GPU computations

L. E. Segovia amy at amyspark.me
Thu Jan 5 13:21:40 GMT 2023


Hey Dmitry,

A few remarks on what you found:

On 04/01/2023 10:55, Dmitry Kazakov wrote:
> Hi, all!
> 
> I spent the last weeks before the New Year trying to build Intel's DPC++
> compiler. As far as I understand, this beast is something like our
> 'xsimd' library, but for offloading work to GPUs. Here I would like to
> share what I learned about it :)
> 
> tldr; my opinion of DPC++ is very positive, though we would have to
> invest a lot of time into it; on the positive side, we would be able to
> share some code with our XSIMD implementation.
> 
> That is what I learned in the process:
> 
> 1) Intel DPC++ is a flavour of normal C++ that allows automatic
> offloading of code to the GPU. Basically, you write normal C++
> code, then pass it to a special 'Queue' class as a lambda or a function
> pointer and the rest is done by the compiler automatically. The compiler
> can either compile it into intermediate representation (SPIR-V), which
> will later be compiled into GPU binary on the user's PC by the GPU
> driver, or precompile it directly into your target GPUs' binary code.
> This approach looks very nice, because we can reuse our existing
> composition/brush code written in C++ inside these GPU routines, which
> will reduce maintenance burden a lot.
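
For anyone following along, the 'Queue' pattern described above looks
roughly like this in SYCL 2020, the standard that DPC++ implements. This
is an untested sketch (it needs the DPC++ toolchain to build), and the
array size and kernel body are made up purely for illustration:

```cpp
#include <sycl/sycl.hpp>

int main() {
    // Picks a default device; typically the GPU when one is available.
    sycl::queue q;

    constexpr size_t n = 1024;

    // Unified shared memory: visible to both host and device.
    float *data = sycl::malloc_shared<float>(n, q);
    for (size_t i = 0; i < n; ++i)
        data[i] = float(i);

    // The lambda body is ordinary C++; the compiler emits SPIR-V
    // (or native GPU code) for it and the runtime dispatches it.
    q.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
        data[i] *= 2.0f;
    }).wait();

    sycl::free(data, q);
    return 0;
}
```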

SPIR-V is (if I recall correctly) the Vulkan intermediate representation,
which will require the user to run a somewhat updated GPU driver. Can you
confirm if this is the case, and if not, whether DPC++ will supply the
relevant library?

> 
> 2) There is also a library called oneAPI. It is built on top of the
> DPC++ compiler. We can use it to optimize the Gaussian Blur and other
> filters, but I don't think we can use it for brushes and composition.

Have you researched this in depth? I've not heard of any profiling of our
filters, so I lack the knowledge to ascertain where our current
bottlenecks in that area actually are.

> 
> 3) Since DPC++ is an extension of C++, we should use a custom compiler
> for that. Basically, we should switch to an unstable branch of Clang
> spiced with Intel's patches. It sounds a little bit scary :)
> 
> 4) As far as I can tell, DPC++ is supported only on Linux and Windows.
> I'm not sure we can use it on Android or macOS.

I think we should address the elephant in the room: is this API x86 only
or does it support Arm users out there as well?

> 
> 5) Not only will we have to switch to an unstable branch of Clang, we
> will also have to build the compiler ourselves (at least on Windows).
> Official builds support only MSVC, but we need a MinGW environment.
> 

FTR, we use MinGW for its performance optimizations; remember that I use
MSVC as my daily Krita driver, so MinGW is not strictly needed.

> 6) I have managed to compile and run this compiler with MinGW, but this
> process is extremely manual and flaky right now. More work will have to
> be done for that. Most probably, we will have to do cross-compilation
> from Linux, actually :)

Could you elaborate on this? Does it rely on shell scripts or (woe
betide us) the Linux process forking model for speed, à la building LLVM?

> 
> 7) The whole idea of DPC++ is really good. We write code in C++ and the
> compiler automatically builds it for all the available GPU architectures
> (with a limited C runtime). It means that we can simply reuse our brush
> and composition code (including the XSIMD one) inside these DPC++ blocks
> without duplicates. When I tested CUDA in 201x, my main concern was that
> we would have to write the second copy of all our rendering code to use
> it. DPC++ somewhat solves this issue.

Qt 6 *still* hasn't brought back ANGLE, so such an upgrade would still
involve writing at the very least three such copies (OpenGL, DirectX,
OpenGL ES) to maintain HDR support.

> -- 
> Dmitry Kazakov

-- 
amyspark 🌸 https://www.amyspark.me
