Fwd: Python / C++ Binding Layer using PyBind11

David Fairbrother dfair.hidden at gmail.com
Tue Apr 18 11:50:25 BST 2023


I've been looking at ways we can improve facial recognition.
Unfortunately, it looks like OpenCV is increasingly becoming the
limiting factor for adding new models, with newer pipelines/models
being implemented in Python + TF / Torch.

Looking at previous GSOC reports
(https://community.kde.org/GSoC/2021/StatusReports/NghiaDuong) I can
see the approach taken was to adapt model(s) and workflows to
OpenCV's DNN module. However, this places a high implementation burden
on contributors, who have to:
- port or re-implement existing pre-processing pipelines from Python in C++
- hope that OpenCV's DNN module is compatible with the actual model (and port it)
- re-evaluate the model to ensure its performance hasn't degraded.

I understand this is a C++ project, so I'd like to propose the idea of
using something like PyBind11, with Python packages cloned as plugins
(similarly to how YOLOv3 is installed on first run), to allow us to
quickly add / evaluate new models as they come out and bring them into
digiKam.
I'm happy to write the binding layer along the following lines:

With something like deepface (https://github.com/serengil/deepface)
our C++ layer would call the following Python code:

# Python starting reference

from deepface import DeepFace
dfs = DeepFace.represent(img_path = "img1.jpg")

/* ###### */
// C++ PyBind11 equivalent
namespace py = pybind11;
using namespace py::literals;  // for the "…"_a keyword-argument syntax

auto deepFace = py::module_::import("deepface").attr("DeepFace");
auto represent = deepFace.attr("represent");

py::list faces = represent("img_path"_a = "img1.jpg");

for (auto face : faces) {
    // Unpack embedding, face position and confidence.
    // All comparisons (i.e. distance, etc.) are then done in digiKam.
}
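On the digiKam side we would then unpack each record. The key names below ("embedding", "facial_area", "face_confidence") are my reading of what DeepFace.represent returns in recent versions, so treat them as an assumption; this sketch uses a hand-written record rather than a real deepface call:

```python
# Hand-written record mimicking the assumed shape of one
# DeepFace.represent() result entry (no real deepface call here):
record = {
    "embedding": [0.12, -0.05, 0.33],                     # model-dependent length
    "facial_area": {"x": 10, "y": 20, "w": 64, "h": 64},  # face bounding box
    "face_confidence": 0.98,
}

def unpack(record):
    """Split one record into the three pieces digiKam would consume."""
    area = record["facial_area"]
    box = (area["x"], area["y"], area["w"], area["h"])
    return record["embedding"], box, record["face_confidence"]

embedding, box, confidence = unpack(record)
```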

In my opinion, this should not be a completely generic Python binding
layer: that would be a maintenance burden for developers and would
open a whole line of support problems.
Instead, we should write bespoke bindings as and when people want
them, and also consider the amount of boilerplate required when we
pick libraries and packages (hence I selected deepface as an example).


Pros:
- Can quickly add / test new models by writing a small binding layer
- Improved accuracy, given the rapid iteration we're seeing in the ML
space currently
- Possibility of GPU support long term, through model selection and
letting TensorFlow / PyTorch runtimes handle these devices


Cons:
- Requires us to use Python 3.6+. We can add a flag in CMake so
downstream vendors can opt in to this dependency at compile time
- IMO: we would probably need to embed Python for Windows builds
(where we can't guarantee the version), maybe as a separate build?
- Performance overhead from using Python
  (Counterpoint: users have to opt into this, and for a lot of people
accuracy > speed for the one-time classification of an existing
library. I know I'd rather leave it running for 2 days than sit
manually classifying for 2 hours)
- Need to be careful about the GIL blocking Qt. I think a separate
thread (or equivalent) is the correct place to hold the GIL so the
main window can't be blocked
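To make the CMake opt-in concrete, here is a sketch of what the flag could look like (ENABLE_PYTHON_ML, HAVE_PYTHON_ML, and the digikamcore target name are all illustrative, not existing digiKam identifiers):

```cmake
# Hypothetical opt-in flag; OFF by default so vendors must opt in explicitly
option(ENABLE_PYTHON_ML "Build the PyBind11-based ML binding layer" OFF)

if(ENABLE_PYTHON_ML)
    # Development.Embed provides libpython for embedding the interpreter
    find_package(Python3 3.6 REQUIRED COMPONENTS Interpreter Development.Embed)
    find_package(pybind11 CONFIG REQUIRED)

    target_link_libraries(digikamcore PRIVATE pybind11::embed)
    target_compile_definitions(digikamcore PRIVATE HAVE_PYTHON_ML)
endif()
```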

Without getting too bogged down in model selection at this point, I
wanted to ask:

- Do you have any thoughts / concerns / improvements about the above proposal?
- Is this something you'd be happy to accept?

I'm happy to write CMake support and an example POC in C++ (something
that literally reads "hello world" from Python as a string, then
de-serialises it back into a C++ string) as time allows on my side, to
prove out PyBind11.
I'm also happy, longer term, to port my Python implementation, which
uses DeepFace to populate a DB of fingerprints that we then perform
distance matching against. In my testing I've seen significantly
better results with this model than with digiKam's current one, too.
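To make "distance matching against a DB of fingerprints" concrete, here is a pure-Python sketch; the cosine metric, the 0.4 threshold, and all names are illustrative choices for this email, not lifted from my actual implementation:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def best_match(query, db, threshold=0.4):
    """Return (name, distance) for the closest enrolled face, or None
    if nothing in the DB is within the threshold."""
    name, dist = min(((n, cosine_distance(query, emb)) for n, emb in db.items()),
                     key=lambda item: item[1])
    return (name, dist) if dist <= threshold else None
```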


More information about the Digikam-users mailing list