[Kde-accessibility] Lip Reader Demo

Peter Grasch grasch at simon-listens.org
Sat Mar 24 18:59:58 UTC 2012


On Friday, 23 March 2012, 20:06:00, Yash Shah wrote:
> Though our code is superfast, we can accelerate it even more by using the
> GPU. OpenCV itself now supports GPU acceleration like OpenCL. Minimal code
> changes are required. The main building block of a GPU-based application is
> the GpuMat class, in contrast to the Mat class in the CPU OpenCV API. We can
> convert one into the other and mix them in code. We will use it in our
> project.
Nice :)

> We can even add one more filter. We can roughly estimate the distance of
> the user from the webcam. We will have trained samples with us and,
> according to them, we can even filter the sound by its loudness. I think
> the problem with the background noise will be solved by using Computer
> Vision. Surely there will be exceptions, but it will work perfectly in
> general.
Aiming high, I like it :)
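
To make the distance-plus-loudness idea concrete, here is a minimal sketch in
Python. It is not code from the project: the pinhole-camera distance estimate,
the inverse-distance loudness model, and every name and constant below
(focal_length_px, real_face_width_m, ref_rms, tolerance_db, ...) are
assumptions chosen for illustration. The thread only mentions "trained
samples", which would replace the hard-coded reference values here.

```python
import math

# All constants are illustrative assumptions, not values from the project.

def estimate_distance_m(face_width_px, focal_length_px=600.0,
                        real_face_width_m=0.15):
    """Pinhole-camera estimate: distance = f * real_width / width_in_pixels."""
    if face_width_px <= 0:
        raise ValueError("face width must be positive")
    return focal_length_px * real_face_width_m / face_width_px

def rms(samples):
    """Root-mean-square level of a sequence of audio samples in [-1, 1]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def expected_rms(distance_m, ref_rms=0.2, ref_distance_m=0.5):
    """Assume sound pressure falls off as 1/distance from a reference point."""
    return ref_rms * ref_distance_m / distance_m

def is_probably_user(samples, face_width_px, tolerance_db=6.0):
    """Accept audio whose level is within tolerance_db of the level
    expected for a speaker at the visually estimated distance."""
    measured = rms(samples)
    if measured == 0.0:
        return False  # silence carries no speech
    expected = expected_rms(estimate_distance_m(face_width_px))
    deviation_db = abs(20.0 * math.log10(measured / expected))
    return deviation_db <= tolerance_db
```

With these assumed defaults, a face 180 pixels wide maps to roughly 0.5 m, so
audio with an RMS level near 0.2 is accepted, while much quieter audio (for
example distant background chatter) is rejected.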

> I also thought about "mmmhhh" while implementing, but it is also a kind of
> noise to us. We don't have to do anything with "mmmhhh".
As simon is supposed to be very versatile, ignoring sounds that don't involve
the lips (or, in the case of "m", not visibly) is sadly not enough. But as the
vision extension will be an optional addition and will not replace the current
segmentation code, that's fine.

> We will be using libkface. digiKam also has a large database of user images
> and it tags people automatically, so it will also be useful to us.
Sounds good.

Best regards,
Peter

