[Kde-accessibility] [GSoC] Face Detection and Face Recognition for Simon

Tue Mar 20 19:54:10 UTC 2012

Hello everybody,

We've had such an overwhelming response to the face detection and recognition
project for Simon that I feel it's time to introduce all of you to each other
and to outline the project in more detail.

As stated on the ideas page, this project is about using computer vision (face 
detection and face recognition) to improve Simon's recognition accuracy -
mainly to avoid false positives.

During last year's GSoC, Simon gained a powerful foundation for taking context
into account during recognition. For now, this system only affects scenario
selection. If you build a current development version of Simon, you'll see a
"Context" tab that allows you to set up conditions for the currently active
scenario. The scenario will be deactivated if those conditions are not met (if
you don't have any conditions, the scenario will always be active).
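To make that rule concrete, here is a minimal C++ sketch of the activation
check. It is not Simon's actual code: `Condition` and `isScenarioActive` are
hypothetical names, and the sketch assumes that all conditions of a scenario
have to be met (how Simon really combines several conditions isn't described
here).

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical stand-in for a context condition: anything that can answer
// "is this condition currently met?" with true or false.
using Condition = std::function<bool()>;

// Assumed rule: a scenario is active only if every one of its conditions is met.
// std::all_of returns true for an empty range, so a scenario without any
// conditions is always active.
bool isScenarioActive(const std::vector<Condition>& conditions) {
    return std::all_of(conditions.begin(), conditions.end(),
                       [](const Condition& c) { return c(); });
}
```

For the Firefox use case, the single condition would simply report whether
Firefox is currently running.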

Use case: The Firefox scenario (e.g. "New Tab", "Next Tab", etc.) should only
be active when Firefox is actually running.

That system is going to be expanded to cover other areas as well. For example, 
we are planning to hook it up to the sound system to enable / disable certain 
microphones depending on the user's position in the room.

Another aspect would be to use the context information to select a different 
acoustic model. For example, a different acoustic model should be used 
depending on the distance between speaker and microphone (as this affects the 
acoustic properties quite a bit).

As you can see, we plan on having more context information than just the
"active window". And that information is going to be used in a lot of
different places.

The face detection and face recognition should be implemented as another 
context plugin. That way it's automatically available in all the areas listed 
above. For example, we could strategically place webcams in a room to track
the user's location and activate / deactivate microphones according to that
information. Or we could switch the acoustic model depending on who is
standing in front of the microphone. All of that would be possible without any
further work if the system is designed correctly (i.e. modular).

So what is your job over the summer?

Implement a face detection / recognition context plugin. The plugin
infrastructure is laid out in such a way that you can present your own
configuration screen (allowing you, for example, to take webcam pictures as a
frame of reference). However, plugins don't have a UI during "normal"
operation. The only interface you have to the rest of Simon is a simple
boolean return value that tells the context system whether the condition is
matched (e.g. is "Peter" sitting in front of the system? [Y/N]).

Also, the context system is not finished yet, but it's being worked on. The
scenario switching already works and can be used as a testing ground (the
plugin is only supposed to return a boolean match, so it doesn't matter where
it is used). Expect to see some rough edges, however, and be prepared to help
out a little there as well if absolutely necessary.

As some people have suggested using face recognition for the login between
Simon and Simond: I don't think that's such a good idea. The login between
those two is almost exclusively a way of identifying the user (when the server
serves multiple clients). So during normal operation, it doesn't change on the
client at all. In multi-user scenarios, users tend to have separate user
accounts, which means each user account can store its own Simond password
anyway.
Using face recognition to switch acoustic models is mainly a way to improve
Simon's accuracy in, e.g., a kiosk environment where all users basically use
the same stuff. It might still be helpful to use different models depending on
gender, etc. While implementing it as a login mechanism would still enable
that, it would get in the way of the use cases mentioned above and complicate
the implementation unnecessarily (as the login is an entirely separate
system).

About the libraries: You can use OpenCV directly, but you can also use one of
the various high-level wrappers around it. It's usually a versatility /
maintainability trade-off that you have to be aware of, but I'll probably be
easy to convince either way :)

This nicely brings me to my next point: I might be a mentor, but that doesn't 
mean that I have all the answers. Feel free to challenge my point of view if
you disagree about anything - I'll probably be grateful for it!

As a general note: I'd like to continue the discussion about this GSoC project 
on this public mailing list.
Please don't worry about other students seemingly "stealing" your ideas or
anything like that - it's really not the final proposal in Melange that
matters most but rather the communication and interaction leading up to it.
So if anyone really does steal all your ideas: I'll know and act accordingly 
:)

I hope we can have an open and productive discussion!

Best regards,
Peter