[Kde-accessibility] [GSoC] Face Detection and Face Recognition for Simon
Peter Grasch
grasch at simon-listens.org
Tue Mar 20 19:54:10 UTC 2012
Hello everybody,
We've had such an overwhelming response to the face detection and recognition
project for Simon that I feel it's time to introduce all of you to each other
and to outline the project a bit better.
As stated on the ideas page, this project is about using computer vision (face
detection and face recognition) to improve Simon's recognition accuracy -
mainly to avoid false positives.
During last year's GSoC, Simon gained a powerful foundation to take context into
account during the recognition. Initially, this system only affects the
scenario selection. If you build a current development version of Simon,
you'll see a "Context" tab that allows you to set up conditions for the
currently active scenario. The scenario will be de-activated if those
conditions are not met (if you don't have any conditions, the scenario will
always be active).
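To make the activation rule concrete, here is a minimal Python sketch. It is
not Simon's actual code (which is C++); the names `Scenario`, `conditions` and
`is_active` are made up for illustration, and I'm assuming that multiple
conditions are combined with a logical AND. A scenario without conditions is
always active:

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical type: a condition is any callable that returns True when met.
Condition = Callable[[], bool]


@dataclass
class Scenario:
    name: str
    conditions: List[Condition] = field(default_factory=list)

    def is_active(self) -> bool:
        # all() over an empty list is True, so a scenario without
        # conditions is always active.
        return all(condition() for condition in self.conditions)


def firefox_is_running() -> bool:
    # Placeholder for a real process check; hard-coded for this example.
    return False


firefox = Scenario("Firefox", [firefox_is_running])
unconditional = Scenario("Dictation")
```

Here `firefox.is_active()` is false as long as the Firefox check fails, while
`unconditional.is_active()` is always true.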
Use case: The Firefox scenario (i.e. "New Tab", "Next Tab", etc.) should only
be active when Firefox is actually running.
That system is going to be expanded to cover other areas as well. For example,
we are planning to hook it up to the sound system to enable / disable certain
microphones depending on the user's position in the room.
Another aspect would be to use the context information to select a different
acoustic model. For example, a different acoustic model should be used
depending on the distance between speaker and microphone (as this affects the
acoustic properties quite a bit).
As you can see, we plan on having more context information than "active
window". And that information is going to be used in a lot of different places.
The face detection and face recognition should be implemented as another
context plugin. That way it's automatically available in all the areas listed
above. For example, we could strategically place webcams in a room to track
the user's location - and activate / deactivate microphones according to that
information. Or we could switch the acoustic model depending on who is
standing in front of the microphone. All of that would be possible without any
further work if the system is designed correctly (modular).
So what is your job over the summer?
Implement a face detection / recognition context plugin. The plugin
infrastructure is laid out in such a way that you can present your own
configuration screen (allowing you to take webcam pictures to get a frame of
reference for example). However, plugins don't have a UI during "normal"
operation. The only interface you have to the rest of Simon is a simple
boolean return value to tell the context system if the condition is matched
(i.e.: is "Peter" sitting in front of the system? [Y/N]).
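As a rough illustration of that contract, here is a Python sketch with
invented names (the real plugin interface is part of Simon's C++ code base and
will look different). `FaceCondition`, `is_satisfied` and `recognize_faces` are
all hypothetical; `recognize_faces` stands in for whatever detection /
recognition backend ends up being used:

```python
from typing import Callable, Iterable

# Hypothetical backend hook: returns the names of all people currently
# recognized in front of the webcam.
Recognizer = Callable[[], Iterable[str]]


class FaceCondition:
    """Sketch of a context condition that matches when a person is present."""

    def __init__(self, person: str, recognize_faces: Recognizer) -> None:
        self.person = person
        self.recognize_faces = recognize_faces

    def is_satisfied(self) -> bool:
        # The only thing exposed to the context system is this boolean.
        return self.person in set(self.recognize_faces())


# Usage with a fake recognizer that always "sees" Peter.
peter_present = FaceCondition("Peter", lambda: ["Peter"])
anna_present = FaceCondition("Anna", lambda: ["Peter"])
```

With the fake recognizer above, `peter_present.is_satisfied()` is true and
`anna_present.is_satisfied()` is false. Everything else (training images,
thresholds, camera selection) would live in the plugin's configuration screen.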
Also, the context system is not finished but it's being worked on. The scenario
switching does work already and can be used as a testing ground (the plugin is
supposed to only return a boolean match - for that it doesn't matter where it
is used). Expect to see some rough edges, however, and be prepared to help out a
little there as well if absolutely necessary.
Some people have suggested using face recognition to do the login between
Simon and Simond: I don't think that's such a good idea. The login between
those two is almost exclusively a way of identifying the user (when the server
serves multiple clients). So during normal operation, it doesn't change on the
client at all. In multi-user scenarios, users tend to have separate user
accounts - which means each user account can store its own Simond password
anyway.
The face recognition to switch acoustic models is mainly a way to improve
Simon's accuracy in e.g. a kiosk environment where all users basically use the
same stuff. It might still be helpful to use different models depending on
gender, etc. While implementing it as a login mechanism would still enable
that, it would get in the way of the use cases mentioned above and complicate
the implementation unnecessarily (as the login is an entirely separate
system).
About the libraries to use: You can use OpenCV directly, but you can also use
one of the various high-level wrappers around it. Usually it's a versatility /
maintainability trade-off that you have to be aware of, but I'll probably be
easy to convince either way :)
This nicely brings me to my next point: I might be a mentor, but that doesn't
mean that I have all the answers. Feel free to challenge my point of view if
you disagree about anything - I'll probably be grateful for it!
As a general note: I'd like to continue the discussion about this GSoC project
on this public mailing list.
Please don't worry about other students seemingly "stealing" your ideas or
anything like that - it's really not the final proposal that's in Melange that
matters most but rather the communication and interaction leading up to it.
So if anyone really does steal all your ideas: I'll know and act accordingly
:)
I hope we can have an open and productive discussion!
Best regards,
Peter