[Digikam-devel] Regarding face recognition in digikam GSoC project
kunal ghosh
kunal.t2 at gmail.com
Fri Apr 2 19:20:12 BST 2010
Hi, this is my revised proposal for the Google Summer of Code KDE-digikam
project to implement face recognition in digikam.
I would appreciate any suggestions or comments that would help improve the
proposal.
Proposal follows:
*Title*: Implementation of Face Recognition engine in digikam for automatic
face recognition and tagging.
*Motivation:* I have always been a big fan and supporter of intelligent and
elegant technologies, so when I had to join a Special Interest Group at
our institute, my obvious choice was the face recognition SIG, as it
interested me the most.
Having used Picasa 3.5 and then Digikam after I shifted to KDE, I felt the
lack of face tagging.
Automatic face tagging is the most requested feature on the digikam mailing
lists (request-id 146288 <http://bugs.kde.org/show_bug.cgi?id=146288>), which
motivated me to take it up as my GSoC project.
Also, since face recognition is my area of research at my institute, I am
keen to pursue this project, as it would benefit both the project and me.
(I had discussed this feature with the digikam-devel list:
http://old.nabble.com/face-recognition-in-digikam-td26844374.html ).
*Implementation Details:*
I propose to follow a library and plugin architecture, wherein the front end
would be menu additions to digikam in the form of a face-recognition plugin
and the back end would contain the detection and recognition engine
performing the image processing.
*Library implementation:*
1. Face detection would be done using the detector from
Libface <http://libface.sourceforge.net/file/Home.html>, which is based on
multiple Haar cascades and has been tried, tested and implemented in OpenCV
with promising results.
2. Since I propose to use Elastic Bunch Graphs for recognizing faces (the
reasons and a comparison with other models follow), I propose to use OpenCV
for the image processing needs.
3. Since most modern CPUs have multiple cores, and given the recent advent of
general-purpose GPU SDKs, I propose to implement the library routines in
either CUDA or OpenCL (*discussions with mentors required*).
CUDA and OpenCL would be more interesting for the library because the
recognition speed would increase over time as CPUs with more cores and faster
GPUs become common on desktops, making digikam future-proof.
The idea behind separating the task into a library is to allow face
recognition to be extended to object recognition in general in the future,
and to make it easy for other projects to adopt the same library in their
code.
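As an illustration of what "multiple Haar cascades" could imply for the
library, here is a minimal sketch of one way detections from several cascades
might be combined into a single list of face boxes. This is an assumption for
illustration only, not a description of how Libface actually works; the
function names, the (x, y, width, height) box format and the 0.5 overlap
threshold are all hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height), hypothetical format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlapping rectangle (0 if the boxes are disjoint).
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def merge_detections(per_cascade: List[List[Box]],
                     threshold: float = 0.5) -> List[Box]:
    """Keep a box only if it does not heavily overlap a box already kept."""
    merged: List[Box] = []
    for boxes in per_cascade:
        for box in boxes:
            if all(iou(box, kept) < threshold for kept in merged):
                merged.append(box)
    return merged
```

For example, if a frontal cascade and a profile cascade both report roughly
the same face, only the first of the two boxes is kept, while a face reported
by just one of them still makes it into the result.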
*Why Elastic Bunch Graph (EBG) Based Face Recognition:*
1. It adds semantics to face recognition. It recognizes faces by taking into
consideration various facial features (eyes, nose etc.) and uses matching
criteria such as the distance between the features.
In comparison, PCA- and LDA-based methods such as Eigenfaces and Fisherfaces
do not extract facial feature data and thus do not use the patterns in
faces, resulting in lower recognition accuracy.
2. Elastic Bunch Graphs can store the feature data in matrices (this may vary
depending on the implementation), so adding another training image is just a
matter of adding a matrix entry. Eigenfaces and Fisherfaces based methods,
on the other hand, create a set of representative images (eigenfaces or
fisherfaces) which need to be recalculated every time a new training image is
added, leading to a lot of time overhead for large training sets.
Not having to retrain makes the EBG-based method a better contender for the
library.
3. Since the EBG-based method considers semantic data of the face, it can
account for pose and expression variation in the test image. This is a
major concern in a photo management application, where there may be photos
of the same person in a different mood and hence with a different facial
expression.
Further statistical data comparing the EBG method with various other methods
for face recognition can be found here: http://tinyurl.com/facerec-compare
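To make point 2 concrete, here is a minimal sketch of a gallery in which
enrolling a new training face is a plain append, with no retraining step. It
is deliberately simplified and is not a real EBG implementation: an actual
system would compute Gabor-wavelet feature vectors ("jets") at located
landmarks and match graphs elastically. The `Gallery` class, the fixed
landmark order and the mean cosine similarity used for matching are all
assumptions made for this example.

```python
import math
from typing import Dict, List, Optional, Tuple

Jet = List[float]       # feature vector at one facial landmark (hypothetical)
FaceGraph = List[Jet]   # one jet per landmark, in a fixed landmark order


def cosine(a: Jet, b: Jet) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def graph_similarity(g1: FaceGraph, g2: FaceGraph) -> float:
    """Mean per-landmark similarity between two non-empty face graphs."""
    return sum(cosine(j1, j2) for j1, j2 in zip(g1, g2)) / len(g1)


class Gallery:
    """Stores the graphs of known people; enrolling never touches old entries."""

    def __init__(self) -> None:
        self.entries: Dict[str, List[FaceGraph]] = {}

    def enroll(self, name: str, graph: FaceGraph) -> None:
        # Adding a training face is just an append: nothing is recomputed.
        self.entries.setdefault(name, []).append(graph)

    def identify(self, probe: FaceGraph) -> Tuple[Optional[str], float]:
        """Return the best-matching name and its score, or (None, -1.0)."""
        best_name: Optional[str] = None
        best_score = -1.0
        for name, graphs in self.entries.items():
            for graph in graphs:
                score = graph_similarity(probe, graph)
                if score > best_score:
                    best_name, best_score = name, score
        return best_name, best_score
```

With an Eigenfaces-style model, the step corresponding to `enroll` would
instead involve recomputing the basis images from the whole training set.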
*The UI and the Digikam/KDE integration:*
The face recognition feature would be a plugin/built-in feature of digikam
which can be enabled or disabled from the
"Configure Digikam Menu" in the "Settings" main menu item already present in
digikam.
Once enabled, the detection would be run on all the digikam albums. This
would be done asynchronously so as not to hinder usability.
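The asynchronous scan could look roughly like the sketch below, in which
detection runs on a background thread and results are handed to a callback as
each image finishes. This only illustrates the idea: in digikam itself this
would be written in C++ with Qt/KDE threading facilities, and the names
`start_background_scan`, `detect` and `on_result` are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height), hypothetical format


def start_background_scan(image_paths: Iterable[str],
                          detect: Callable[[str], List[Box]],
                          on_result: Callable[[str, List[Box]], None],
                          max_workers: int = 4) -> threading.Thread:
    """Start scanning and return immediately; the caller stays responsive."""
    paths = list(image_paths)

    def worker() -> None:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = {pool.submit(detect, path): path for path in paths}
            # Report each image as soon as its detection finishes.
            # future.result() re-raises any exception thrown by detect().
            for future in as_completed(futures):
                on_result(futures[future], future.result())

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The caller would pass in the real detector as `detect` and use `on_result`
to queue the faces for the naming widget; `thread.join()` is only needed
where the caller wants to wait for the scan to finish.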
Once the detection is done, the user is presented with a simple widget
containing the detected faces, along with the UI elements necessary to name
the faces and to reject non-face regions picked up by the detection system.
This would reuse code from the current "Edit Menu", where a box can be
drawn on a portion of the image.
After that, the metadata would be stored in RDF or XMP format inside the
image. Region tagging would be based on the following image annotation
system: http://www.kanzaki.com/docs/sw/img-annotator.html .
The detected faces would be stored in a database (the choice depends on
project constraints, most likely SQLite) along with the
bunch graphs for faster data access and updates.
The generated metadata, i.e. the names of the people in a photograph, would
be registered with Nepomuk for linking with names in emails etc.
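As a rough idea of what the face database mentioned above might hold, here is
a minimal SQLite sketch that stores one row per detected face together with
its serialized feature data. The table layout, the column names and the use
of JSON for the graph data are assumptions made for illustration; the actual
schema would be decided with the mentors.

```python
import json
import sqlite3
from typing import List, Optional, Tuple

# Hypothetical schema: one row per detected face region.
SCHEMA = """
CREATE TABLE IF NOT EXISTS faces (
    id         INTEGER PRIMARY KEY,
    image_path TEXT NOT NULL,
    x          INTEGER NOT NULL,
    y          INTEGER NOT NULL,
    width      INTEGER NOT NULL,
    height     INTEGER NOT NULL,
    person     TEXT,           -- NULL until the user names the face
    graph_json TEXT NOT NULL   -- serialized feature data for this face
);
CREATE INDEX IF NOT EXISTS faces_by_person ON faces(person);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the face database and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_face(conn: sqlite3.Connection, image_path: str,
             box: Tuple[int, int, int, int], graph: List[List[float]],
             person: Optional[str] = None) -> int:
    """Insert a detected face and return its row id."""
    cur = conn.execute(
        "INSERT INTO faces (image_path, x, y, width, height, person, graph_json) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (image_path, *box, person, json.dumps(graph)),
    )
    conn.commit()
    return cur.lastrowid


def name_face(conn: sqlite3.Connection, face_id: int, person: str) -> None:
    """Record the name the user gave to a detected face."""
    conn.execute("UPDATE faces SET person = ? WHERE id = ?", (person, face_id))
    conn.commit()


def images_of(conn: sqlite3.Connection, person: str) -> List[str]:
    """List the images in which the given person has been tagged."""
    rows = conn.execute(
        "SELECT DISTINCT image_path FROM faces WHERE person = ? "
        "ORDER BY image_path",
        (person,),
    )
    return [row[0] for row in rows]
```

Naming a face is then a single-row update, and looking up all photos of a
person is an indexed query.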
*Proposed schedule:*
*Now - April 9th*: *Build digikam*, tinker with the code and fix a few bugs
to *increase familiarity*.
*April 10th - April 30th*: Finalize the metadata format (RDF or XMP) and *get
the basic detection working*.
*May 1st - June 1st*: OpenCL/CUDA-based recognition engine ready.
*Documentation and bundling into a library with some test images*.
*June 1st - July 1st*: Finish building the UI plugin and integrating it with
the recognition engine. *Document code and write a User's Guide.*
*July 1st - August 1st*: Write a small daemon to collect performance data
from willing beta testers. Thoroughly test the system, collect anonymous
usage data, and fine-tune the plugin and the recognition engine. *Document
the tests and performance statistics.*
*August 1st - August 9th*: Last-minute modifications, *if any*.
*Why should I be chosen:*
1. I have a good understanding of the Qt framework and have both used it and
given demos of it at local LUG meetings.
2. I am pursuing research on face recognition at my institute and would be
able to contribute a lot to this project.
3. I have been an active member of the Bangalore Open Solaris User's Group
(BOSUG) for over a year and am deeply committed to free software. I am
also working on a Qt-based installer for Belenix OS.
4. I have experience using svn and git and am comfortable working with
tools like cmake.
*About me*
Name: Kunal Ghosh
IRC : gancient
Location: Bangalore , India
I am currently pursuing my Bachelor of Engineering in Computer Science and
Engineering and have been deeply motivated and inspired by the philosophy of
Free Software. I take an interest in music and arts of all forms.
Pattern Recognition and Robotics interest me (though they are opposite
fields, in my opinion they complement each other).
I also love to use Python for my programming needs whenever possible (a
language should be a means of easy and efficient communication, and Python
is exactly that).