[kde-community] The Future of Speech Recognition in KDE: Proposal

Sat Aug 31 10:59:17 BST 2013

Hello,

for those of you who do not yet know me, my name is Peter Grasch and I
currently maintain the Simon project (http://simon.kde.org), a speech
recognition project in KDE's extragear.

Over the course of the summer, I have been working on bringing dictation
capabilities to Simon (more info & demo video: http://grasch.net/node/22).
Now, I'm trying to build up a network of developers and researchers that
work together on building high accuracy, large vocabulary speech
recognition systems for a variety of domains (desktop dictation being
one of them).

Building such systems using free software and free resources requires a
lot of work in many different areas (software development, signal
processing, linguistics, etc.).
In order to facilitate collaboration and to establish a sustainable
community between volunteers of such diverse backgrounds, I am convinced
that the right organizational structure is crucial to ensuring continued
long-term success.

Naturally, as a KDE contributor, I would like to launch this project as
part of KDE. I talked to quite a number of the people who expressed
interest in taking up an active role in this effort, and this is what we
would like to propose:
* A new category in KDE's extragear called "Speech" (putting it on the
same level as e.g., "Network"). Rationale: Not all speech recognition
applications are necessarily related to accessibility (e.g., lecture
transcription) and splitting up the projects into different categories
would hinder collaboration.
* Creating the "open speech group" (name still a work in progress) and
setting up a project page for it. This would serve as little more than a
common label for all projects that are part of the initiative -
basically the equivalent of "KDE Multimedia Team" but for speech instead
of multimedia. Rationale: A common brand makes it easier to market and
represent the collective effort of all sub-projects.

I've obviously read the KDE manifesto carefully and I think that such a
group would be in line with the overall spirit, even though there are
some details that I feel the need to point out explicitly:
Some of the sub-projects may not necessarily be about end-user software
or even software at all (e.g., speech modeling). However, please keep in
mind that this is a sub-project of a larger initiative that is very much
about end-user software; splitting the speech modeling into a separate
project just makes sense because it's an ambitious project in its own
right.
Some of the sub-projects may appear to diverge from "established practices"
(by not using C++, for example), but that is mostly because there are no
comparable KDE projects (for example, somebody is already working on
a web-based transcriber system based on Ruby on Rails) or because of
"special considerations" (e.g., an application for Mac OS X may use the
native toolkit because the KDE infrastructure for OS X is not
sufficiently mature).

I'm posting this here on the community list because I want to hear your
thoughts on the proposal. Do you think that the 'open speech group'
would fit within KDE?

Best regards,
Peter