[kde-community] The Future of Speech Recognition in KDE: Proposal

Sat Aug 31 18:58:07 BST 2013

On Saturday 31 August 2013 11:59:17 AM Peter Grasch wrote:
> Hello,
> 
> for those of you that do not yet know me, my name is Peter Grasch and I
> currently maintain the Simon project (http://simon.kde.org), a speech
> recognition project in KDE's extragear.
> 
> Over the course of the summer, I have been working on bringing dictation
> capabilities to Simon (more info & demo video: http://grasch.net/node/22).
> Now, I'm trying to build up a network of developers and researchers that
> work together on building high accuracy, large vocabulary speech
> recognition systems for a variety of domains (desktop dictation being
> one of them).
> 
> Building such systems using free software and free resources requires a
> lot of work in many different areas (software development, signal
> processing, linguistics, etc.).
> In order to facilitate collaboration and to establish a sustainable
> community between volunteers of such diverse backgrounds, I am convinced
> that the right organizational structure is crucial to ensuring continued
> long-term success.
> 
> Naturally, as a KDE contributor, I would like to launch this project as
> part of KDE. I talked to quite a number of the people who expressed
> interest in taking up an active role in this effort, and this is what we
> would like to propose:
> * A new category in KDE's extragear called "Speech" (putting it on the
> same level as e.g., "Network"). Rationale: Not all speech recognition
> applications are necessarily related to accessibility (e.g., lecture
> transcription) and splitting up the projects in different categories
> would hinder collaboration.
> * Creating the "open speech group" (name still a work in progress) and
> setting up a project page for it. This would serve as little else than a
> common label for all projects that are part of the initiative -
> basically the equivalent of "KDE Multimedia Team" but for speech instead
> of multimedia. Rationale: A common brand makes it easier to market and
> represent the collective effort of all sub-projects.
> 
> I've obviously read the KDE manifesto carefully and I think that such a
> group would be in line with the overall spirit, even though there are
> some details that I feel the need to point out explicitly:
> Some of the sub-projects may not necessarily be about end-user software
> or even software at all (e.g., speech modeling). However, please keep in
> mind that this is a sub-project of a larger initiative that is very much
> about end-user software; splitting the speech modeling into a separate
> project just makes sense because it's an ambitious project in its own
> right.
> Some of the sub-projects may appear to diverge from "established practices"
> (by not using C++, for example) but that is mostly because there won't
> be any similar KDE projects (for example, somebody is already working on
> a web-based transcriber system based on Ruby on Rails) or "special
> considerations" (e.g., an application for Mac OS X may use the native
> toolkit because the KDE infrastructure for OS X is not sufficiently mature).
> 
> I'm posting this here on the community list because I want to hear your
> thoughts on the proposal. Do you think that the 'open speech group'
> would fit within KDE?

Making a meta-project in KDE that combines the efforts of currently separate projects sounds like a good plan. If you manage to bring these projects together around a set of well-defined objectives, that could really move speech interaction with the computer forward.

Managing such a community might prove hard, but surely the positive KDE environment would help to make contributors feel at home.

Cheers,
Jos