[kde-community] The Future of Speech Recognition in KDE: Proposal

Sat Aug 31 16:11:52 BST 2013

On Sat, Aug 31, 2013 at 2:59 AM, Peter Grasch <peter at grasch.net> wrote:
> Hello,
>
> for those of you that do not yet know me, my name is Peter Grasch and I
> currently maintain the Simon project (http://simon.kde.org), a speech
> recognition project in KDE's extragear.
>
> Over the course of the summer, I have been working on bringing dictation
> capabilities to Simon (more info & demo video: http://grasch.net/node/22).
> Now, I'm trying to build up a network of developers and researchers that
> work together on building high-accuracy, large-vocabulary speech
> recognition systems for a variety of domains (desktop dictation being
> one of them).
>
> Building such systems using free software and free resources requires a
> lot of work in many different areas (software development, signal
> processing, linguistics, etc.).
> In order to facilitate collaboration and to establish a sustainable
> community between volunteers of such diverse backgrounds, I am convinced
> that the right organizational structure is crucial to ensuring continued
> long-term success.
>
> Naturally, as a KDE contributor, I would like to launch this project as
> part of KDE. I talked to quite a number of the people who expressed
> interest in taking up an active role in this effort, and this is what we
> would like to propose:
> * A new category in KDE's extragear called "Speech" (putting it on the
> same level as e.g., "Network"). Rationale: Not all speech recognition
> applications are necessarily related to accessibility (e.g., lecture
> transcription) and splitting up the projects into different categories
> would hinder collaboration.
> * Creating the "open speech group" (name still a work in progress) and
> setting up a project page for it. This would serve as little else than a
> common label for all projects that are part of the initiative -
> basically the equivalent of "KDE Multimedia Team" but for speech instead
> of multimedia. Rationale: A common brand makes it easier to market and
> represent the collective effort of all sub-projects.
>
> I've obviously read the KDE manifesto carefully and I think that such a
> group would be in line with the overall spirit, even though there are
> some details that I feel the need to point out explicitly:
> Some of the sub-projects may not necessarily be about end-user software
> or even software at all (e.g., speech modeling). However, please keep in
> mind that this is a sub-project of a larger initiative that is very much
> about end-user software; splitting the speech modeling into a separate
> project just makes sense because it's an ambitious project in its own
> right.
> Some of the sub-projects may appear to diverge from "established practices"
> (by not using C++, for example) but that is mostly because there won't
> be any similar KDE projects (for example, somebody is already working on
> a web-based transcriber system based on Ruby on Rails) or "special
> considerations" (e.g., an application for Mac OS X may use the native
> toolkit because the KDE infrastructure for OS X is not sufficiently mature).
>
> I'm posting this here on the community list because I want to hear your
> thoughts on the proposal. Do you think that the 'open speech group'
> would fit within KDE?
>
> Best regards,
> Peter

+1

"Accessibility" is an important aspect of the Simon project, but can
also be limiting as you explain.

Example:
Tablets and smartphones are mostly for content consumption rather than
creation. Adding speech recognition to Plasma Active would be nifty.

Carl