[Kde-accessibility] Fwd: Re: paraplegic KDE support

Thu Feb 23 18:45:48 CET 2006

On Thursday 23 February 2006 11:57, Willie Walker wrote:
> Hi All:
>
> I just want to jump in on the speech recognition stuff.  Having
> participated in several standards efforts (e.g., JSAPI, VoiceXML/SSML/
> SGML) in this area, and having developed a number of speech
> recognition applications, and having seen the trials and tribulations
> of inconsistent SAPI implementations, and having led the Sphinx-4
> effort, I'd like to offer my unsolicited opinion :-).
>
> In my opinion, there are enough differences in the various speech
> recognition systems and their APIs that I'm not sure efforts are best
> spent charging at the "one API for all" windmill.  IMO, one could
> spend years trying to come up with yet another standard but not very
> useful API in this space.  All we'd have in the end would be yet
> another standard but not very useful API with perhaps one buggy
> implementation on one speech engine.  Plus, it would just be
> repeating work and repeating mistakes that have already been made
> time and time again.
>
> As an alternative, I'd offer the approach of centering on an available
> recognition engine and designing the assistive technology first.  Get
> your feet wet with that and use it as a vehicle to better understand
> the problems you will face with any speech recognition task for the
> desktop.  Examples include:
>
> o how to dynamically build a grammar based upon stuff you can get
> from the AT-SPI
> o how to deal with confusable words (or discover that recognition for
> a particular grammar is just plain failing and you need to tweak it
> dynamically)
> o how to deal with unspeakable words
> o how to deal with deictic references
> o how to deal with compound utterances
> o how to handle dictation vs. command and control
> o how to deal with tapering/restructuring of prompts based upon
> recognition success/failure
> o how to allow the user to recover from misrecognitions
> o how to handle custom profiles per user
> o (MOST IMPORTANTLY) just what is a compelling speech interaction
> experience for the desktop?
>
> Once you have a better understanding of the real problems and have
> developed a working assistive technology, then take a look at perhaps
> generalizing a useful layer across multiple engines.  The end result is
> that you will probably end up with a useful assistive technology
> sooner.  In addition, you will also end up with an API that is known
> to work for at least one assistive technology.
>
> Will

Thanks for the great post, Will.  So would you advise against a strategy that
tries to integrate Sphinx with AT-SPI?
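
To make the first item in Will's list a bit more concrete, here is a rough
sketch of what "dynamically build a grammar" might look like.  It assumes the
widget labels have already been collected from AT-SPI somehow (the list below
is a made-up stand-in, and no AT-SPI calls are shown), and it emits a JSGF
grammar, the text grammar format that Sphinx-4 reads.  The function names and
the "click" verb are my own illustration, not anything from an existing tool.

```python
import re


def normalize_label(label):
    """Turn a widget label into a lower-case, speakable phrase."""
    # Assumption: '_' and '&' are mnemonic markers and carry no speech content.
    without_mnemonics = re.sub(r"[_&]", "", label)
    # Replace anything that is not a letter or space (digits, dots, ...) with a space.
    letters_only = re.sub(r"[^A-Za-z ]+", " ", without_mnemonics)
    return " ".join(letters_only.lower().split())


def build_jsgf(labels, grammar_name="desktop"):
    """Return JSGF text whose public rule accepts 'click <label>'."""
    phrases = []
    for label in labels:
        phrase = normalize_label(label)
        # Skip labels with nothing speakable left, and skip duplicates.
        if phrase and phrase not in phrases:
            phrases.append(phrase)
    if not phrases:
        raise ValueError("no speakable labels")
    alternatives = " | ".join(phrases)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <command> = click ( {alternatives} );\n"
    )


if __name__ == "__main__":
    # Hypothetical labels, standing in for names read from the focused window.
    print(build_jsgf(["_File", "Save As...", "OK", "123"]))
```

Even this toy version runs into two other items on the list: "123" is dropped
as unspeakable, and nothing here detects confusable phrases, so a real tool
would need to rebuild and tune the grammar as the focused window changes.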

BTW, I noticed in the latest Windows Vista beta review (ZDNet) that Vista has
both TTS and STT capabilities.  Looks like F/OSS will have some catching up to do.

-- 
Gary Cramblitt (aka PhantomsDad)
KDE Text-to-Speech Maintainer
http://accessibility.kde.org/developer/kttsd/index.php