[Kde-accessibility] [Fwd: Re: Fwd: Re: paraphlegic KDE support]
Bart Alberti
bart at solozone.com
Fri Feb 24 01:37:18 CET 2006
I had meant to send this to the whole list, not to start a personal
discussion with the esteemed Willie Walker. I hit 'reply' thinking it
would go to the list; I had actually intended to reply to the next
posting on the list, the one where the word 'allergy' occurs.
I see Gary has an "allergy", too :-)
Bart Alberti
-------- Original Message --------
Subject: Re: [Kde-accessibility] Fwd: Re: paraphlegic KDE support
Date: Thu, 23 Feb 2006 12:53:43 -0800
From: Bart Alberti <bart at solozone.com>
To: Willie Walker <William.Walker at Sun.COM>
References: <200602231020.01822.garycramblitt at comcast.net>
<1140709321.15975.3.camel at linux.site>
<6072A454-C87C-4612-AB8E-648FB3CA746B at sun.com>
<200602231245.48567.garycramblitt at comcast.net>
<3DB7D248-CC17-4F5B-B194-66ECE8D53BFE at sun.com>
Willie Walker wrote:
>Hi Gary:
>
>Thanks for the kind words. I'm confused about what you mean by "a
>strategy that tries to integrate Sphinx with AT-SPI." My
>recommendation would be to write an assistive technology (GVOK, the
>GNOME Voice-Only Keyboard, though a compelling speech interface to
>the desktop is far more than just doing speech buttons) that uses
>speech recognition and the AT-SPI. Thus, yes, they are integrated,
>but at the assistive technology level.
>
>In other words, this mysterious GVOKian thing would interface
>directly with a speech recognition engine and drive/interact with
>applications via the AT-SPI. This should all be possible without
>requiring any new API or additional infrastructure for the platform.
>Heck, look at http://xvoice.sourceforge.net/. One can even
>potentially use a Windows box to do the recognition and communicate
>with something to drive the GNOME desktop. It's all been done before
>in more primitive ways.
>
>Having said that, our engine choices on the Linux desktop are rather
>slim. Sphinx-3{.3} can get you some places, but it's only going to
>have dictation-style grammars and not the annotated BNF-style
>grammars that are typically used for command and control. Sphinx-4
>will get you both n-Gram and CFG grammars, but it is in Java, which
>seems to cause a curious allergic reaction around these parts. In
>addition, their performance/accuracy need work to make them truly
>viable interactive desktop engines. Other options have licensing
>hairballs.
>
>One might try to put a business model before IBM (ViaVoice) and
>Nuance (Dragon) to see if they'd make their engines available on
>Linux (again, in the case of IBM).
>
>Will
>
>PS - The use of GVOK is just a pun on GOK and doesn't imply the thing
>would act or behave like GOK or would even be a speech-enabled GOK.
>
>On Feb 23, 2006, at 12:45 PM, Gary Cramblitt wrote:
>
>>On Thursday 23 February 2006 11:57, Willie Walker wrote:
>>
>>>Hi All:
>>>
>>>I just want to jump in on the speech recognition stuff. Having
>>>participated in several standards efforts (e.g., JSAPI, VoiceXML/SSML/SGML)
>>>in this area, and having developed a number of speech
>>>recognition applications, and having seen the trials and tribulations
>>>of inconsistent SAPI implementations, and having led the Sphinx-4
>>>effort, I'd like to offer my unsolicited opinion :-).
>>>
>>>In my opinion, there are enough differences in the various speech
>>>recognition systems and their APIs that I'm not sure efforts are best
>>>spent charging at the "one API for all" windmill. IMO, one could
>>>spend years trying to come up with yet another standard but not very
>>>useful API in this space. All we'd have in the end would be yet
>>>another standard but not very useful API with perhaps one buggy
>>>implementation on one speech engine. Plus, it would just be
>>>repeating work that has already been done and making the same
>>>mistakes that have been made time and time again.
>>>
>>>As an alternative, I'd offer the approach of centering an available
>>>recognition engine and designing the assistive technology first. Get
>>>your feet wet with that and use it as a vehicle to better understand
>>>the problems you will face with any speech recognition task for the
>>>desktop. Examples include:
>>>
>>>o how to dynamically build a grammar based upon stuff you can get
>>>from the AT-SPI
>>>o how to deal with confusable words (or discover that recognition for
>>>a particular grammar is just plain failing and you need to tweak it
>>>dynamically)
>>>o how to deal with unspeakable words
>>>o how to deal with deictic references
>>>o how to deal with compound utterances
>>>o how to handle dictation vs. command and control
>>>o how to deal with tapering/restructuring of prompts based upon
>>>recognition success/failure
>>>o how to allow the user to recover from misrecognitions
>>>o how to handle custom profiles per user
>>>o (MOST IMPORTANTLY) just what is a compelling speech interaction
>>>experience for the desktop?
>>>
>>>Once you have a better understanding of the real problems and have
>>>developed a working assistive technology, then take a look at perhaps
>>>genericizing a useful layer to multiple engines. The end result is
>>>that you will probably end up with a useful assistive technology
>>>sooner. In addition, you will also end up with an API that is known
>>>to work for at least one assistive technology.
>>>
>>>Will
>>>
>>>
>>Thanks for the great post, Will. So would you advise against a strategy
>>that tries to integrate Sphinx with AT-SPI?
>>
>>BTW, I noticed that in the latest Windows Vista beta review (ZDNet), it
>>has both TTS and STT capabilities. Looks like F/OSS will have some
>>catching up to do.
>>
>>--
>>Gary Cramblitt (aka PhantomsDad)
>>KDE Text-to-Speech Maintainer
>>http://accessibility.kde.org/developer/kttsd/index.php
>>
>>
>
I've been dealing with Sphinx as part of the 'festival' speech synthesis
system and I find it difficult. I do not find Java to be a plus; that is
due to my own lack of skill or enthusiasm, but I believe others I know
with better credentials say the same. I would be sorry to see 'Vista'
getting ahead.
Bart Alberti