[Kde-accessibility] Proklam and KMouth

Bill Haneman bill.haneman@sun.com
23 Sep 2002 20:32:49 +0100


> > I believe that if you use the festival internal APIs instead of just the
> > festival-server API, you can get better control of utterances.  It is
> Are you using the festival-server API in gnome-speech ?

At the moment, yes.  We may decide to move to using internal APIs if
needed for more flexibility, or we may just limit our really "advanced"
features to the FreeTTS driver for our first release.

> > certainly possible to stop a phrase that is partway spoken, by issuing
> > this string to the festival server:
> >
> > "(audio_mode 'shutup)\n"
> AFAIK, the say function is monolithic, so it would have to be issued by
> another thread, but I don't think Festival is multithreaded.

I think it may be true that Festival is single-threaded, but a call to
"SayText" returns long before the string has been spoken; it returns as
soon as the string has been received.  You can call "shutup" any time
before festival has finished sending the waveform to the audio device,
and festival will stop in the middle of speaking the string.  The
results are probably good enough for most purposes other than
synchronization (for instance a talking head or captions).  However, you
don't get notified how far through the string the synthesis engine had
gotten when you stopped it, so you can't tell it to "resume" speaking at
exactly the same point.
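
As a rough illustration of the sequence described above, here is a
minimal Python sketch that sends a "SayText" request and then the
"shutup" string over a plain socket.  The helper names, the host, and
the backslash escaping of quotes are assumptions made for this example;
1314 is the port the festival server listens on by default.  Whether the
interruption takes effect depends on the server behaving as described
above.

```python
import socket
import time

FESTIVAL_HOST = "localhost"  # assumption: server runs on the same machine
FESTIVAL_PORT = 1314         # default festival server port

# The interrupt string quoted earlier in this thread.
SHUTUP_COMMAND = b"(audio_mode 'shutup)\n"


def say_text_command(text: str) -> bytes:
    """Build a SayText request (hypothetical helper, not part of festival)."""
    # Assumption: backslash and double quote are escaped with a backslash.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'(SayText "{escaped}")\n'.encode("utf-8")


def send_command(sock: socket.socket, command: bytes) -> None:
    """Write one complete command to an already connected socket."""
    sock.sendall(command)


def speak_then_interrupt(text: str, delay_seconds: float) -> None:
    """Start speaking, wait a little, then ask the server to stop."""
    with socket.create_connection((FESTIVAL_HOST, FESTIVAL_PORT)) as sock:
        send_command(sock, say_text_command(text))
        time.sleep(delay_seconds)  # speech is (hopefully) still in progress
        send_command(sock, SHUTUP_COMMAND)
```

Calling speak_then_interrupt("a fairly long sentence ...", 1.0) against a
running server should cut the utterance short, but as noted above the
client learns nothing about where in the string the audio stopped.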

To do this (pause/resume) you'd either need progress notifications,
which festival might not be able to give without using its internal API,
or else you'd need to break the input to festival down into individual
words.  The second approach would give you per-word notification
capability, but you'd be throwing away festival's powerful lexical
analysis features.  It may be possible to use some Lisp API to get the
individual "diphones" from festival after parsing and before
concatenation.  If you did that, you could probably find a way to get
per-diphone notification by sending the diphones out one at a time, or
perhaps one word at a time, but we haven't done that yet.
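
To make the word-by-word option concrete, here is a small Python sketch
of a client that feeds an engine one word at a time and remembers its
position, so that it can pause and later resume.  The class and the
"speak_word" callback are hypothetical; the callback stands in for a
blocking call into whatever engine is used, and nothing here is
festival or gnome-speech API.

```python
from typing import Callable, List


class WordByWordSpeaker:
    """Speaks text one word at a time so playback can pause and resume."""

    def __init__(self, text: str, speak_word: Callable[[str], None]) -> None:
        self.words: List[str] = text.split()
        self.speak_word = speak_word  # hypothetical blocking engine call
        self.position = 0             # index of the next word to speak
        self.paused = False

    def pause(self) -> None:
        """Request a stop after the word currently being spoken."""
        self.paused = True

    def run(self) -> None:
        """Speak from the saved position until the end or until paused."""
        self.paused = False
        while self.position < len(self.words) and not self.paused:
            self.speak_word(self.words[self.position])
            self.position += 1  # doubles as the per-word progress notification
```

Calling run() again after pause() continues with the next unspoken word.
The cost is the one mentioned above: the engine sees isolated words, so
phrase-level analysis and prosody are lost.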

best regards,

We have the same issue: we need an API that covers all capabilities, but
we cannot assume that any given TTS engine has a given capability.  In
cases like this, one could return boolean values from methods like
"stopSpeaking" to indicate whether the command worked or not.  However,
we think at the moment that all TTS engines that are useful for
accessibility will need this feature.  Also, in our IDL, if a method is
not implemented, a NotImplementedException is raised.  So if there were
a gnome-speech TTS engine that could not "stopSpeaking", a client would
get a NotImplementedException when calling that method, to inform it
that the feature was not available.
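
The two conventions mentioned above (a boolean result versus an
exception for an unimplemented method) can be sketched in Python as
follows.  This is only an illustration: the class and function names are
invented for the example and are not the gnome-speech IDL, and Python's
built-in NotImplementedError stands in for NotImplementedException.

```python
class SpeechDriver:
    """Common interface that every driver presents to the client."""

    def say(self, text: str) -> None:
        raise NotImplementedError("say is not implemented by this driver")

    def stop_speaking(self) -> None:
        raise NotImplementedError("stop_speaking is not implemented by this driver")


class StoppableDriver(SpeechDriver):
    """Hypothetical driver whose engine can be interrupted."""

    def __init__(self) -> None:
        self.speaking = False

    def say(self, text: str) -> None:
        self.speaking = True

    def stop_speaking(self) -> None:
        self.speaking = False


class BasicDriver(SpeechDriver):
    """Hypothetical driver whose engine cannot be interrupted."""

    def say(self, text: str) -> None:
        pass  # speech would run to completion; stop_speaking is inherited


def try_stop(driver: SpeechDriver) -> bool:
    """Turn the exception convention into the boolean convention."""
    try:
        driver.stop_speaking()
    except NotImplementedError:
        return False  # capability is missing; the caller can degrade gracefully
    return True
```

A client can call try_stop(driver) on any driver and learn, without
knowing the engine in advance, whether stopping is supported.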

Bill

> Anyway, as you will see in other mails I'll post, I'm trying to assume as little as
> possible about the capabilities of the TTS, because maybe festival can be
> stopped, but other TTS engines may not be... I need a common ground to be able
> to make every plug-in look exactly the same to Proklam, which is important to
> speed up plug-in development as well as Proklam itself... I'll come back to
> this issue in another mail.
> - --
> Pupeno: pupeno@pupeno.com
> http://www.pupeno.com
> - ---
> Help the hungry children of Argentina,
> please go to (and make it your homepage):
> http://www.porloschicos.com/servlet/PorLosChicos?comando=donar