[Kde-accessibility] Proposal (with implementation) for synthesizer interface

Roger Butenuth butenuth at online.de
Wed Mar 31 21:43:00 CEST 2004


Hello!

> In order to reduce delay from the first sentence to start of sound, should
> plugin load the speech engine on this call?  In most cases you probably would
> want it to, but if you have multiple plugins configured, maybe not.  Perhaps
> a per-plugin configuration option ("Load speech engine on kttsd startup?")

This is already implemented in the low-level driver: it loads
mbrola/freephone/hadifix on first use.
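
Roughly like this (just a sketch with made-up names, not the real driver
code):

    #include <string>

    class SynthDriver {
    public:
        SynthDriver() : m_engineLoaded(false) {}

        // Called for every sentence; the external synthesizer process
        // is only started when the first sentence arrives, so kttsd
        // start-up stays fast.
        void say(const std::string &text) {
            if (!m_engineLoaded) {
                loadEngine();          // expensive: fork/exec mbrola etc.
                m_engineLoaded = true;
            }
            sendToEngine(text);
        }

    private:
        void loadEngine();             // start the external synth process
        void sendToEngine(const std::string &text);
        bool m_engineLoaded;
    };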

> As we discussed on IRC, I don't think the plugins should be responsible for
> queue management.  That's the job of kttsd.  At most, a plugin might need to
> keep track of three "sentences" (buffers):

Please remember: there are hardware synthesizers connected to the serial
port. In that case, you don't have a WAV file.
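
So the plugin interface should not assume a wave file exists. One
possible shape (names invented for illustration, not the actual kttsd
API):

    #include <string>

    class SynthPlugin {
    public:
        virtual ~SynthPlugin() {}

        // True if the plugin renders to a wave file that kttsd plays
        // via arts; a serial-port hardware synth returns false and
        // speaks directly on the device.
        virtual bool producesWaveFile() const = 0;

        virtual void sayText(const std::string &text) = 0;
    };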

>   1.  A sentence waiting to be sent to the synthesizer.
>   2.  A sentence currently being synthesized to a wave file.
>   3.  The wave file containing a synthesized sentence and currently being
> spoken on the audio device (arts).

You can't work on a sentence basis: when one line (or sentence) is being
spoken and the user changes the line, he wants the speech pipeline to be
cleared immediately. The new line should be read as fast as possible. Is
this possible with artsd? I did a lot of work to reduce the amount of
audio data in the buffers of /dev/dsp.
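
For reference, this is the kind of thing I mean on OSS: small fragments
keep latency low, and SNDCTL_DSP_RESET drops whatever is still queued
(sketch only, error handling omitted):

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/soundcard.h>

    int open_dsp_low_latency()
    {
        int fd = open("/dev/dsp", O_WRONLY);
        if (fd < 0)
            return -1;
        // At most 8 fragments of 2^10 = 1024 bytes each, so only a
        // fraction of a second of audio is ever queued in the driver.
        int frag = (8 << 16) | 10;
        ioctl(fd, SNDCTL_DSP_SETFRAGMENT, &frag);
        return fd;
    }

    void flush_dsp(int fd)
    {
        // Throw away queued audio immediately instead of draining it.
        ioctl(fd, SNDCTL_DSP_RESET, 0);
    }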

> I don't think this is needed.  The plugin should assume that anything sent to
> it is to be spoken ASAP, but see below.

See above; sometimes you may even send an incomplete word to the synthesizer.

> There is a need for applications using kttsd to obtain information about a
> plugin/synthesis engine's capabilities.  For example, I'm assuming some level
> of support for speech markup languages such as Java Speech Markup Language
> (JSML), Sable, or VoiceXML.  The application needs to know if the plugin can
> support markup languages and which ones.  So my point is that synthesis
> parameters are two way.  kttsd needs to be able to set parameters, but it
> also needs to be able to request them from the plugin.  This needs more work.
> I'm thinking some sort of data structure might be appropriate similar to the
> GNOME Speech API.

Should we keep this in mind and add it later, or add it now?
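
If we add it now, something simple might do, loosely modelled on the
GNOME Speech idea you mention (all fields are illustrative):

    #include <string>
    #include <vector>

    struct SynthCapabilities {
        std::vector<std::string> markupLanguages; // "JSML", "Sable", ...
        bool supportsMarkers;    // can the engine emit marker signals?
        bool isHardwareDevice;   // serial-port synth, no wave file
        std::vector<std::string> voices;          // available voice names
    };

kttsd could query this once when the plugin is loaded and decide what
markup (if any) to send it.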

> Also, all 3 markup languages I mentioned have a feature called "markers".
> Festival, for instance, when it processes a marker will emit a signal with
> the marker name.  The plugin should pass these signals on to kttsd.  I'm not
> sure what kttsd will do with them right now, probably just pass them on to
> the application, but we should plan for them.

Markers are useful for some purposes, but difficult to implement for some
software speech synthesizers. They should be optional.
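
If they are optional, a plugin that supports them could just report them
through a callback, e.g. (sketch with invented names; a real KDE plugin
would use a Qt signal instead):

    #include <string>

    class MarkerSource {
    public:
        typedef void (*MarkerHandler)(const std::string &name);

        MarkerSource() : m_handler(0) {}
        void setMarkerHandler(MarkerHandler h) { m_handler = h; }

    protected:
        // Called from the synthesis loop when the engine reports a
        // marker; plugins without marker support never call this.
        void emitMarker(const std::string &name) {
            if (m_handler)
                m_handler(name);   // kttsd forwards it to the application
        }

    private:
        MarkerHandler m_handler;
    };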

> One other thing.  Since synthesis engines can and do crash, the plugin should
> be prepared for that and attempt to gracefully handle it.  For instance, on
> the first crash, simply restart the engine and try again by resending the
> sentence in state 2.  If that causes a crash, discard the sentence in state 2
> and restart the engine, sending it the sentence in state 1.  If that produces
> a crash, stop altogether until clear() is called, then try again when
> startTalking is called.  (I'm assuming that the user will have noticed the
> interrupted speech and done something to fix the situation.)

My software synth driver already does this.
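
In pseudo-C++ the policy you describe boils down to this; trySpeak() and
restartEngine() stand in for the real driver internals:

    #include <string>

    class CrashRecovery {
    public:
        CrashRecovery() : m_gaveUp(false) {}

        // current = sentence in state 2, next = sentence in state 1.
        // Returns false once we give up; clear() resets us and the
        // next startTalking() tries again.
        bool speak(const std::string &current, const std::string &next) {
            if (m_gaveUp)
                return false;
            if (trySpeak(current))
                return true;
            restartEngine();              // 1st crash: retry same sentence
            if (trySpeak(current))
                return true;
            restartEngine();              // 2nd crash: drop it, try the next
            if (trySpeak(next))
                return true;
            m_gaveUp = true;              // 3rd crash: wait for the user
            return false;
        }

        void clear() { m_gaveUp = false; }

    private:
        bool trySpeak(const std::string &text);
        void restartEngine();
        bool m_gaveUp;
    };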

	Roger


