[Kde-accessibility] Proposal (with implementation) for
synthesizer interface
Gary Cramblitt
garycramblitt at comcast.net
Wed Mar 31 02:42:50 CEST 2004
On Tuesday 30 March 2004 1:20 pm, Olaf Jan Schmidt wrote:
> Our extended API could maybe look like this:
>
> /**
> * This is the template class for all the (processing) plug ins
> * like Festival, FreeTTS or Command
> */
> class PlugInProc : public QObject{
> Q_OBJECT
>
> public:
> /**
> * Constructor
> */
> PlugInProc( QObject *parent = 0, const char *name = 0);
>
> /**
> * Destructor
> */
> virtual ~PlugInProc();
>
> /**
> * Initialize the speech
> * @param lang The standard language code (e.g. "de" or "en_US")
> * @param config The object for storing configuration settings
> */
> virtual bool init(const QString &lang, KConfig *config);
To reduce the delay between receiving the first sentence and the start of
sound, should the plugin load the speech engine on this call? In most cases
you probably would want it to, but if you have multiple plugins configured,
maybe not. Perhaps this should be a per-plugin configuration option ("Load
speech engine on kttsd startup?").
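To make the trade-off concrete, here is a minimal sketch in plain C++ (no Qt) of a plugin that either loads its engine in init() or defers loading to the first sayText() call. The class name, the loadOnStartup flag and the helper ensureEngineLoaded() are all hypothetical; nothing here is existing kttsd API.

```cpp
#include <string>

// Hypothetical plugin skeleton; names are illustrative, not kttsd API.
class LazyEnginePlugin {
public:
    // loadOnStartup mirrors the proposed per-plugin configuration option.
    bool init(const std::string& lang, bool loadOnStartup) {
        lang_ = lang;
        if (loadOnStartup) ensureEngineLoaded();
        return true;
    }

    void sayText(const std::string& text) {
        ensureEngineLoaded();  // the first sentence pays the load cost if init() did not
        lastText_ = text;
    }

    bool engineLoaded() const { return loaded_; }
    int loadCount() const { return loadCount_; }

private:
    void ensureEngineLoaded() {
        if (loaded_) return;
        // A real plugin would start or connect to the speech engine here.
        loaded_ = true;
        ++loadCount_;
    }

    std::string lang_, lastText_;
    bool loaded_ = false;
    int loadCount_ = 0;
};
```

With several plugins configured, only the ones whose option is set would pay the startup cost; the others load on first use.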
>
> /**
> * Add a text to the queue of texts that are to be spoken
> * @param text The text to be spoken
> */
> virtual void sayText(const QString &text);
As we discussed on IRC, I don't think the plugins should be responsible for
queue management. That's the job of kttsd. At most, a plugin might need to
keep track of three "sentences" (buffers):
1. A sentence waiting to be sent to the synthesizer.
2. A sentence currently being synthesized to a wave file.
3. The wave file containing a synthesized sentence and currently being
spoken on the audio device (arts).
When a sentence moves out of state 1, the plugin should request another
sentence from kttsd by emitting the Ready() signal.
>
> /**
> * Start the actual talking
> */
> virtual void startTalking();
I don't think this is needed. The plugin should assume that anything sent to
it is to be spoken ASAP, but see below.
>
> /**
> * Stop synthesizing as fast as possible,
> * delete any buffered speech.
> */
> virtual void stopTalking();
This stops the audio output. The question is what to do with the 3 sentences
in states 1 through 3. stopTalking will normally be called because the user
clicked Pause or Stop in the GUI. If they click Resume, it would be a shame
to throw away the work that has been done. On the other hand, the user may
wish to back up a sentence or restart from the beginning. So here's what I
would suggest:
/**
* Stop audio output as fast as possible.
* Continue speech synthesis, but do not send to audio device.
* Do not request another sentence from kttsd until startTalking is called.
*/
virtual void stopTalking();
/**
* Stop audio output as fast as possible (if in progress).
* Stop synthesis (if in progress)
* Clear all buffers.
*/
virtual void clear();
/**
* Resume (or start) processing. If buffers contain sentences, speak them,
* otherwise emit Ready() signal to request next sentence.
*/
virtual void startTalking();
/**
* sayText is only called in
* response to a Ready signal from the plugin. It is the next sentence
* to be spoken.
*/
virtual void sayText(const QString &text);
So the normal sequence of events would be:

kttsd calls startTalking()
Ready() signal is received from plugin.
kttsd calls sayText()
plugin begins synthesizing first sentence and emits Ready() signal.
kttsd calls sayText()
plugin sends synthesized wave file to audio device, starts synth of next
sentence and emits Ready() signal
kttsd calls sayText()
etc.

User clicks Pause
kttsd calls stopTalking()
User clicks Resume
kttsd calls startTalking()
plugin continues audio output and synthesis of next sentence (if not already
completed)
when state 1 buffer is empty, plugin emits Ready() signal
kttsd calls sayText()
etc.

User clicks Pause
kttsd calls stopTalking()
User clicks Restart, PreviousSentence, or resets the text queues.
kttsd calls clear()
kttsd calls startTalking()
plugin emits Ready() signal
kttsd calls sayText()
etc.
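The sequences above can be sketched as a small simulation in plain C++. This is only an illustration under several assumptions: a std::function named onReady stands in for the Qt Ready() signal, synthesis is treated as instantaneous, an empty string marks an empty buffer, and audioFinished() is a hypothetical hook that simulates the audio device finishing a sentence.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a plugin. An empty string marks an empty buffer.
class SketchPlugin {
public:
    std::function<void()> onReady;    // replaces the Ready() signal in this sketch
    std::vector<std::string> spoken;  // sentences whose audio has finished

    // Resume (or start): speak what is buffered, request more if state 1 is empty.
    void startTalking() { talking_ = true; advance(); }

    // Pause audio; buffers are kept and no Ready() is emitted until startTalking().
    void stopTalking() { talking_ = false; }

    // Stop and drop all three buffers.
    void clear() {
        talking_ = false;
        waiting_.clear();
        synthesizing_.clear();
        speaking_.clear();
    }

    // Only valid in response to Ready(), so state 1 must be empty here.
    void sayText(const std::string& text) {
        assert(waiting_.empty());
        waiting_ = text;
        advance();
    }

    // Simulates the audio device reporting the end of the current sentence.
    void audioFinished() {
        if (speaking_.empty()) return;
        spoken.push_back(speaking_);
        speaking_.clear();
        advance();
    }

    const std::string& waiting() const { return waiting_; }
    const std::string& synthesizing() const { return synthesizing_; }
    const std::string& speaking() const { return speaking_; }

private:
    void advance() {
        // State 2 -> 3 only while talking; synthesis itself is treated as instant.
        if (talking_ && speaking_.empty() && !synthesizing_.empty()) {
            speaking_ = std::move(synthesizing_);
            synthesizing_.clear();
        }
        // State 1 -> 2 happens even while paused, so synthesis can continue.
        if (synthesizing_.empty() && !waiting_.empty()) {
            synthesizing_ = std::move(waiting_);
            waiting_.clear();
        }
        // Request the next sentence once state 1 is free.
        if (talking_ && waiting_.empty() && onReady) onReady();
    }

    bool talking_ = false;
    std::string waiting_, synthesizing_, speaking_;
};
```

A kttsd stand-in would set onReady to a callback that pops the next sentence from its own queue and passes it to sayText(). After startTalking() with at least three sentences queued, the three buffers hold the first three sentences and no further Ready() is raised until audioFinished() frees a slot.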
>
> /**
> * Set a speech synthesis parameter
> * @param name The name of the parameter
> * @param value The value of the parameter
> */
> virtual void sayText(const QString &name, const QString &value);
I think you meant
virtual void setParameter(const QString &name, const QString &value);
There is a need for applications using kttsd to obtain information about a
plugin/synthesis engine's capabilities. For example, I'm assuming some level
of support for speech markup languages such as Java Speech Markup Language
(JSML), Sable, or VoiceXML. The application needs to know if the plugin can
support markup languages and which ones. So my point is that synthesis
parameters are two-way: kttsd needs to be able to set parameters, but it
also needs to be able to request them from the plugin. This needs more work.
I'm thinking some sort of data structure, similar to the one in the GNOME
Speech API, might be appropriate.
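As a starting point for discussion, one possible shape for such a structure is sketched below in plain C++. The names PluginCapabilities, PluginParameters, supportsMarkup() and parameter() are hypothetical and are not taken from the GNOME Speech API or from kttsd; the sketch only shows that parameters can be both set and read back, and that markup support can be queried.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical capability record; field names are illustrative only.
struct PluginCapabilities {
    std::set<std::string> markupLanguages;  // e.g. "JSML", "Sable", "VoiceXML"
    bool supportsMarkers = false;

    bool supportsMarkup(const std::string& name) const {
        return markupLanguages.count(name) > 0;
    }
};

// Two-way parameter access: kttsd can both set and query values.
class PluginParameters {
public:
    void setParameter(const std::string& name, const std::string& value) {
        values_[name] = value;
    }

    // Returns the stored value, or fallback if the parameter was never set.
    std::string parameter(const std::string& name,
                          const std::string& fallback = std::string()) const {
        std::map<std::string, std::string>::const_iterator it = values_.find(name);
        return it == values_.end() ? fallback : it->second;
    }

private:
    std::map<std::string, std::string> values_;
};
```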
Instead of a Ready() signal, it might be simpler if the plugin called a
getNextSentence method in kttsd to retrieve the next sentence. If we assume
that plugins run in their own thread, that might avoid blocking issues.
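A rough sketch of that pull model, using standard C++ threading primitives rather than Qt: the class name SentenceSource and the enqueue() and close() methods are assumptions made for illustration, and only getNextSentence is a name from the paragraph above. The plugin thread blocks inside getNextSentence() until kttsd has queued a sentence or has shut the source down.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical kttsd-side queue that a plugin thread pulls from.
class SentenceSource {
public:
    // Called by kttsd when a new sentence is available.
    void enqueue(const std::string& sentence) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(sentence);
        }
        available_.notify_one();
    }

    // Called by kttsd on shutdown so blocked plugin threads can exit.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        available_.notify_all();
    }

    // Blocks the calling plugin thread until a sentence is available.
    // Returns false once the source is closed and drained.
    bool getNextSentence(std::string& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        available_.wait(lock, [this] { return !queue_.empty() || closed_; });
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable available_;
    std::deque<std::string> queue_;
    bool closed_ = false;
};
```

Because the wait happens in the plugin's own thread, kttsd itself never blocks; it only appends to the queue and notifies.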
In addition to the Ready() signal I already mentioned, the plugin should also
emit signals in response to feedback from the audio device and from the
synthesis engine. For example, an error signal if either one gives the
plugin an error.
Also, all 3 markup languages I mentioned have a feature called "markers".
Festival, for instance, when it processes a marker will emit a signal with
the marker name. The plugin should pass these signals on to kttsd. I'm not
sure what kttsd will do with them right now, probably just pass them on to
the application, but we should plan for them.
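To illustrate the error and marker notifications, here is a small sketch in plain C++ where std::function callbacks stand in for Qt signals. ErrorSource, PluginNotifications and MarkerRelay are hypothetical names; the relay does nothing but pass marker names through to whoever registered, which matches the "just pass them on to the application" idea.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

enum class ErrorSource { AudioDevice, SynthesisEngine };

// Hypothetical notification hooks; std::function stands in for Qt signals.
struct PluginNotifications {
    std::function<void(ErrorSource, const std::string&)> error;
    std::function<void(const std::string&)> markerReached;
};

// kttsd-side relay that simply passes markers on to application listeners.
class MarkerRelay {
public:
    void addListener(std::function<void(const std::string&)> listener) {
        listeners_.push_back(std::move(listener));
    }

    void forward(const std::string& markerName) const {
        for (const auto& listener : listeners_) listener(markerName);
    }

private:
    std::vector<std::function<void(const std::string&)>> listeners_;
};
```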
One other thing. Since synthesis engines can and do crash, the plugin should
be prepared for that and attempt to gracefully handle it. For instance, on
the first crash, simply restart the engine and try again by resending the
sentence in state 2. If that causes a crash, discard the sentence in state 2
and restart the engine, sending it the sentence in state 1. If that produces
a crash, stop altogether until clear() is called, then try again when
startTalking is called. (I'm assuming that the user will have noticed the
interrupted speech and done something to fix the situation.)
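The escalation described above could be captured in a small policy object, sketched here in plain C++. The class and enum names are hypothetical, and one detail is my own assumption rather than part of the proposal: a sentence that synthesizes successfully resets the crash count, so only consecutive crashes escalate.

```cpp
// What the plugin should do after the synthesis engine crashes.
enum class RecoveryAction {
    RetrySynthesizing,  // restart engine, resend the state 2 sentence
    SkipToWaiting,      // restart engine, drop state 2, send the state 1 sentence
    Halt                // stop until clear() and then startTalking()
};

// Hypothetical helper that tracks consecutive engine crashes.
class CrashRecoveryPolicy {
public:
    RecoveryAction onEngineCrash() {
        if (halted_) return RecoveryAction::Halt;
        ++consecutiveCrashes_;
        if (consecutiveCrashes_ == 1) return RecoveryAction::RetrySynthesizing;
        if (consecutiveCrashes_ == 2) return RecoveryAction::SkipToWaiting;
        halted_ = true;
        return RecoveryAction::Halt;
    }

    // Assumption: a successfully synthesized sentence resets the count.
    void onSentenceSynthesized() {
        if (!halted_) consecutiveCrashes_ = 0;
    }

    // clear() from kttsd re-arms the policy.
    void onClear() {
        consecutiveCrashes_ = 0;
        halted_ = false;
    }

    bool halted() const { return halted_; }

private:
    int consecutiveCrashes_ = 0;
    bool halted_ = false;
};
```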
Hope I've made at least some sense.
--
Gary Cramblitt (aka PhantomsDad)