[Kde-accessibility] KDE Speech API Draft 2 and new KTTSD

Sat May 22 00:00:41 CEST 2004

On Saturday 15 May 2004 8:49 pm, Gary Cramblitt wrote:
> I have posted for comment Draft 2 of the KDE Text-to-speech API at the
> following URL.
>
> http://home.comcast.net/~garycramblitt/oss/apidocs/kttsd/html/classkspeech.html
>
> Please note that this is a high-level API for KDE applications to interface
> with KTTSD, the KDE Text-to-speech daemon.  It is not the same as the KTTSD
> Plugin API that is also currently being discussed on this list, although
> it is related of course.
>
> Some of the links on this page will take you to other pages that represent
> the internal documentation for KTTSD.  Until I figure out how to keep
> Doxygen from generating such links, please try to stay on page
> classkspeech.html in your browser.
>
> Comments welcome.
>

Let's see if I can summarize the comments received so far.  If I've 
mischaracterized your opinion, I apologize ahead of time:

1.  Bill and Roger think we should use the Gnome API.  Gary thinks that 
doesn't fit into the KDE way of doing things.

2.  Everyone agrees only one thing should be spoken on the audio device at one 
time.

3.  Olaf and Gunnar think there should be the following "types" of speech:

      Screen Reader.  Highest priority.  One "job" at a time.  New job cancels 
an in-progress Screen Reader job and suspends any other jobs.
      Warnings and Messages.  Mid priority.  Each has a queue. Spoken as soon 
as possible after screen reader jobs.  Suspend text jobs.  Suspended by 
Screen Reader and resume when Screen Reader is not speaking.  Warnings are 
spoken before Messages.
      Text jobs.  Lowest priority.  Spoken after Screen Reader, Warnings, and 
Messages.  A queue.  Suspended by Screen Reader, Messages, and Warnings, and 
resumes after they are spoken.  Text jobs have "parts" and both programmer 
and user (via job manager) may advance or rewind to part n, advance or rewind 
by sentence, cancel, remove, re-order, restart, or pause text jobs.

4.  Olaf and Gary wonder if there needs to be separate Messages and Warnings.

5.  Gary and Gunnar think kttsd should be designed to synthesize the next 
"sentence" of speech while simultaneously audibilizing the current "sentence" 
on the audio device.  I.e., avoid unnecessary delays.

6.  Gary thinks kttsd should do sentence parsing to 1) compensate for plugins 
that cannot stop speech immediately, and 2) provide navigation for those apps 
that don't provide their own navigation.  Gary thinks support for speech 
markup will still be possible.

7.  Nobody seems to like paragraph parsing.  We should get rid of that.

8.  We need to extend the api to support multiple voices/volumes/genders per 
language.  Gary thinks that can be done by extending the existing language 
code argument, and nobody has objected to that so far.  There will need to 
be some way for apps to discover what voices/volumes/genders/languages are 
available.

9.  Bill thinks prioritization and interruption of speech should be done by 
the apps (client side versus server side).  Gary thinks that approach overly 
burdens the app programmer.

10.  Bill thinks instead of apps controlling "jobs" via stopText, resumeText, 
pauseText, etc., control should be at the "talker" level, i.e., 
stopSynthesizer, resumeSynthesizer, etc.

11.  Everybody agrees kttsd should provide feedback to applications as to the 
current state.  Gnome uses callbacks.  KDE uses dcop signals.
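
To make the ordering proposed in item 3 concrete, here is a rough Python 
sketch.  It is not KTTSD code; the class, method, and kind names are all made 
up for illustration.  It assumes text jobs have already been split into 
sentences, so that a higher-priority utterance can slip in between two 
sentences of a text job, which is how the "suspend and resume" behaviour 
falls out.

```python
from collections import deque

# Hypothetical kind names, in the order item 3 proposes after Screen Reader.
PRIORITY_ORDER = ("warning", "message", "text")


class SpeechScheduler:
    """Illustrative only: decides what should be spoken next."""

    def __init__(self):
        self.screen_reader = None  # at most one pending Screen Reader job
        self.queues = {kind: deque() for kind in PRIORITY_ORDER}

    def say(self, kind, text):
        if kind == "screen_reader":
            # A new Screen Reader job replaces (cancels) the previous one.
            self.screen_reader = text
        else:
            self.queues[kind].append(text)

    def next_utterance(self):
        """Return (kind, text) for the next utterance, or None if idle."""
        if self.screen_reader is not None:
            text, self.screen_reader = self.screen_reader, None
            return ("screen_reader", text)
        for kind in PRIORITY_ORDER:
            if self.queues[kind]:
                return (kind, self.queues[kind].popleft())
        return None
```

If Messages and Warnings are unified (item 4), the sketch only loses one 
entry from PRIORITY_ORDER.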

Some other thoughts:

We are assuming there is at most one Screen Reader active in the system at one 
time.  Valid assumption?

In addition to the reasons I gave above, sentence parsing will avoid having to 
synthesize huge chunks of text at one time, which will also avoid delays.
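
As a rough illustration of the idea (not the actual kttsd parser), the 
Python sketch below splits text on a deliberately naive rule, namely that a 
sentence ends at '.', '!' or '?' followed by whitespace, and hands one 
sentence at a time to the synthesizer.  The function names and the 
synthesize/should_stop callbacks are assumptions made for the example.

```python
import re


def split_sentences(text):
    """Naive split: '.', '!' or '?' followed by whitespace ends a sentence."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def speak(text, synthesize, should_stop):
    """Feed sentences to synthesize() one by one; return how many were sent.

    should_stop() is checked between sentences, so even a plugin that cannot
    interrupt itself stops after at most one sentence.
    """
    spoken = 0
    for sentence in split_sentences(text):
        if should_stop():
            break
        synthesize(sentence)
        spoken += 1
    return spoken
```

A real parser would also have to cope with abbreviations, numbers, and 
speech markup, which this rule ignores.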

I think we should get rid of nextSenText, prevSenText, nextParText, and 
prevParText and replace them with

  jumpToPart  - for navigating parts (absolute motion)
  moveRelSentence - for navigating sentences (relative motion)
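
A small Python sketch of how these two calls might behave.  The signatures 
are not settled, so everything here is an assumption: parts are numbered 
from 1, every part holds at least one sentence, out-of-range requests are 
clamped, and relative motion stays within the current part.

```python
class TextJob:
    """Illustrative model of a text job: a list of parts, each a list of
    sentences.  Not the KTTSD data structure."""

    def __init__(self, parts):
        self.parts = parts
        self.part = 0      # index of the current part (0-based internally)
        self.sentence = 0  # index of the current sentence within that part

    def jump_to_part(self, n):
        """Absolute motion: go to part n (1-based), clamped to a valid part.
        Returns the 1-based number of the part that is now current."""
        self.part = max(1, min(n, len(self.parts))) - 1
        self.sentence = 0
        return self.part + 1

    def move_rel_sentence(self, n):
        """Relative motion: move n sentences forward (n > 0) or back (n < 0)
        within the current part, clamped.  Returns the current sentence."""
        last = len(self.parts[self.part]) - 1
        self.sentence = max(0, min(self.sentence + n, last))
        return self.parts[self.part][self.sentence]
```

Whether a relative move should be allowed to cross a part boundary is an 
open question; the sketch picks the simpler answer.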

Looking over the summary above, and ignoring the Gnome API issue, I don't 
think the current API (and implementation) is far from achieving the stated 
objectives.  Get rid of paragraph parsing, add some methods for Screen 
Readers, add parts to text jobs, possibly unify Messages and Warnings, and 
we're there. :-)

-- 
Gary Cramblitt (aka PhantomsDad)