[Kde-accessibility] KDE Speech API Draft 2 and new KTTSD
Gary Cramblitt
garycramblitt at comcast.net
Sat May 22 00:00:41 CEST 2004
On Saturday 15 May 2004 8:49 pm, Gary Cramblitt wrote:
> I have posted for comment Draft 2 of the KDE Text-to-speech API at the
> following URL.
>
> http://home.comcast.net/~garycramblitt/oss/apidocs/kttsd/html/classkspeech.html
>
> Please note that this is a high-level API for KDE applications to interface
> with KTTSD, the KDE Text-to-speech daemon. It is not the same as the KTTSD
> Plugin API that is also currently being discussed on this list, although
> it is related of course.
>
> Some of the links on this page will take you to other pages that represent
> the internal documentation for KTTSD. Until I figure out how to keep
> Doxygen from generating such links, please try to stay on page
> classkspeech.html in your browser.
>
> Comments welcome.
>
Let me try to summarize the comments received so far. If I have
mischaracterized anyone's opinion, I apologize in advance:
1. Bill and Roger think we should use the Gnome API. Gary thinks that
doesn't fit into the KDE way of doing things.
2. Everyone agrees only one thing should be spoken on the audio device at one
time.
3. Olaf and Gunnar think there should be the following "types" of speech:
   Screen Reader. Highest priority. One "job" at a time. A new job cancels
   an in-progress Screen Reader job and suspends any other jobs.
   Warnings and Messages. Mid priority. Each has its own queue. Spoken as
   soon as possible after Screen Reader jobs. They suspend text jobs, are
   suspended by the Screen Reader, and resume when the Screen Reader is not
   speaking. Warnings are spoken before Messages.
   Text jobs. Lowest priority. A queue, spoken after Screen Reader, Warnings,
   and Messages. Suspended by Screen Reader, Messages, and Warnings, and
   resumed after they are spoken. Text jobs have "parts", and both the
   programmer and the user (via the job manager) may advance or rewind to
   part n, advance or rewind by sentence, or cancel, remove, re-order,
   restart, or pause text jobs.
4. Olaf and Gary wonder whether separate Messages and Warnings types are needed.
5. Gary and Gunnar think kttsd should be designed to synthesize the next
"sentence" of speech while simultaneously playing the current "sentence"
on the audio device, i.e., avoid unnecessary delays.
6. Gary thinks kttsd should do sentence parsing 1) to compensate for plugins
that cannot stop speech immediately, and 2) to provide navigation for apps
that don't provide their own. Gary thinks support for speech markup will
still be possible.
7. Nobody seems to like paragraph parsing. We should get rid of that.
8. We need to extend the API to support multiple voices/volumes/genders per
language. Gary thinks that can be done by extending the existing language
code argument, and nobody has "so far" objected to that. There will also need
to be some way for apps to discover which voices/volumes/genders/languages
are available.
9. Bill thinks prioritization and interruption of speech should be done by
the apps (client side versus server side). Gary thinks that approach overly
burdens the app programmer.
10. Bill thinks instead of apps controlling "jobs" via stopText, resumeText,
pauseText, etc., control should be at the "talker" level, i.e.,
stopSynthesizer, resumeSynthesizer, etc.
11. Everybody agrees kttsd should provide feedback to applications about the
current state. Gnome uses callbacks; KDE uses DCOP signals.
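To make the priority rules in item 3 concrete, here is a minimal C++ sketch of how kttsd might pick the next utterance for the single audio device (item 2). It assumes speech is handed to the device one sentence at a time, and every name in it (SpeechScheduler, sayScreenReader, sayWarning, sayMessage, sayText, next) is hypothetical, not part of the Draft 2 KSpeech API. "Suspending" a text job here just means its sentence position is kept while higher-priority speech is spoken; stopping audio that is already playing is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical speech categories from item 3, highest priority first.
enum class SpeechType { ScreenReader, Warning, Message, Text };

struct Utterance {
    SpeechType type;
    std::string text;
};

// A text job is a list of sentences; pos remembers where to resume
// after higher-priority speech has been spoken.
struct TextJob {
    std::vector<std::string> sentences;
    std::size_t pos = 0;
};

// Hypothetical scheduler: decides what is spoken next, one utterance at a time.
class SpeechScheduler {
public:
    // At most one Screen Reader job: a new one replaces any pending one.
    void sayScreenReader(const std::string& text) { screenReader_ = text; }
    void sayWarning(const std::string& text) { warnings_.push_back(text); }
    void sayMessage(const std::string& text) { messages_.push_back(text); }
    void sayText(std::vector<std::string> sentences) {
        textJobs_.push_back(TextJob{std::move(sentences), 0});
    }

    // Next utterance for the audio device, or std::nullopt if nothing is pending.
    std::optional<Utterance> next() {
        if (screenReader_) {
            Utterance u{SpeechType::ScreenReader, *screenReader_};
            screenReader_.reset();
            return u;
        }
        // Warnings are spoken before Messages.
        if (!warnings_.empty()) return pop(warnings_, SpeechType::Warning);
        if (!messages_.empty()) return pop(messages_, SpeechType::Message);
        // Text jobs resume at their saved sentence position.
        while (!textJobs_.empty()) {
            TextJob& job = textJobs_.front();
            if (job.pos < job.sentences.size())
                return Utterance{SpeechType::Text, job.sentences[job.pos++]};
            textJobs_.pop_front();  // job finished; move on to the next one
        }
        return std::nullopt;
    }

private:
    static Utterance pop(std::deque<std::string>& q, SpeechType type) {
        assert(!q.empty());
        Utterance u{type, q.front()};
        q.pop_front();
        return u;
    }

    std::optional<std::string> screenReader_;
    std::deque<std::string> warnings_;
    std::deque<std::string> messages_;
    std::deque<TextJob> textJobs_;
};
```

Because the Screen Reader slot is replaced rather than queued, two quick calls to sayScreenReader() leave only the second one pending, which mirrors "a new job cancels an in-progress Screen Reader job".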
Some other thoughts:
We are assuming there is at most one Screen Reader active in the system at a
time. Is that a valid assumption?
In addition to the reasons I gave above, sentence parsing will avoid having to
synthesize huge chunks of text at one time, which will also avoid delays.
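As a rough illustration of the sentence parsing discussed above, here is a naive C++ splitter that would let kttsd hand a synthesizer one short chunk at a time. The function name splitSentences is hypothetical, and the rule (a sentence ends at '.', '!' or '?' followed by whitespace or end of text) is an assumption chosen for brevity; it will mis-split abbreviations such as "e.g." and knows nothing about speech markup.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Strip leading and trailing whitespace.
static std::string trim(const std::string& s) {
    const char* ws = " \t\n\r\f\v";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Hypothetical naive splitter: a sentence ends at '.', '!' or '?' that is
// followed by whitespace or by the end of the text.
std::vector<std::string> splitSentences(const std::string& text) {
    std::vector<std::string> sentences;
    std::string current;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        current += c;
        const bool terminator = (c == '.' || c == '!' || c == '?');
        const bool boundary = (i + 1 == text.size()) ||
            std::isspace(static_cast<unsigned char>(text[i + 1]));
        if (terminator && boundary) {
            std::string s = trim(current);
            if (!s.empty()) sentences.push_back(s);
            current.clear();
        }
    }
    // Trailing text without a terminator still forms a final sentence.
    std::string rest = trim(current);
    if (!rest.empty()) sentences.push_back(rest);
    return sentences;
}
```

Requiring whitespace after the terminator keeps numbers like "3.5" intact, which is about as far as a one-rule parser can go.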
I think we should get rid of nextSenText, prevSenText, nextParText, and
prevParText and replace them with:
jumpToPart - for navigating parts (absolute motion)
moveRelSentence - for navigating sentences (relative motion)
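The proposed jumpToPart/moveRelSentence pair could be modeled roughly as follows. This is a hypothetical sketch: only the two method names come from the proposal, while the TextJobCursor class, the 1-based part numbering, and clamping at the start and end of the job are all assumptions. Sentences from all parts are kept in one flat list so that relative motion crosses part boundaries naturally.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical cursor over a text job made of parts, each a list of sentences.
class TextJobCursor {
public:
    explicit TextJobCursor(const std::vector<std::vector<std::string>>& parts) {
        for (const auto& part : parts) {
            partStart_.push_back(sentences_.size());  // first sentence of this part
            sentences_.insert(sentences_.end(), part.begin(), part.end());
        }
    }

    // Absolute motion: go to the first sentence of part n (1-based, by
    // assumption). Returns false and leaves the cursor alone if n is invalid.
    bool jumpToPart(std::size_t n) {
        if (n < 1 || n > partStart_.size()) return false;
        pos_ = partStart_[n - 1];
        return true;
    }

    // Relative motion: move by delta sentences (negative rewinds), crossing
    // part boundaries and clamping at the ends of the job.
    void moveRelSentence(long delta) {
        long target = static_cast<long>(pos_) + delta;
        const long last = static_cast<long>(sentences_.size()) - 1;
        if (target > last) target = last;
        if (target < 0) target = 0;
        pos_ = static_cast<std::size_t>(target);
    }

    const std::string& currentSentence() const {
        assert(pos_ < sentences_.size());
        return sentences_[pos_];
    }

    // 1-based number of the part containing the current sentence.
    std::size_t currentPart() const {
        std::size_t part = 0;
        for (std::size_t i = 0; i < partStart_.size(); ++i)
            if (partStart_[i] <= pos_) part = i + 1;
        return part;
    }

private:
    std::vector<std::string> sentences_;
    std::vector<std::size_t> partStart_;
    std::size_t pos_ = 0;
};
```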
Looking over the summary above, and ignoring the Gnome API issue, I don't
think the current API (and implementation) is far from achieving the stated
objectives. Get rid of paragraph parsing, add some methods for Screen
Readers, add parts to text jobs, possibly unify Messages and Warnings, and
we're there. :-)
--
Gary Cramblitt (aka PhantomsDad)