[Kde-accessibility] KDE Text-to-speech API 1.0 Draft 1
Gary Cramblitt
garycramblitt at comcast.net
Thu Apr 8 00:06:16 CEST 2004
I have posted for comment a proposed new KDE Text-to-speech API at the
following URL.
http://home.comcast.net/~garycramblitt/oss/apidocs/kttsd/html/classkspeech.html
Please note that this is a high-level API for KDE applications to interface
with KTTSD, the KDE Text-to-speech daemon. It is not the same as the KTTSD
Plugin API that is also currently being discussed on this list, although it
is related of course.
Some of the links on this page will take you to other pages that represent the
internal documentation for KTTSD. Until I figure out how to keep Doxygen
from generating such links, please try to stay on page classkspeech.html in
your browser.
Why the new API?
------------------------
There is a problem with the existing KTTSD API. Applications currently have 3
choices for generating speech from text:
1. sayWarning
2. sayMessage
3. setText
sayWarning and sayMessage are intended for short, one-sentence messages.
KMouth, for example, uses sayMessage. Users do not have the capability to
rewind or replay these messages. setText permits these capabilities, but
only one application at a time can call setText. If application A calls
setText, and before KTTSD has finished speaking, application B calls setText,
then application A's speech is clobbered and replaced with application B's
text. (Think in terms of much larger blocks of text. For example, I'm
browsing the web and come across a good article. I want my computer to read
the article to me, while I continue browsing elsewhere.)
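To make the clobbering problem concrete, here is a toy model in Python. It is only a sketch of the behavior described above, not KTTSD code; the class SingleSlotSpeaker, its set_text method, and the application ids "A" and "B" are hypothetical names chosen for illustration.

```python
# Toy model only; not KTTSD code. All names here are hypothetical.
class SingleSlotSpeaker:
    """Holds at most one text at a time, like the setText behavior described above."""

    def __init__(self):
        self.owner = None  # application whose text is currently held
        self.text = None

    def set_text(self, app_id, text):
        # A new call replaces whatever was there, even if it is still being spoken.
        # Return the id of the application whose text was lost, if any.
        clobbered = self.owner if self.owner not in (None, app_id) else None
        self.owner, self.text = app_id, text
        return clobbered


speaker = SingleSlotSpeaker()
first = speaker.set_text("A", "A long web article ...")
second = speaker.set_text("B", "Another document ...")
```

After the second call, the speaker holds only application B's text, and the return value is the only trace that application A ever asked for anything.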
While it might have been possible to add a method or two that would have
enabled application B to detect that KTTSD was busy servicing application A,
I felt this placed an undue burden on application programmers. Most apps will
want to send some text to KTTSD to be spoken and forget it, i.e. set and
forget.
Instead, the new API provides for a queue of text jobs, very much like a print
queue. When the setText job of one application is finished speaking, the
next job (application B) begins. Using the KTTSD GUI, the user will be able
to pause, stop, rewind, skip, re-order and delete speech jobs.
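The print-queue idea can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not the actual KTTSD implementation: the class SpeechJobQueue and its methods (set_text, current, finish_current, remove) are hypothetical, and pause, rewind, skip and re-order are left out to keep it short.

```python
# Toy model of a speech job queue; not KTTSD code. All names are hypothetical.
from collections import deque


class SpeechJobQueue:
    """Queues text jobs first-in, first-out, much like a print queue."""

    def __init__(self):
        self._jobs = deque()  # entries are (job_id, app_id, text)
        self._next_id = 1

    def set_text(self, app_id, text):
        # Append a new job instead of replacing the one being spoken.
        job_id = self._next_id
        self._next_id += 1
        self._jobs.append((job_id, app_id, text))
        return job_id

    def current(self):
        # The job at the head of the queue is the one being spoken.
        return self._jobs[0] if self._jobs else None

    def finish_current(self):
        # Called when the head job has finished speaking; the next job begins.
        if self._jobs:
            self._jobs.popleft()
        return self.current()

    def remove(self, job_id):
        # Models the user deleting a job from the queue.
        self._jobs = deque(job for job in self._jobs if job[0] != job_id)


queue = SpeechJobQueue()
job_a = queue.set_text("A", "A long web article ...")
job_b = queue.set_text("B", "Another document ...")
speaking_first = queue.current()
speaking_next = queue.finish_current()
```

Here application B's call no longer disturbs application A: A's job stays at the head of the queue until it finishes, and only then does B's job begin. Each application can still "set and forget".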
Note that the new API is 100% backwards compatible with the existing KTTSD
API, and therefore should not break any existing applications that are using
it.
In addition to solving the problem I mentioned, the new API also offers some
enhanced capabilities, such as providing signal feedback to applications.
Apps should be able to use these enhancements to implement more complex TTS
functionality.
I did take a look at the Gnome Speech API, with the intention of designing a
compatible KDE API. However, IMHO, this was not practical because of GSAPI's
heavy reliance on CORBA and its overly complex interface.
I have already implemented much of this new API in code. Unless there are
major objections, I intend to begin committing the new code to CVS in about
10 days (next weekend). (In case you didn't know, KTTSD is currently in the
kdenonbeta module.)
Please comment to this mailing list or e-mail me directly. I look forward to
your input.
--
Gary Cramblitt (aka PhantomsDad)