[Kde-accessibility] KDE Text-to-speech API 1.0 Draft 1

Thu Apr 8 00:06:16 CEST 2004

I have posted for comment a proposed new KDE Text-to-speech API at the 
following URL.

http://home.comcast.net/~garycramblitt/oss/apidocs/kttsd/html/classkspeech.html

Please note that this is a high-level API for KDE applications to interface 
with KTTSD, the KDE Text-to-speech daemon.  It is not the same as the KTTSD 
Plugin API that is also currently being discussed on this list, although the 
two are related, of course.

Some of the links on this page will take you to other pages that represent the 
internal documentation for KTTSD.  Until I figure out how to keep Doxygen 
from generating such links, please try to stay on page classkspeech.html in 
your browser.

Why the new API?
------------------------

There is a problem with the existing KTTSD API.  Applications currently have 3 
choices for generating speech from text:

  1.  sayWarning
  2.  sayMessage
  3.  setText

sayWarning and sayMessage are intended for short, one-sentence messages.  
KMouth, for example, uses sayMessage.  Users cannot rewind or replay these 
messages.  setText permits these capabilities, but only one application at a 
time can call setText.  If application A calls setText, and application B 
calls setText before KTTSD has finished speaking, then application A's speech 
is clobbered and replaced with application B's text.  (Think in terms of much 
larger blocks of text.  For example, I'm browsing the web and come across a 
good article.  I want my computer to read the article to me while I continue 
browsing elsewhere.)
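To make the clobbering problem concrete, here is a minimal C++ sketch of a 
daemon that holds only one text slot, which is how the old setText behaves 
from the caller's point of view.  SingleSlotDaemon and its methods are 
hypothetical names for illustration only, not actual KTTSD code or its DCOP 
interface.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of single-slot setText() behaviour.
// SingleSlotDaemon is an illustrative name, not real KTTSD code.
class SingleSlotDaemon {
public:
    // Loads new text, unconditionally overwriting whatever is there,
    // even if the previous caller's text has not finished speaking.
    void setText(const std::string& app, const std::string& text) {
        assert(!app.empty() && "caller must identify itself");
        owner_ = app;
        text_ = text;
    }

    const std::string& owner() const { return owner_; }
    const std::string& text() const { return text_; }

private:
    std::string owner_;  // application whose text is currently loaded
    std::string text_;   // the only text the daemon remembers
};
```

If application A calls setText("A", article) and application B then calls 
setText("B", note), owner() reports "B" and A's article is simply gone.  
Nothing in this model tells A that it happened.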

While it might have been possible to add a method or two that would have 
enabled application B to detect that KTTSD was busy servicing application A, 
I felt this placed an undue burden on application programmers.  Most apps will 
want to send some text to KTTSD to be spoken and then forget about it, i.e., 
"set and forget."

Instead, the new API provides for a queue of text jobs, very much like a print 
queue.  When one application's setText job finishes speaking, the next job in 
the queue (for example, application B's) begins.  Using the KTTSD GUI, the user 
will be able to pause, stop, rewind, skip, re-order and delete speech jobs.
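A print-queue model avoids the clobbering.  The following is a minimal C++17 
sketch of such a queue, assuming a simple FIFO in which setText appends a job 
and returns an id.  TextJob, TextJobQueue and all method names are 
hypothetical and are not the proposed kspeech API itself; the sketch only 
illustrates queueing, deletion and re-ordering, not actual speech.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>
#include <string>
#include <utility>

// Hypothetical print-queue-style job list for speech.  These names are
// illustrative assumptions, not the actual KTTSD/kspeech interface.
struct TextJob {
    std::uint32_t id;
    std::string app;   // application that submitted the job
    std::string text;  // text to be spoken
};

class TextJobQueue {
public:
    // Appends a job instead of replacing the current one and returns its
    // id, so the caller can "set and forget" or track the job later.
    std::uint32_t setText(const std::string& app, const std::string& text) {
        jobs_.push_back(TextJob{nextId_, app, text});
        return nextId_++;
    }

    // The job currently being spoken (front of the queue), if any.
    std::optional<TextJob> current() const {
        if (jobs_.empty()) return std::nullopt;
        return jobs_.front();
    }

    // Called when the current job finishes speaking; the next job starts.
    void finishCurrent() {
        assert(!jobs_.empty());
        jobs_.pop_front();
    }

    // User deletes a job (e.g. from a GUI).  False if no such id exists.
    bool removeJob(std::uint32_t id) {
        for (auto it = jobs_.begin(); it != jobs_.end(); ++it) {
            if (it->id == id) {
                jobs_.erase(it);
                return true;
            }
        }
        return false;
    }

    // User moves a job one position later in the queue.  False if the id
    // is unknown or the job is already last.
    bool moveLater(std::uint32_t id) {
        for (std::size_t i = 0; i + 1 < jobs_.size(); ++i) {
            if (jobs_[i].id == id) {
                std::swap(jobs_[i], jobs_[i + 1]);
                return true;
            }
        }
        return false;
    }

    std::size_t size() const { return jobs_.size(); }

private:
    std::deque<TextJob> jobs_;
    std::uint32_t nextId_ = 1;
};
```

With this model, application A's article is spoken first and application B's 
text waits its turn.  A real daemon would also need to treat the job that is 
currently speaking specially (for example, pausing it before moving it); this 
sketch does not.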

Note that the new API is 100% backwards compatible with the existing KTTSD 
API, and therefore should not break any existing applications that are using 
it.

In addition to solving the problem I mentioned, the new API also offers some 
enhanced capabilities, such as providing signal feedback to applications.  Apps 
should be able to use these enhancements to implement more complex TTS 
functionality.
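Signal feedback could look roughly like the following.  This is a 
hypothetical C++ sketch that uses plain std::function callbacks in place of 
whatever DCOP or Qt signal mechanism KTTSD actually uses.  JobEvent, 
JobNotifier and the event names are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical job-state feedback.  Real KDE code would use DCOP/Qt
// signals; std::function callbacks stand in for them here.
enum class JobEvent { Started, Finished };

class JobNotifier {
public:
    using Listener = std::function<void(std::uint32_t jobId, JobEvent event)>;

    // An application subscribes to feedback about speech jobs.
    void connect(Listener l) {
        assert(l && "listener must be callable");
        listeners_.push_back(std::move(l));
    }

    // The daemon side notifies every subscriber of a job state change.
    void emitEvent(std::uint32_t jobId, JobEvent event) const {
        for (const auto& l : listeners_) l(jobId, event);
    }

private:
    std::vector<Listener> listeners_;
};
```

An application could, for instance, wait for the Finished event on its own 
job before queueing follow-up text, instead of polling the daemon.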

I did take a look at the Gnome Speech API, with the intention of designing a 
compatible KDE API.  However, IMHO, this was not practical because of GSAPI's 
heavy reliance on CORBA and its overly complex interface.

I have already implemented much of this new API in code.  Unless there are 
major objections, I intend to begin committing the new code to CVS in about 
10 days (next weekend).  (In case you didn't know, KTTSD is currently in the 
kdenonbeta module.)

Please comment to this mailing list or e-mail me directly.  I look forward to 
your input.

-- 
Gary Cramblitt (aka PhantomsDad)