[Kde-accessibility] KDE Speech API Draft 2 and new KTTSD

Gunnar Schmi Dt gunnar at schmi-dt.de
Fri May 21 14:04:45 CEST 2004


Hello,

On Friday 21 May 2004 05:47, Gary Cramblitt wrote:
> Thank you very much for the feedback!
>
> On Thursday 20 May 2004 12:16 pm, Olaf Jan Schmidt wrote:
> > There are mainly three types of usage for speech synthesis:
> >
> > 1. Speaking and navigating through whole texts (can be interrupted by
> > messages and screen reader speech)
> > 2. Speaking single messages (can be interrupted by screen reader
> > speech)
>
> Why would you want to interrupt a single message with screen reader?
>
> > 3. A screen-reader reading out whatever happens on the screen (can be
> > cut off by new screen reader speech)
>
> If "cut off", is the speaking text re-queued or is it canceled?
>

I think the three kinds of usage that Olaf mentions can be characterized as 
follows:

1. whole texts such as a book, a web page, the handbook of an application 
(in khelpcenter), etc.

These usually require methods for navigation, i.e., the user wants to 
repeat or skip a sentence or paragraph, or he wants to jump directly to 
chapter 8 of a book because he already heard chapters 1 to 7 
yesterday. Having multiple long texts read simultaneously does not really 
make sense.

2. short messages and warnings such as KMail telling the user that new mail 
arrived, the laptop daemon telling the user that the battery is running out 
of power, KMouth doing speech synthesis for the user, etc.

These are usually one-sentence messages. Depending on the importance of 
these messages you may want to use two different priorities (i.e., 
warnings and messages). Of course these messages and warnings need to 
interrupt the long texts.

3. screen-reader output, such as reading a menu bar item that just got 
focus, telling the user that a dialog button was activated etc.

This output is usually only a few words long and has to be spoken as soon 
as possible or not at all. In this sense the screen reader output 
interrupts any other speech (the other speech will be continued 
immediately after the screen-reader output has finished).

If one screen-reader output is requested while an earlier screen-reader 
output is spoken the earlier output is immediately aborted (and will not 
be continued).
 
You might see long texts as low-priority speech synthesis, messages and 
warnings as mid-priority speech synthesis and screen-reader output as 
high-priority speech synthesis.
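To make these interruption rules concrete, here is a minimal, self-contained 
C++ sketch of such a three-level scheduler. All names (Scheduler, Utterance, 
etc.) are hypothetical and not part of the kspeech API; a real kttsd would 
drive an actual synthesizer instead of returning strings:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch of the three priority levels described above.
enum Priority { Text = 0, Message = 1, ScreenReader = 2 };

struct Utterance {
    Priority prio;
    std::string text;
};

class Scheduler {
public:
    void enqueue(const Utterance& u) {
        // A new screen-reader utterance aborts any earlier one
        // that has not finished yet (it will not be continued).
        if (u.prio == ScreenReader)
            erasePriority(ScreenReader);
        queue.push_back(u);
    }

    // The utterance that should be audible right now: higher priorities
    // pre-empt lower ones; pre-empted speech stays queued (paused).
    std::string current() const {
        const Utterance* best = nullptr;
        for (const Utterance& u : queue)
            if (!best || u.prio > best->prio)
                best = &u;
        return best ? best->text : "";
    }

    // The audible utterance finished; pre-empted speech resumes.
    void finishCurrent() {
        Priority top = Text;
        for (const Utterance& u : queue)
            if (u.prio > top)
                top = u.prio;
        for (auto it = queue.begin(); it != queue.end(); ++it)
            if (it->prio == top) {
                queue.erase(it);
                return;
            }
    }

private:
    void erasePriority(Priority p) {
        for (auto it = queue.begin(); it != queue.end(); ) {
            if (it->prio == p) it = queue.erase(it);
            else ++it;
        }
    }
    std::vector<Utterance> queue;
};
```

Note that a pre-empted text is never removed from the queue, matching the 
rule that long texts and messages are paused, while an older screen-reader 
utterance is dropped entirely.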

> > The suggested API is very feature-rich for the first two uses, but the
> > third use is not covered.
>
> What is not covered?  I admit I have zero experience with screen
> readers, but for the sake of discussion, let's imagine the following: 
> The screen reader is reading a page of text (let's say a web page.) 
> Focus moves to a button, so the screen reader wants to pause speaking
> and speak the label (or name) of the button.  [...] All of this can be
> accomplished using the current API as follows: We assume the screen
> reader has two "levels" of talking -- a background job and a foreground
> job that represents the control on the screen that has current focus. 
> The background job can be pre-empted by the foreground job, but resumes
> when foreground is no longer speaking.  Foreground jobs can be
> pre-empted by another foreground job, but in this case the pre-empted
> job is canceled.  (Probably over-simplified, but for sake of
> discussion...) So we keep track of two job numbers -- backJobNum and
> foreJobNum.  Here's what the screen reader's code might look like:
> [...]
> Other than combining setText and startText, I don't see how this can be
> much simpler!
>
Your code looks good as long as the screen-reader is the only process that 
does speech synthesis. However, this often is not the case.

A screen-reader usually only reads the things that you do with the 
foreground text job. The long web page is usually read by another 
application, so we need to cope with this in kttsd. What about:

Code of the Konqueror-plugin:
uint backJobNum
// Queue and start the page of text using default language.
backJobNum = setText(<text of the page>, Null)
startText(backJobNum)


Code of the screen reader:
// <In response to button gotfocus signal>
synthesizeScreenReaderOutput(<button name or label>, Null)

// <In response to titlebar gotfocus signal>
synthesizeScreenReaderOutput(<titlebar contents>, Null)

// <In response to image gotfocus signal>.
synthesizeScreenReaderOutput(<image "alt" contents>, Null)

The code for pausing and continuing the speech synthesis of the web page, 
and for aborting the second screen-reader output, would then live inside 
kttsd.

> [...]
> > Kttsd would then be sent a list of single sentences, and allow to jump
> > to textpart number n ("markers"):
> > [...]
> > virtual void kspeech::jumpTo (uint job, uint id);
>
> Now we have additional complexity.  We have "jobs" and "id"s within a
> job. Unless we assume that only one text job can be active at one time,
> in which case the jobNum argument might not be needed.  But as I explain
> more below, I think it is a bad idea to allow only one text job at a
> time.
>
> I'm not convinced about the need for applications to advance or rewind
> large numbers of sentences/paragraphs at a time.  Most of the time, user
> will want to repeat last one or two sentences (call prevSenText once or
> twice), or skip ahead a paragraph or two (call nextParText once or
> twice).  Under what circumstances would we want to "Jump ahead to 52nd
> sentence" or "rewind to sentence 11"? [...]

See my example above (the user who wants to directly hear chapter 8 of a 
book).

> But as I say, I don't see much need for either of these methods.  If an
> app *really* wanted to track every sentence individually, it could queue
> one sentence per job, as needed, i.e.,
> [...]
> // sentenceFinished signal received from KTTSD
> void slot_SentenceFinished()
> {
> 	// Start next sentence.
> 	++it;
> 	currentJobNum = setText(*it, Null);
> 	startText(currentJobNum);
> }
> [...] 

This way you might get a pause after every sentence (which otherwise could 
be avoided by preprocessing one sentence while the previous one is spoken). 
What about:

currentJobNum = setText(<first chapter>, Null);
appendText(currentJobNum, <second chapter>);
...
appendText(currentJobNum, <last chapter>);
startText(currentJobNum);

// the user wants to jump to chapter 8
jumpTo(currentJobNum, 8);
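Assuming jumpTo(job, n) takes a 1-based part index, the job-side bookkeeping 
could look roughly like this (a sketch with hypothetical names, not the 
kspeech API; a real implementation would pre-synthesize the next part while 
the current one is spoken):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch: one text job holding several appended parts.
class TextJob {
public:
    explicit TextJob(const std::string& firstPart) {
        parts.push_back(firstPart);
    }
    void appendText(const std::string& part) { parts.push_back(part); }
    // Jump to part n (1-based), as in jumpTo(job, n).
    void jumpTo(unsigned n) {
        if (n >= 1 && n <= parts.size())
            pos = n - 1;
    }
    // Next part to hand to the synthesizer, or "" when the job is done.
    std::string nextPart() {
        if (pos >= parts.size())
            return "";
        return parts[pos++];
    }
private:
    std::vector<std::string> parts;
    std::size_t pos = 0;
};
```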

> Now about limiting the API to only one text job at a time.  I've done a
> lot of thinking about this and strongly urge we not do that.  I assume
> we want to encourage KDE programmers to add speech capabilities to their
> apps.  If they look at the API and see:
>
> setText(const QString& text, const QString& talker)
>
> Queues a text message for speaking on the indicated talker.   If KTTSD
> is already speaking text, an error occurs.
>
> Nobody will want to code a wait loop to wait until the current text job
> ends. They will naturally look to the sayWarning and sayMessage methods
> instead, which we want to discourage for normal use.  sayWarning and
> sayMessage should be reserved for high-priority messages.   If we
> provide a weak API for normal messages, then programmers will tend to
> treat everything as high priority.
> [...]
I would rather see long texts as low-priority, messages as mid-priority, 
warnings as high-priority (and screen-reader output as highest-priority). 
If warnings and messages have the same API, then most users would use the 
low-level or the mid-level API.

We might want to encourage the users to use the level that is appropriate 
for their speech output (i.e., we do not want the user to synthesize a long 
text sentence by sentence with sayMessage, but we also do not want KMail 
to cope with setText, startText, etc. just for telling the user that new 
mail arrived).

> If the rule is "the new text cancels text in progress", programmers will
> have the same reaction.  "You mean my speech job can be replaced by
> another application!  Uhm, maybe I should use sayMessage instead.."  And
> the programmer must code a signal handler if they need to queue more
> than one set of text.
>
> So the API I've proposed provides the most robust set of capabilities
> and greatest flexibility for text jobs, reserving sayWarning and
> sayMessage for high-priority jobs as they are intended.
>
> (BTW, multiple text jobs have already been implemented in the latest
> code in CVS.  kttsmgr includes a job manager fashioned closely after the
> print manager.

What about a user who thinks "Well, my job may be delayed for a long time 
until the other five jobs are finished. Let's use the sayMessage method, 
so I know that it is spoken now."

Solving the problem of multiple long texts at the same time is difficult. 
Maybe the user should be asked whether he wants to cancel the old job, 
cancel the new job, or delay the new job. If he decides to delay, then a 
speech job manager is indeed needed.

> Take a look!) 
>

Sorry, but I could not compile kttsd. The compiler produced the following 
errors:

gunnars at aragorn:~/kde32/builddir/kdenonbeta> make -k
[...]
In file included from ../../../../../kdenonbeta/kttsd/plugins/festivalint/
festivalintconf.h:32,
                 from ../../../../../kdenonbeta/kttsd/plugins/festivalint/
festivalintconf.cpp:36:
festivalintconfwidget.h:15:44: /home/share/scratch/pluginconf.h: No such 
file or directory
[...]
In file included from ../../../../kdenonbeta/kttsd/kttsjobmgr/
kttsjobmgr.cpp:12:
../../../../kdenonbeta/kttsd/kttsjobmgr/kttsjobmgr.h:18:26: kspeech_stub.h: 
No such file or directory
[...]

-- 
Co-maintainer of the KDE Accessibility Project
Maintainer of the kdeaccessibility package
http://accessibility.kde.org/