New multimedia architecture and accessibility/TTS requirements

Gary Cramblitt garycramblitt at comcast.net
Thu Apr 28 18:00:52 BST 2005


I am told that this mailing list has been discussing the future KDE
multimedia architecture, and that the general plan is to develop a generic
API and implement specific driver backends, probably starting with
GStreamer.

I would like to make KDE developers aware of audio requirements as they relate 
to accessibility and text-to-speech (TTS).

Below is a draft document being discussed on the freedesktop.org
accessibility mailing list.  While the document is not finished, I thought
it would be a good idea to let this group know sooner rather than later
what requirements accessibility and TTS place on a multimedia
architecture.

Note that accessibility imposes some special requirements that aren't
needed for typical audio applications.  Specifically, I call your
attention to section "2.3 Real-time audio output" below and note that, in
my experience, GStreamer 0.8.7 does not meet the requirement.  I have also
asked for the following requirement to be added to section 2.3:

--
Add section 2.3.4. MUST HAVE: Applications must be able to tell the audio
framework to pause (and resume) or stop (and discard) audio playback.
Playback pauses or stops immediately.
--

In my experience, GStreamer can take up to a second to stop playback.
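
To illustrate, here is a rough sketch of how the stop latency might be
measured against GStreamer 0.8's playbin element.  Treat it as a sketch
only: the file path is a placeholder, the iteration count is arbitrary,
and error handling is omitted.

    /* stop_latency.c -- time how long a stop request takes.
       Assumes GStreamer 0.8, where state changes are synchronous
       and the application drives the pipeline by iterating it. */
    #include <gst/gst.h>
    #include <sys/time.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        GstElement *play;
        struct timeval before, after;
        long ms;
        int i;

        gst_init(&argc, &argv);

        play = gst_element_factory_make("playbin", "play");
        /* placeholder file; any short clip will do */
        g_object_set(G_OBJECT(play), "uri", "file:///tmp/sample.wav", NULL);
        gst_element_set_state(play, GST_STATE_PLAYING);

        /* let playback run briefly; the count is arbitrary */
        for (i = 0; i < 5000 && gst_bin_iterate(GST_BIN(play)); i++)
            ;

        gettimeofday(&before, NULL);
        gst_element_set_state(play, GST_STATE_NULL);  /* stop and discard */
        gettimeofday(&after, NULL);

        ms = (after.tv_sec - before.tv_sec) * 1000
           + (after.tv_usec - before.tv_usec) / 1000;
        printf("stop took %ld ms\n", ms);

        gst_object_unref(GST_OBJECT(play));
        return 0;
    }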

I also wish to mention that the KDE Text-to-Speech System (KTTS) is hoping
to migrate towards using Speech Dispatcher as its speech backend.  Among
other things, this will provide TTS capabilities from boot-up to shutdown,
even when KDE is not yet running, as well as TTS capability for terminal
applications.  At the moment, Speech Dispatcher outputs audio directly to
the /dev/snd device or through OSS.  They are considering adding ALSA
support.  They have tried to implement NAS support, but have found that it
tends to crash or misbehave under heavy realtime-targeted use.  I mention
Speech Dispatcher because it presents some challenges when considering
integration/compatibility with the new KDE architecture, given that Speech
Dispatcher is not specific to the K desktop.  More information about
Speech Dispatcher is available at http://www.freebsoft.org/speechd.
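
For those unfamiliar with it, speaking through Speech Dispatcher's C
client library (libspeechd) looks roughly like the sketch below.  The
client name and text are placeholders, and error handling is minimal:

    /* say_hello.c -- minimal Speech Dispatcher client sketch */
    #include <libspeechd.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        SPDConnection *conn;

        /* open a connection to the running speechd daemon */
        conn = spd_open("demo", "main", NULL, SPD_MODE_SINGLE);
        if (!conn) {
            fprintf(stderr, "cannot connect to Speech Dispatcher\n");
            return 1;
        }

        spd_say(conn, SPD_TEXT, "Hello from Speech Dispatcher.");
        sleep(1);
        spd_cancel(conn);     /* stop any speech in progress */

        spd_close(conn);
        return 0;
    }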

Thanks for your kind attention.

Gary Cramblitt (aka PhantomsDad)
KDE Text-to-Speech Maintainer
http://accessibility.kde.org/developer/kttsd/index.php

Here is the draft document.

--------------------

[Accessibility] Audio framework requirements
 From: Hynek Hanke <hanke at brailcom.org>
 To: accessibility at lists.freedesktop.org
 
Hello all,

Just recently, an effort was started to gather the requirements on an
audio framework, audio architecture or sound system that would be both
accessible and suitable for developing accessibility tools.  Once the
requirements are agreed upon, the final goal of this effort is to create a
solution that fulfills them, probably by taking an existing solution and
adding the missing parts.

Here is a draft of the requirements document that incorporates the ideas
from previous, more general discussions.  I've used a structure very
similar to that of the TTS API document.

Please send comments and suggestions.

Thank you,
Hynek Hanke



Accessibility requirements on audio frameworks
==============================================
Document version: 2005-04-01

The purpose of this document is to describe the requirements on an audio
(multimedia) framework providing all the features necessary for a Free
Software or Open Source desktop to be accessible, especially to
handicapped people, and for further assistive technologies to be
implemented on top of it without having to care about the particular
details of audio output.  The purpose of this document is not to define
the particular API that will be used for communication with a given audio
framework.

By the term "audio framework" this document means a full solution which
enables application developers to easily have their sound data output
through the speakers in a way compatible with accessibility concerns, and
which enables users (possibly handicapped ones) to control the audio on
their system in the way they need.  The term should not imply that
library-based solutions are preferred over server-based or even
kernel-based solutions.

A. Structure
============

The requirements are categorized in the following priority order: MUST HAVE,
SHOULD HAVE, and NICE TO HAVE.

The priorities have the following meanings:
          
     MUST HAVE: All conforming audio frameworks must have this capability.

     SHOULD HAVE: The audio framework will be usable without this feature,
       but the feature is expected to be implemented in all solutions
       intended for serious use.

     NICE TO HAVE: Optional features that make the audio framework more
       usable for a handicapped person.

Points outside the scope of this document are labelled OUTSIDE SCOPE.
Such notes serve to avoid confusion or to define some points more
precisely.


B. Requirements
===============

1. General requirements

   1.1 Documentation
       
       1.1.1 MUST HAVE: The API is well documented.

       1.1.2 MUST HAVE: Example application(s) showing how to implement/use
             the features described in section (2) of the requirements are
             provided.


   1.2 Portability

       1.2.1 MUST HAVE: The audio framework API, including all its basic
             functionality and the features described in section (2) of the
             requirements, is independent of the system in use.

   1.3 Design

       1.3.1 OUTSIDE SCOPE: Whether conformance with these requirements is
             achieved by a single application, by several cooperating
             applications, or by a combination of applications and
             libraries is outside the scope of this document.

2. Audio requirements

   2.1 Network transparency

       2.1.1 MUST HAVE: Full redirection of the audio output to a different
             machine is possible. It must be possible to do this centrally
             without having to reconfigure end-user applications.

             Reason: It is an important accessibility feature for remote
             access.

       2.1.2 OUTSIDE SCOPE: Whether audio redirection is achieved by the
             multimedia framework itself or by the underlying audio
             architecture is outside the scope of this document.  However,
             it must be ensured that such an architecture exists on each
             platform and that the audio framework conforms to all other
             requirements in section (2) in such a configuration.

   2.2 Data transfer and handling of different formats

       2.2.1 MUST HAVE: The audio framework must be able to receive data
             both by being given a data file on the local computer to read
             *and by being sent the data directly from memory over a socket
             or by similar means*.

       2.2.2 MUST HAVE: If the data is sent directly from memory without
             using a file, it must be possible to send all the data at once
             at the beginning of the playback.  The client application must
             not have to care about any buffering and timing issues.

       2.2.3 MUST HAVE: It's possible to send data as a raw stream and specify
             all the related audio parameters manually.

       2.2.4 MUST HAVE: It's possible to send audio data at least in the
             PCM WAV and Ogg Vorbis formats without any need to decode or
             parse the data first.
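
       As an illustration of 2.2.1 and 2.2.2, a client-side interface
       could look like the sketch below.  Every name in it is invented
       for this example; no existing API is implied:

           /* Hypothetical names only -- not an existing API. */
           #include <stddef.h>          /* size_t */

           typedef struct af_stream af_stream;

           /* Scheme 1: the framework reads a file on the local machine. */
           af_stream *af_play_file(const char *path);

           /* Scheme 2: the client pushes the complete data from memory in
              a single call; the framework takes care of all buffering and
              timing issues (requirement 2.2.2). */
           af_stream *af_play_buffer(const void *data, size_t nbytes,
                                     const char *mime_type);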

   2.3 Real-time audio output

       2.3.0 For the purpose of this section, the word "immediately" means
             "no later than in 20 milliseconds" as (MUST HAVE) and "no
             later than in 10 milliseconds" as (NICE TO HAVE).

             Reason: Reading characters on the screen with assistive
             technologies, and cancelling the audio output so that a new
             character can be read, must be fast enough to keep up with a
             fast typist or even with key autorepeat.  Consider a typical
             autorepeat rate of 25 characters per second: within each 40 ms
             interval, synthesis should ideally begin, produce some audio
             output, and stop.  The time to start and stop the audio
             playback, including processing the request and decoding the
             incoming data, must fit into this interval.

       2.3.1 MUST HAVE: Playback through the speakers starts immediately
             after audio data is sent to the audio framework.  The
             transport mechanism used (either of the schemes from paragraph
             2.2.1) must allow enough time for this when both the data to
             play and the device intended for playback are located on the
             same machine.  When the data is transported over the network,
             the requirement only applies after the data has been fully
             received by the target machine.

       2.3.2 MUST HAVE: When the playback on the speakers terminates, the
             application that issued the playback request must be informed.

       2.3.3 NICE TO HAVE: The application is notified when certain
             previously specified time points in the submitted audio data
             are reached.
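
       Continuing the invented interface from section 2.2, requirements
       2.3.2 and 2.3.3 suggest completion and marker callbacks roughly
       along these lines (again purely illustrative):

           /* Invoked when playback of the stream has finished (2.3.2). */
           typedef void (*af_done_cb)(af_stream *s, void *user_data);
           void af_set_done_callback(af_stream *s, af_done_cb cb,
                                     void *user_data);

           /* Invoked when a previously registered time point in the audio
              data is reached during playback (2.3.3). */
           typedef void (*af_marker_cb)(af_stream *s, unsigned position_ms,
                                        void *user_data);
           void af_add_marker(af_stream *s, unsigned position_ms,
                              af_marker_cb cb, void *user_data);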

   2.4 Simultaneous audio output and volume control

       2.4.1 MUST HAVE: The audio framework is able to play several audio
             streams at once and mix them together automatically *without
             any effort from the client application itself*.

             Reason: It must be possible to run several applications using
             audio output without the fear that one of them will block the
             output for the rest.  For accessibility this is essential,
             since speech output must always get through, yet must not
             block any other media output.
       
       2.4.2 SHOULD HAVE: The user is able to separately control the maximum
             and/or default volume of sounds originating from different
             applications (according to their identification passed to the
             audio framework) statically from a configuration file or by
             similar means.
    
       2.4.3 NICE TO HAVE: The user is allowed to specify priorities for
             client applications (according to their identification passed
             to the audio framework) so that it's possible to
             *automatically* mute some applications or decrease their
             volume when there is a more important sound to play.
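
       Requirements 2.4.2 and 2.4.3 could be served by a static
       per-application table along the lines of the hypothetical
       configuration below; the format and all names are invented for
       illustration:

           # per-application volume limits and mixing priorities
           [applications]
           speech-dispatcher  max-volume=100  default-volume=90  priority=10
           music-player       max-volume=80   default-volume=50  priority=1
           # a higher priority may automatically duck or mute lower ones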

   2.5 Destination of audio flows

       2.5.1 NICE TO HAVE: The user is able to specify the desired
             destination of, or redirect, the audio flow (e.g. sound
             card 1, sound card 2, network) for the different applications
             using this audio framework (without the end-user application
             having to care about it).
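
       The same invented configuration format could express 2.5.1 as a
       per-application output destination; the device names are
       placeholders:

           [outputs]
           speech-dispatcher  device=hw:0                  # sound card 1
           music-player       device=hw:1                  # sound card 2
           remote-session     device=net:host.example.org  # over network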

   2.6 Compatibility

       2.6.1 MUST HAVE: The whole multimedia framework must be able to run
             on top of the Advanced Linux Sound Architecture and the Open
             Sound System on GNU/Linux.

             Reason: If we want to use it in the near future, we must
             ensure that by doing so we don't deny accessibility to larger
             groups of people.  The necessity to support the Open Sound
             System will probably be dropped in the future.

             Open Issue: What other architectures must be listed so that we
             can all agree such a requirement is appropriate for all of us?
             GNOME accessibility, KDE accessibility, others?

   
C. Copying This Document
========================

  Copyright (C) 2005 ...
  This specification is made available under a BSD-style license ...