[Kde-accessibility] Another speech engine for KTTS?

Thu Feb 9 01:07:31 CET 2006

On Wednesday 08 February 2006 11:39, Jonathan Duddington wrote:
> In article <4df5b93ce6jsd at clara.co.uk>,
>
>    Jonathan Duddington <jsd at clara.co.uk> wrote:
> > In article <200602071959.54036.garycramblitt at comcast.net>,
> >    Gary Cramblitt <garycramblitt at comcast.net> wrote:
> >
> > portaudio is a "cross platform open-source audio I/O library".
> > Details are at: www.portaudio.com
> >
> > The Debian package  libportaudio0  was included on the Mepis Linux
> > distribution that I use.  It should be readily available.
> >
> > > Based on the README and "speak --help" it does not appear that
> > > linux_speak will create a wav file.
>
> If you were able to do  speak --help  then libportaudio must have been
> present, otherwise the speak program would fail to run at all.  I used
> the PortAudio library because I noticed that the Audacity sound editor
> used it.  It allows the program to process the next sentence while
> speaking the previous one.
>
> > Thanks for the information. Producing a WAV output file with a -w
> > argument should be simple enough.
>
> I've updated it now.
> http://home.clara.net/jsd/linux/linux_speak.zip
>
> Version 1.03 now includes a -w argument to produce the speech output as
> a WAV file.  This works OK with KTTS, except there seems to be a small
> amount of noise (hiss) added which was not there in the WAV file.
>
> I made sure the KTTSMgr "Speed" setting was set to 100%.  This seems to
> be a global setting, applied to all talkers.  Is there a way to prevent
> the SOX "Speed" being applied to talkers which can manage their own
> speed, since that gives better quality, which you wouldn't want
> disturbed when you adjust the speed of one of the other talkers.

Works very nicely!  Here is my Command plugin setup for speaking English (UK):

Language: English (UK)
Command: speak --stdin -w %w
Encoding: Local (UTF-8), but ISO-8859-1 is better if you have a non-English
desktop setting
Send the data as standard input: checked

I agree that the UK voice is very nice.

Here's my setting for German

Language: German
Command: speak --stdin -w %w -vgerman
Encoding: ISO-8859-1
Send the data as standard input: checked

Now in theory, I should also be able to set the encoding to UTF-8, but the 
output seems to differ in this case.  Can a German speaker please play with 
this and let me know what works correctly?

If the Speed setting on the Audio tab of KTTSMgr is set to 100%, then Sox 
should not be involved at all.  If it is, it's a bug.  If you change it to 
something other than 100%, then the wav file is sent through Sox to change 
the talking speed and this affects all speech (or at least all the cases 
where the synth returns a wav file.)
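
To make the intended behaviour concrete, here is a minimal Python sketch of 
that decision.  It is only an illustration, not the ktts code: the function 
name sox_stage is hypothetical, and the use of the sox "tempo" effect is my 
assumption for the example (this message does not say which sox effect ktts 
actually applies).

```python
def sox_stage(in_wav, out_wav, speed_percent):
    """Return the sox argv for a speed change, or None when sox is skipped.

    Hypothetical helper: at exactly 100% the wav file is left alone,
    otherwise it is routed through sox.  The "tempo" effect is an
    assumption made for this sketch.
    """
    if speed_percent == 100:
        return None  # sox should not be involved at all
    factor = speed_percent / 100.0
    return ["sox", in_wav, out_wav, "tempo", f"{factor:g}"]


if __name__ == "__main__":
    print(sox_stage("in.wav", "out.wav", 100))
    print(sox_stage("in.wav", "out.wav", 125))
```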

I notice that linux_speak also offers the following command line options:

-a<integer>
Sets amplitude (volume) in a range of 0 to 20.  The default is 10.

-s<integer>
Sets the speed in words-per-minute (for the default voice, others may
differ slightly). Default value is 160. I generally use a faster speed
of 168.

To support these, we'd need to write a plugin for ktts.  This would include a 
configuration dialog where users could adjust the volume and speed settings.  
Of course users are free to set up more than one Command Talker with various 
settings hard-coded in the command.
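
As a rough sketch of that last workaround, the following Python helper builds 
the Command field for a Talker with the volume and speed hard-coded.  The 
helper name build_speak_command is hypothetical; the flags themselves 
(--stdin, -w %w, -a, -s, -v) are the ones quoted in this thread, and the 
range check on -a follows the 0 to 20 range given above.

```python
def build_speak_command(amplitude=10, speed=160, voice=None):
    """Return a Command-plugin command line with hard-coded settings.

    Hypothetical helper for illustration.  Defaults mirror the documented
    linux_speak defaults: amplitude 10, speed 160 words per minute.
    """
    if not 0 <= amplitude <= 20:
        raise ValueError("amplitude must be in the range 0 to 20")
    if speed <= 0:
        raise ValueError("speed must be a positive words-per-minute value")
    parts = ["speak", "--stdin", "-w", "%w", f"-a{amplitude}", f"-s{speed}"]
    if voice:
        parts.append(f"-v{voice}")
    return " ".join(parts)


if __name__ == "__main__":
    # One Command Talker per combination of settings.
    print(build_speak_command())
    print(build_speak_command(amplitude=15, speed=168, voice="german"))
```

Each resulting string would be pasted into the Command field of a separate 
Command Talker.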

However, there's a problem in that the kdeaccessibility module in KDE 3.5 is 
currently feature and string frozen, so even if I wrote such a plugin, I 
could not add it to the kdeaccessibility module.  KDE4 is a ways off.  I 
could write the plugin and release the code separately, but how many people 
would bother to download it and build it, especially when the Command plugin 
provides basic functionality?

Furthermore, I plan to replace the ktts backend with Speech Dispatcher for 
KDE4.  Therefore, it would be better to develop a linux_speak driver for 
Speech Dispatcher.  I will approach the SD folks about that, assuming you 
release linux_speak under GPL or LGPL.

>How is the speech engine told what is the character set of the input
>text?

If you are asking about ktts, the Talker that is chosen is determined from the 
Talker Code that is passed in the dcop call to kttsd.  See the discussion on 
Talker Codes here:

http://websvn.kde.org/branches/KDE/3.5/kdelibs/interfaces/kspeech/kspeech.h?view=markup

(about a quarter of the way down).  Within KDE, all text is passed to kttsd as 
UTF-8.  Once a Talker has been picked, the Talker plugin configuration 
determines how the text is encoded for passing to the synth.  So for example, 
if I do the following:

dcop kttsd KSpeech sayText "Guten Tag" "de"

and I have a Talker configured for linux_speak using the Command plugin and 
German voice as I gave above, then kttsd will use that talker because I told 
it the text is German (the "de" argument). The Command plugin will encode the 
text from UTF-8 to ISO-8859-1 because that is the encoding setting I put in 
the Command configuration, and the encoded text will be passed to linux_speak 
via StdIn.
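
The re-encoding step can be sketched in Python as follows.  This is not the 
Command plugin's real code, just an illustration under stated assumptions: 
the function name prepare_command and the wav path are hypothetical, and %w 
is simply replaced by the output file name.

```python
import shlex


def prepare_command(template, text, encoding, wav_path):
    """Return (argv, stdin_bytes) for a Command-plugin style invocation.

    Hypothetical helper: substitutes the wav file name for %w and encodes
    the text in the encoding configured for the Talker.
    """
    argv = [tok.replace("%w", wav_path) for tok in shlex.split(template)]
    # Characters the target encoding cannot represent become "?".
    data = text.encode(encoding, errors="replace")
    return argv, data


if __name__ == "__main__":
    argv, data = prepare_command(
        "speak --stdin -w %w -vgerman", "Grüße", "iso-8859-1", "/tmp/out.wav"
    )
    print(argv)
    print(data)                         # one byte per character in ISO-8859-1
    print("Grüße".encode("utf-8"))      # two bytes each for ü and ß in UTF-8
```

The two byte strings differ for any non-ASCII character, which is why the 
encoding setting has to match what the synth expects on standard input.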

So ultimately, the talker chosen, and therefore the encoding used, is 
determined by the Talker Code argument that the application passes to kttsd.  
In practice, most apps specify a NULL Talker Code, so the topmost Talker the 
user has configured in KTTSMgr is the one used.
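
A much simplified Python sketch of that selection rule is below.  Real Talker 
Codes carry more attributes than a language, and the real matching in kttsd 
is more involved, so treat the names here (pick_talker, the "lang" key) and 
the fallback to the topmost Talker on a non-matching code as assumptions made 
for illustration.

```python
def pick_talker(talkers, talker_code=None):
    """Pick a talker from a list ordered as in KTTSMgr (topmost first).

    Simplified, hypothetical model: match on language only, and fall back
    to the topmost talker when no code is given or nothing matches.
    """
    if not talkers:
        raise ValueError("no talkers configured")
    if talker_code:
        for talker in talkers:
            if talker["lang"] == talker_code:
                return talker
    return talkers[0]


if __name__ == "__main__":
    configured = [
        {"lang": "en", "command": "speak --stdin -w %w", "encoding": "utf-8"},
        {"lang": "de", "command": "speak --stdin -w %w -vgerman",
         "encoding": "iso-8859-1"},
    ]
    print(pick_talker(configured, "de")["encoding"])
    print(pick_talker(configured)["lang"])
```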

>Any ideas for a more distinctive name than "speak"?

Obviously, "speak" is likely to collide with some other future TTS engine.  Is 
the linux_speak code highly dependent upon Linux, or could it be easily 
adapted to BSD, Solaris, or AIX?  If it is portable, I wouldn't use Linux in 
the name either.  Maybe freespeak, but that might be "taken" already.  It's a 
tough problem because you want the name to be distinctive and catchy, but also 
easy to remember.

> Is KTTS resampling the sound or processing it in some way?
> I produced the WAV with a 22050 Hz sample rate.

I'm not hearing any difference here between speech sent to linux_speak and 
played via kttsd versus played directly by linux_speak.

Other than Sox, ktts isn't messing with the wav file, but the audio output 
plugin might.  Are you using Alsa?  If so which version?  And which version 
of ktts are you using (right click on systray icon and choose About)?  I made 
some changes to the Alsa plugin for ktts 0.3.5.1, which was released with KDE 
3.5.1.  These might make a difference.

The sound will be "resampled" as it passes through the Alsa pcm "devices".  
This is highly configuration dependent.  BTW, here is a wav file produced by 
Festival:

aplay /tmp/kde-kde-devel/kttsd-4ZVtsb.wav
Playing WAVE '/tmp/kde-kde-devel/kttsd-4ZVtsb.wav' : Signed 16 bit Little 
Endian, Rate 16000 Hz, Mono

You can try specifying different Alsa pcm device names to see if that makes a 
difference.
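
If you want to compare the format of two wav files without playing them, a 
small Python sketch using the standard wave module reports the same facts 
aplay prints.  The helper names describe_wav and write_silence are 
hypothetical, and write_silence exists only to produce a test file at the 
22050 Hz rate mentioned above.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return channel count, bits per sample and sample rate of a wav file."""
    with wave.open(path, "rb") as w:
        return {
            "channels": w.getnchannels(),
            "bits": w.getsampwidth() * 8,
            "rate": w.getframerate(),
        }


def write_silence(path, rate=22050, seconds=0.1):
    """Write a short mono 16-bit silent wav file, for testing only."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16 bit
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(rate * seconds))


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        write_silence(path)
        print(describe_wav(path))
    finally:
        os.remove(path)
```

Running describe_wav on the file the synth wrote tells you the format before 
Alsa touches it; any resampling after that point happens in the pcm devices.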

If you want to try to figure out better what Alsa is doing, you can turn on 
some debugging code in the latest ktts alsaplayer plugin.  Assuming you've 
built ktts with --enable-debug=full, add the following to your 
~/.kde/share/config/kttsdrc file:

[ALSAPlayer]
DebugLevel=2

You'll need to start kttsd in a konsole, and start kttsmgr in a separate 
konsole.  The debug output will appear in the kttsd konsole.  I warn you, 
there will be a lot of output.

As soon as you release linux_speak with license info, I'll be happy to put 
links and instructions on the kttsd website.

-- 
Gary Cramblitt (aka PhantomsDad)
KDE Text-to-Speech Maintainer
http://accessibility.kde.org/developer/kttsd/index.php