[Kde-accessibility] Language models

Tue Sep 3 18:43:37 UTC 2013

Hello all,

  I saw Peter Grasch's recent message to the kde-community list about
setting up an "open speech group" under the KDE umbrella.  I'm a
long-time KDE contributor (Irish l10n), but I wanted to reach out to
the accessibility team about another aspect of my work.

  In my day job as an academic, I work with language communities all
over the world to help develop basic technologies like spelling and
grammar checkers, dictionaries, and keyboard input methods (e.g.
predictive text on mobile devices).  More generally, I'm interested in
seeing other language technologies "scaled up" to work for hundreds or
thousands of languages.  Almost everything I do is based on plain-text
corpora that I crawl from the web, covering about 1500 languages:

http://borel.slu.edu/crubadan/

  It seems to me these could be useful in creating n-gram language
models for many languages that don't yet enjoy speech recognition,
including many where written literacy isn't the norm and speech input
could have a tremendous impact.

  I'm just hoping to start the discussion and to let you know that these
resources are out there.

Kevin