[Kde-accessibility] Language models

Peter Grasch peter at grasch.net
Wed Sep 4 08:21:03 UTC 2013


Hi Kevin,

On 09/04/2013 03:43 AM, Kevin Scannell wrote:
>   I saw Peter Grasch's recent message to the kde-community list about
> setting up an "open speech group" under the KDE umbrella.   I'm a
> long-time KDE contributor (Irish l10n) but I wanted to reach out to
> the accessibility team concerning another aspect of my work.
As already mentioned in my earlier, private mail, it really is great to
hear from you, Kevin.

>   In my day job as an academic I work with language communities all
> over the world to help develop basic technologies like spelling and
> grammar checkers, dictionaries, and keyboard input methods (e.g.
> predictive text on mobile devices).   I'm interested very generally in
> seeing other language technologies "scaled up" to work for 100's or
> 1000's of languages.  Most everything I do is based on plain text
> corpora that I crawl from the web, for about 1500 languages:
> 
> http://borel.slu.edu/crubadan/
Such resources are obviously very useful.
As you also mentioned, for many (perhaps most) minority languages there
are no speech recognition systems available at all, because they lack
commercial viability. Semi-automatic open source approaches could make a
huge difference there.

However, with the limited resources we have right now, we will strive to
make the greatest immediate impact. Put plainly, we will concentrate on
English for the moment. Our immediate goal must be to ensure a stable,
long-term development community by recruiting both users and developers.
As there is significant overlap between different languages, we will
still provide the foundation for any further languages by concentrating
on the most popular one.

I know that your corpora currently target minority languages, but your
system could probably still be used to crawl, e.g., English text, right?
Crawling web content was one of our long-term ideas for sourcing data
for our language model (LM). If the system architecture already exists,
that would obviously be helpful.
Could you describe your crawler a bit? Is it open source?

Best regards,
Peter

