[Kde-accessibility] Simon: First IRC Meeting (take 2): Minutes

Peter Grasch peter at grasch.net
Sat Jul 27 20:41:49 UTC 2013


Hi,

we just wrapped up the second of the introductory meetings.

The following people took part (personal introduction in quotes):
* Simon: "hi. i'm simon. i'm a software developer and have 10 yrs
experience as an entrepreneur. i'm currently working on a platform to
realize large-scale transcription of audio/video by leveraging a
language learner community. i can contribute hardware and contributors
(speakers, transcriptors) and some financial help if needed. i also want
to contribute to the acoustic / language models"
* Adam Nash: "I am Adam Nash.  I am a GSoC graduate that worked on Simon
for my project.  I have a year or so of professional software
engineering experience now.  I will be new to most of the topics that
will be addressed in this project."
* Jon Lederman: "I'm Jon Lederman.  I am co-founder of a tech startup
called SonicCloud in SF Bay Area.  We are developing a platform to
improve the richness of communications, which will also significantly
benefit the hearing impaired.  My background is in theoretical particle
physics and EE.  I'm quite interested in speech rec algorithms.  I'd
like to work on the speech model and algorithms.  I'd like to work on
also building an open source repository for speech models."
* Nicolás Alvarez: "I'm Nicolás Alvarez, currently a software
engineering student (*ahem* contributing to KDE when I should be studying)"
* Abinash Panda: "I am interested in large audio transcription tool"

Minutes:

1. Introduction

2. Short introduction to the basics of speech recognition.

3. Some of the attendees had already started work on tasks before the meeting:
3.1. Simon has already started work on a web-based audio transcription
application based on pocketsphinx (running on a server). The goal is to
make it easy to semi-automatically transcribe long recordings; the
corrected transcription can later be used to force-align the initial
recording, yielding new training data for the acoustic model (a minimal
decoding sketch follows this list).
3.2. Jon Lederman is working on real-time transcription for VoIP calls
using FreeSWITCH.
3.3. Nicolás Alvarez was planning to build a tool to crowdsource the
transcription of segmented Akademy talks.
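
As an illustration of the recognition step in 3.1, here is a minimal
sketch assuming the pocketsphinx Python bindings. The exact API differs
between pocketsphinx versions (this follows the newer style), and the
file name and sample rate are placeholder assumptions:

    # Batch decoding sketch using the pocketsphinx Python bindings.
    # "recording.raw" is assumed to be 16 kHz, 16-bit mono PCM.
    from pocketsphinx import Decoder

    decoder = Decoder(samprate=16000)  # defaults to the bundled US English model

    decoder.start_utt()
    with open("recording.raw", "rb") as f:
        while chunk := f.read(4096):
            decoder.process_raw(chunk, False, False)  # feed raw PCM to the decoder
    decoder.end_utt()

    hyp = decoder.hyp()
    print(hyp.hypstr if hyp else "(no hypothesis)")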

4. Assigned tasks:
4.1. Simon and Abinash jointly claimed the "Large audio transcription
tool" task. Simon hopes to show us a demo of his prototype (see above)
tomorrow.
The basic idea is this: a) propose a transcription using ASR, b) let
the user correct the transcription, c) force-align the recording against
the corrected transcript; if the alignment fails, return to b).
The already-corrected sections may be used to adapt the acoustic model
to incrementally improve recognition rates while transcribing.
As an intermediate goal, it was proposed to skip the ASR integration at
first and concentrate on manual transcription (with well-integrated
forced alignment). A rough skeleton of the full loop follows below.
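
To make the proposed workflow concrete, here is a rough skeleton of the
loop. All helper functions (load_initial_acoustic_model, asr_transcribe,
user_correct, force_align, adapt_model) are hypothetical placeholders
for components that don't exist yet, not real APIs:

    # Hypothetical skeleton of the transcription loop from 4.1; every
    # helper below is a placeholder for a component discussed above.
    def transcribe(audio_segments):
        model = load_initial_acoustic_model()  # placeholder
        transcripts = []
        for segment in audio_segments:
            draft = asr_transcribe(segment, model)  # a) propose via ASR
            while True:
                corrected = user_correct(segment, draft)     # b) manual correction
                alignment = force_align(segment, corrected)  # c) forced alignment
                if alignment is not None:
                    break          # alignment succeeded, move on
                draft = corrected  # on error: go back to b)
            transcripts.append(corrected)
            # use the newly aligned data to adapt the acoustic model
            model = adapt_model(model, segment, alignment)
        return transcripts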
4.2. Jon Lederman claimed the noise cancellation and echo cancellation
tasks. There might already be existing solutions at the PulseAudio
level (see the example below); he'll look into it in the coming week.
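
On "existing solutions at the PulseAudio level": PulseAudio already
ships module-echo-cancel, which performs acoustic echo cancellation
(and, with the webrtc backend, some noise suppression). A minimal
sketch of loading it at runtime, assuming pactl is available and the
WebRTC canceller is compiled in (the device names are arbitrary):

    import subprocess

    # Load PulseAudio's built-in echo-cancellation module at runtime;
    # equivalent to a "load-module module-echo-cancel ..." line in default.pa.
    # aec_method=webrtc selects the WebRTC canceller (speex is an alternative);
    # source_name/sink_name are arbitrary names for the filtered devices.
    subprocess.run(
        ["pactl", "load-module", "module-echo-cancel",
         "aec_method=webrtc",
         "source_name=ec_source",
         "sink_name=ec_sink"],
        check=True,
    )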
4.3. Simon proposed using Amazon's cloud offerings (EC2, EBS) to
facilitate quicker model creation. He generously offered to cover the
expenses (thanks!).
We then discussed the requirements we'd have for such a server: at the
very least 150 GB of disk storage (more if possible), at least 16 GB of
memory (needed for language model generation), and as much computing
power as the budget allows (acoustic model generation is computationally
expensive but parallelizes well).
Flexible pricing would be preferred (to up- or downgrade based on what
we actually need).
Simon will get back to us with a concrete proposal.
4.4. Adam Nash expressed interest in getting involved in building
speech models. We'll discuss the concrete task in the coming days.

I'll send out an invitation to the next meeting right after this.

Best regards,
Peter

