[Kde-accessibility] Factors that affect accuracy

Peter Grasch me at bedahr.org
Tue Oct 2 12:47:58 UTC 2012


Hi Steve,

On Tue, 2012-10-02 at 08:30 -0300, Steve Cookson wrote:
> As I mentioned in a previous email, we are moving into the pilot phase
> of our project and we are in the middle of defining objectives for the
> pilot.  One of these will be to test the accuracy of the speech
> recognition.  I'm interested in knowing what people find are the main
> factors that affect accuracy of recognition, so that we can test them.
> I suspect they include the following:
> 
> - the amount of voice training provided for the speech model;
> - the detection threshold set in simon;
> - the background noise;
> - what microphone you are using;

Yes, those are definitely important points.

In addition to those, one of the most critical aspects of a good
recognizer is a well-designed vocabulary and grammar. So spend some time
with your Scenarios and make sure they are as good as they can be.
Some tips:
* Make sure you don't have similar-sounding words. Check the
transcriptions of your words - there should not be entries whose
transcriptions are nearly identical. If there are, make sure that the
grammar structures don't allow them to be used interchangeably and that
the other words in the candidate sentences are sufficiently distinct (a
rough sketch of such a check follows below this list).
* Make sure you have accurate transcriptions. Especially when dealing
with local dialects, it might pay off to have a linguist fine-tune your
dictionary.
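
To illustrate the first point, here is a rough sketch of such a check
(this is not Simon functionality; it assumes a plain HTK-style
dictionary with one entry per line, the word followed by its phoneme
transcription):

    #!/usr/bin/env python
    # Sketch: flag dictionary entries whose transcriptions are
    # identical or differ by only one phoneme.
    # Assumed input format: WORD PHONE1 PHONE2 ... (one entry per line)
    import sys
    from itertools import combinations

    def phoneme_distance(a, b):
        # Plain Levenshtein distance over phoneme lists.
        prev = list(range(len(b) + 1))
        for i, pa in enumerate(a, 1):
            cur = [i]
            for j, pb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                 # deletion
                               cur[j - 1] + 1,              # insertion
                               prev[j - 1] + (pa != pb)))   # substitution
            prev = cur
        return prev[-1]

    entries = []
    with open(sys.argv[1]) as dictionary:
        for line in dictionary:
            parts = line.split()
            if len(parts) >= 2:
                entries.append((parts[0], parts[1:]))

    for (w1, t1), (w2, t2) in combinations(entries, 2):
        if phoneme_distance(t1, t2) <= 1:
            print("%s / %s sound very similar: %s vs. %s"
                  % (w1, w2, " ".join(t1), " ".join(t2)))

Anything a script like that reports deserves a second look in the
grammar.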

But the most important advice I can give you for the pilot phase is to
activate sample harvesting in Simond:
In the Simond user configuration (reachable, e.g. by clicking on the
KSimond tray icon), activate the option "Keep recognition samples".
With this setting, Simond will keep all samples used for recognition,
along with the recognition results they produced, in
~/.kde/share/apps/simond/models/<user>/recognitionsamples
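
If you want a quick overview of how much data has been gathered so far,
a few lines of Python will do (this just assumes the directory contains
the recorded WAV files; adjust the path and pattern to whatever you
actually find there):

    #!/usr/bin/env python
    # Sketch: summarize the harvested recognition samples.
    # Assumption: the samples are stored as WAV files below this
    # directory; replace <user> with your Simond user name.
    import contextlib
    import os
    import wave

    root = os.path.expanduser(
        "~/.kde/share/apps/simond/models/<user>/recognitionsamples")

    count = 0
    seconds = 0.0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".wav"):
                continue
            count += 1
            path = os.path.join(dirpath, name)
            with contextlib.closing(wave.open(path)) as sample:
                seconds += sample.getnframes() / float(sample.getframerate())

    print("%d samples, %.1f minutes of audio" % (count, seconds / 60.0))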

This is not only useful for diagnostics later on, but can even provide a
great way to train the model with data gathered in real use. SAM 0.4
will also introduce an option to automate this conservative training
approach somewhat by parsing the logs produced by Simond.

As I mentioned in my other mail, Simond (and SAM) 0.4 also supports CMU
SPHINX alongside HTK / Julius. It would be interesting to see whether
SPHINX performs better or worse for your workload. You can run tests
like that after the fact with SAM, based on the collected recognition
samples. You can also experiment with recognizer options, different
dictionaries / grammars and possible audio filters once you have a good
test corpus.
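
SAM will show you the results of such test runs itself, but if you want
to compare two configurations on your own transcripts, the usual metric
is the word error rate. A minimal sketch (the example sentences are made
up, of course):

    # Sketch: word error rate between a reference transcript and a
    # recognizer hypothesis: (substitutions + insertions + deletions)
    # divided by the number of reference words.
    def word_error_rate(reference, hypothesis):
        ref = reference.split()
        hyp = hypothesis.split()
        # Standard edit-distance dynamic programming over words.
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,          # deletion
                              d[i][j - 1] + 1,          # insertion
                              d[i - 1][j - 1] + cost)   # substitution
        return d[len(ref)][len(hyp)] / float(len(ref))

    # Same sample, two recognizer setups:
    print(word_error_rate("open the file menu", "open a file menu"))  # 0.25

The lower the word error rate, the better; average it over the whole
test corpus to compare HTK / Julius against SPHINX.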

Best regards,
Peter
