[Kde-accessibility] Simon Question

Peter Grasch peter at grasch.net
Sun Mar 23 13:48:07 UTC 2014


Hello Jessica,

nice to meet you.

On Thursday, March 20, 2014 01:51:09 PM Jessica Horst wrote:
> My colleague told me that speech recognition software works by having a
> threshold of similarity. For example, when I tell my mobile phone "call
> home" the software compares what I said to what I have said before and if it
> is similar enough (above threshold) it will recognise my speech. I'm
> hopeful that I could use the same kind of principle here (how similar is
> the child's speech to the adult speech (what was said before), but I would
> want a numerical value instead of just knowing if it was above or below
> threshold.
I am sorry to say that you have been slightly misinformed. In practice the
process works somewhat differently.
(Disclaimer: the following explanation contains a few simplifications.)
The decoding produces the most likely path through the space of alternatives 
(allowed sentences, if you are doing grammar based decoding). The question 
answered by the decoding is: Given the observations (recording), which of the 
possibilities (sentences) is the most likely?
To determine the most likely candidate, there is an internal scoring process 
but these scores are entirely relative to each other and not compared to a 
fixed threshold. Most decoders implement some form of confidence scoring, 
telling you how confident the system is in its results, but these scores will 
likely not be what you want because differences that appear substantial to the 
human ear will not necessarily have a big impact on the confidence score and 
the other way around. 
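
To make the point about relative scores a bit more concrete, here is a toy
sketch in Python. It is not how Simon or any particular decoder is implemented:
the candidate sentences and their log scores are made up, and `decode` is a
hypothetical helper. It only illustrates that the winner is picked by comparing
candidates against each other, and that a simple confidence value can be
derived by normalising over those same candidates, without any fixed threshold.

```python
import math


def decode(log_scores):
    """Pick the best candidate and a confidence from relative log scores.

    log_scores: dict mapping candidate sentence -> log score (made-up values).
    Returns (best_sentence, confidence), with confidence in (0, 1].
    """
    # The decoder's answer is simply the highest-scoring candidate.
    best = max(log_scores, key=log_scores.get)
    top = log_scores[best]
    # Normalise over all competing candidates; subtracting the top score
    # first keeps the exponentials numerically stable.
    total = sum(math.exp(score - top) for score in log_scores.values())
    confidence = 1.0 / total
    return best, confidence


# Hypothetical scores for three allowed sentences.
scores = {"call home": -120.0, "call Tom": -123.0, "fall home": -130.0}
best, conf = decode(scores)

# Shifting every score by the same amount changes nothing: only the
# differences between candidates matter, not their absolute values.
shifted = {sentence: score - 500.0 for sentence, score in scores.items()}
best_shifted, conf_shifted = decode(shifted)

print(best, round(conf, 3))
print(best_shifted, round(conf_shifted, 3))
```

Note that the confidence here only says how far ahead the winner is of the
other allowed sentences. It says nothing about how close the recording is to
some reference pronunciation.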

Depending on your use case, a dedicated classifier will probably yield better 
results. What exactly do you want to do?

Best regards,
Peter

