[Kde-accessibility] Simon Question
Peter Grasch
peter at grasch.net
Sun Mar 23 13:48:07 UTC 2014
Hello Jessica,
nice to meet you.
On Thursday, March 20, 2014 01:51:09 PM Jessica Horst wrote:
> My colleague told me that speech recognition software works by having a
> threshold of similarity. For example, when I tell my mobile phone "call
> home" the software compares what I said to what I have said before and if it
> is similar enough (above threshold) it will recognise my speech. I'm
> hopeful that I could use the same kind of principle here (how similar is
> the child's speech to the adult speech (what was said before)), but I would
> want a numerical value instead of just knowing if it was above or below
> threshold.
I am sorry to say that you have been slightly misinformed. In practice the
process works a bit differently.
(Disclaimer: the following explanation contains a few simplifications.)
The decoding produces the most likely path through the space of alternatives
(the allowed sentences, if you are doing grammar-based decoding). The question
answered by the decoding is: given the observations (the recording), which of
the possibilities (sentences) is the most likely?
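In the usual textbook notation (the symbols are my own shorthand here, not
taken from any particular decoder), with O standing for the observations and W
ranging over the allowed sentences, that question can be written as:

```latex
\hat{W} \;=\; \arg\max_{W} P(W \mid O) \;=\; \arg\max_{W} P(O \mid W)\, P(W)
```

The second form follows from Bayes' rule: P(O) is the same for every candidate
W, so dropping it does not change which candidate wins.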
To determine the most likely candidate, there is an internal scoring process,
but these scores are only meaningful relative to each other and are not
compared to a fixed threshold. Most decoders implement some form of confidence
scoring, telling you how confident the system is in its results, but these
scores will likely not be what you want: differences that appear substantial
to the human ear will not necessarily have a big impact on the confidence
score, and vice versa.
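As a rough illustration of why the raw scores are only relative, here is a toy
sketch in Python. Everything in it is an assumption made for illustration: the
sentences, the log scores, and the choice of confidence measure (the winner's
share of the total probability mass over the listed alternatives). Real
decoders compute their confidence values in various other ways.

```python
import math


def decode(log_scores):
    """Pick the best-scoring hypothesis and a toy confidence value.

    log_scores maps each candidate sentence to a (hypothetical) log score.
    The confidence is the winner's share of the probability mass over the
    listed candidates; this is one simple choice, not a standard API.
    """
    best = max(log_scores, key=log_scores.get)
    top = log_scores[best]
    # Subtract the top score before exponentiating to avoid underflow.
    total = sum(math.exp(score - top) for score in log_scores.values())
    confidence = 1.0 / total  # exp(top - top) / total
    return best, confidence


# Made-up log scores for three sentences allowed by a small grammar.
scores = {"call home": -1200.0, "call Rome": -1203.0, "call Joan": -1210.0}
best, conf = decode(scores)

# Shifting every score by the same amount (e.g. a longer or noisier
# recording) changes the absolute values but not the outcome.
shifted = {sentence: score - 500.0 for sentence, score in scores.items()}
best_shifted, conf_shifted = decode(shifted)

print(best, round(conf, 3))
print(best_shifted, round(conf_shifted, 3))
```

Both calls pick the same sentence with the same confidence, even though the
absolute scores differ by 500. That is why a fixed threshold on the raw score
would not tell you much, and why the confidence only says how clearly the
winner beat the other allowed sentences, not how close the audio was to some
reference pronunciation.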
Depending on your use case, a dedicated classifier will probably yield better
results. What exactly do you want to do?
Best regards,
Peter