[KDE-Sonnet] Should we detect dialects?

Kevin Scannell kscanne at gmail.com
Thu Jan 18 00:25:50 CET 2007


On 1/17/07, Jacob R Rideout <kde at jacobrideout.net> wrote:
> Of course, we are going to give the spell checker a full RFC 4646, if
> possible. My question is: what is the best way to do this in cases
> where it is needed (almost all). There are real performance concerns
> when doing this in a real-time setting. So, we must make tradeoffs,
> perhaps sacrificing accuracy for speed.

OK, I missed the thrust of your question then.  As far as performance
goes, a full Bayesian classifier is no slower than running a
spellchecker (as Paul said, basically one hash-table lookup per word).
It would eat up more memory, but you can save by keeping only the
words with the largest weights (i.e. keep "lorry" and "colour" but not
"this" and "that").
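A minimal sketch of that idea (the dialect names and weights below are illustrative, not drawn from any real corpus): sum per-dialect log weights with one hash-table lookup per word, keeping only discriminative words like "lorry"/"colour" in the tables.

```python
import math

# Hypothetical per-dialect log-probability weights. In practice these
# would be estimated from corpora; to save memory, only the most
# discriminative words are kept (common words like "this" are dropped,
# so unknown words simply contribute nothing).
WEIGHTS = {
    "en-GB": {"lorry": math.log(0.9), "colour": math.log(0.9),
              "truck": math.log(0.1), "color": math.log(0.1)},
    "en-US": {"lorry": math.log(0.1), "colour": math.log(0.1),
              "truck": math.log(0.9), "color": math.log(0.9)},
}

def classify(text):
    """Score each dialect with one hash-table lookup per word."""
    scores = {dialect: 0.0 for dialect in WEIGHTS}
    for word in text.lower().split():
        for dialect, table in WEIGHTS.items():
            scores[dialect] += table.get(word, 0.0)  # unknown words are neutral
    return max(scores, key=scores.get)
```

For example, `classify("the lorry was painted a new colour")` picks "en-GB", since both weighted words favour it.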

>
> As Kevin mentioned, we could turn this into some kind of Bayesian
> classifier. But again, I question the need to do this in a practical
> setting. Is the cost of doing this kind of calculation worth the benefit
> over naive methods like mine?
>
> We could have some sort of setting, so that users who fall into this
> category have the option of employing this Bayesian classifier.

I think this is the way to go: something like N-grams in general, and
then something more sophisticated for the subtle cases (and assuming
the user wants/needs it - I confess, though, that I have no real sense
of the big picture here).
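The coarse N-gram pass could look something like this (the training
profiles here are toy one-sentence examples, stand-ins for real
per-language training text): build character-trigram frequency
profiles and pick the profile with the most overlap with the sample.

```python
from collections import Counter

def trigrams(text):
    """Character trigram counts, with padding so word edges count too."""
    text = f"  {text.lower()}  "
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

# Hypothetical profiles; real ones would be built from large corpora.
PROFILES = {
    "en": trigrams("the quick brown fox jumps over the lazy dog"),
    "de": trigrams("der schnelle braune fuchs springt ueber den faulen hund"),
}

def guess_language(sample):
    """Pick the language whose profile shares the most trigram mass."""
    sample_grams = trigrams(sample)
    def overlap(profile):
        return sum(min(count, profile[g]) for g, count in sample_grams.items())
    return max(PROFILES, key=lambda lang: overlap(PROFILES[lang]))
```

This is fast enough to run per paragraph; the subtler Bayesian
word-weight pass would then only be needed to separate dialects the
trigram profiles cannot distinguish.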

Kevin
