[KDE-Sonnet] Should we detect dialects?

Paul Betts paul at paulbetts.org
Wed Jan 17 22:10:52 CET 2007


On Wednesday 17 January 2007 13:40, Jacob R Rideout wrote:
> Hello everyone, this is the first post to new kde-sonnet list!
>
> I just wrote a blog post explaining why Sonnet doesn't detect dialects
> and how it handles them.
>
> http://blog.jacobrideout.net/2007/01/queen-and-country.html
>
> So, am I right? Or is there a need to change the behavior? I've made
> a series of trade-offs based on a set of assumptions about end-user use
> cases. Are these assumptions and use cases correct? Should we keep the
> current behavior, but trigger additional heuristics in certain cases?

From the practical standpoint of writing a spell checker, I don't see
how you could avoid trying to detect the dialect; as an American, if I'm
typing a letter and it corrects all my 'color', 'realize', and
'program' spellings, it's basically giving me wrong information.

The good news is that, at least for spell checking in English, you've
got a bit of an advantage: if the text contains none of the words known
to differ between the two dialects, you can just treat them as the same.
This email, for example (minus the intentional examples), is probably
indistinguishable between en_US and en_GB, as are a lot of other texts.

I suspect the best way to go is to make a "hint words / dialect
indicator points" list. Go through the words, sum the points per
dialect, and go with whichever dialect scores highest. For example,
"lorry => +25, colour => +5", because 'lorry' doesn't even exist in
American English, whereas it's possible that some Yank just wanted to
be fancy and write 'colour'.
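A minimal sketch of that scoring idea, in Python. The table name HINT_WORDS, the function detect_dialect, and every weight other than lorry => +25 and colour => +5 are hypothetical and only illustrative. Returning None when no hint words appear (or the top scores tie) follows the earlier point that such text can be treated as either dialect.

```python
import re
from collections import Counter

# Hypothetical hint-word table: word -> (dialect, points).
# 'lorry' and 'colour' use the weights from the post; the rest are
# illustrative guesses, not tuned values.
HINT_WORDS = {
    "lorry": ("en_GB", 25),
    "colour": ("en_GB", 5),
    "realise": ("en_GB", 5),
    "programme": ("en_GB", 5),
    "color": ("en_US", 5),
    "realize": ("en_US", 5),
    "program": ("en_US", 5),
}


def detect_dialect(text, hints=HINT_WORDS):
    """Sum hint points per dialect and return the top scorer.

    Returns None when no hint words appear or the top two scores tie,
    i.e. the text is indistinguishable between the dialects.
    """
    scores = Counter()
    # Crude tokenizer: lowercase runs of ASCII letters only.
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in hints:
            dialect, points = hints[word]
            scores[dialect] += points
    if not scores:
        return None
    ranked = scores.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]
```

For instance, "I realize the lorry is a nice color" scores en_US 10 and en_GB 25, so the single strong hint 'lorry' outweighs two weak American spellings and the text is classified as en_GB.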

-- 
Paul Betts <paul at paulbetts.org>



More information about the kde-sonnet mailing list