[KDE-Sonnet] [Mountain Goat Programmer] New comment on Queen and Country.
Henrique Pinto
henrique.pinto at kdemail.net
Fri Jan 19 14:55:33 CET 2007
On Fri 19 Jan 2007 01:38, Jacob Rideout wrote:
> It now appears to me that Portuguese is a special case, and a more
> general solution isn't acceptable. Tcatng uses a model generated from a
> combined pt_PT and pt_BR corpus to detect Portuguese, then uses
> specialized models to differentiate the two.
>
> Take a look at the .corpus files at this site:
> http://tcatng.cvs.sourceforge.net/tcatng/tcatng/language-profiles/pt-br/
>
> Are those words characteristic of their respective dialects?
Yes, they are. However, there are some very small problems with
brazilian.corpus:
"António" should be "Antônio";
"Brasilia" should be "Brasília";
"adóque" should be "adoque";
"Boceta" and "Buceta" are slang for "vagina" and are considered really, really,
really impolite. I don't think it is a good idea to include these terms;
they're rarely used (especially in written form).
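
For reference, the two-stage approach described in the quoted message (detect Portuguese with a combined model, then tell the dialects apart with dialect-specific word lists) could be sketched roughly as below. This is only an illustration and not Tcatng's actual code: the names detect, LANGUAGE_MODELS and DIALECT_MARKERS are hypothetical, the trigram-overlap scoring is a stand-in for whatever Tcatng really does, and the toy sentences and marker words are made up for the example rather than taken from the .corpus files.

```python
from collections import Counter


def trigram_profile(text):
    """Count character trigrams in whitespace-normalized, lowercased text."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def overlap(profile, model):
    """Fraction of trigram occurrences in `profile` that also occur in `model`."""
    total = sum(profile.values())
    if total == 0:
        return 0.0
    return sum(n for gram, n in profile.items() if gram in model) / total


def detect(text, language_models, dialect_markers):
    """Stage 1: pick the best-matching language. Stage 2: refine to a dialect."""
    profile = trigram_profile(text)
    language = max(language_models,
                   key=lambda name: overlap(profile, language_models[name]))

    markers = dialect_markers.get(language)
    if not markers:
        return language  # no dialect-specific data for this language

    # Count how many dialect-characteristic words appear in the text.
    words = set(text.lower().split())
    ranked = sorted(((len(words & vocab), dialect)
                     for dialect, vocab in markers.items()), reverse=True)
    top_score, top_dialect = ranked[0]
    tied = len(ranked) > 1 and ranked[1][0] == top_score
    # Fall back to the plain language code when there is no clear winner.
    return language if top_score == 0 or tied else top_dialect


# Toy data, assumed for illustration only.
LANGUAGE_MODELS = {
    "pt": trigram_profile("o ônibus e o autocarro chegaram à estação de manhã"),
    "en": trigram_profile("the bus and the train arrived at the station this morning"),
}
DIALECT_MARKERS = {
    "pt": {
        "pt_BR": {"ônibus", "trem", "antônio"},
        "pt_PT": {"autocarro", "comboio", "antónio"},
    },
}

if __name__ == "__main__":
    for sample in ("Antônio pegou o ônibus",
                   "O António apanhou o autocarro",
                   "the train arrived this morning"):
        print(sample, "->", detect(sample, LANGUAGE_MODELS, DIALECT_MARKERS))
```

With word lists like these, the quality of the second stage depends entirely on whether the marker words really are characteristic of each dialect, which is why the corrections above matter.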
--
Henrique Pinto
henrique.pinto at kdemail.net