[KDE-Sonnet] [Mountain Goat Programmer] New comment on Queen and Country.

Henrique Pinto henrique.pinto at kdemail.net
Fri Jan 19 14:55:33 CET 2007


On Fri 19 Jan 2007 01:38, Jacob Rideout wrote:
> It now appears to me that Portuguese is a special case, and a more
> general solution isn't acceptable. Tcatng uses a model generated from
> a combined pt_PT and pt_BR corpus to detect Portuguese, then uses
> specialized models to differentiate the two.
>
> Take a look at the .corpus files at this site:
> http://tcatng.cvs.sourceforge.net/tcatng/tcatng/language-profiles/pt-br/
>
> Are those words characteristic of their respective dialects?

Yes, they are. However, there are a few minor problems with
brazilian.corpus:

"António" should be "Antônio";
"Brasilia" should be "Brasília";
"adóque" should be "adoque";
"Boceta" and "Buceta" are slang for "vagina" and are considered extremely
impolite. I don't think it is a good idea to include these terms, since
they're rarely used (especially in written form).
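
For reference, the two-stage approach Jacob describes (one combined model to
recognise Portuguese, then specialised models to tell pt_BR from pt_PT) could
be sketched roughly as below. This is only an illustration under assumptions:
the sample sentences, the model names and the use of cosine similarity over
character trigrams are all made up for the example and are not taken from
tcatng or its .corpus files.

```python
from collections import Counter


def trigrams(text):
    """Return a Counter of character trigrams of the lowercased text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def similarity(a, b):
    """Cosine similarity between two trigram Counters."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = (sum(v * v for v in a.values()) * sum(v * v for v in b.values())) ** 0.5
    return dot / norm if norm else 0.0


# Toy training text, purely illustrative (hypothetical, not from tcatng).
PT_BR_SAMPLE = "o ônibus chegou e Antônio pegou o trem para Brasília"
PT_PT_SAMPLE = "o autocarro chegou e António apanhou o comboio para Lisboa"
EN_SAMPLE = "the bus arrived and Anthony took the train to London"

# Stage 1 models: Portuguese is trained on both dialects combined.
LANGUAGE_MODELS = {
    "pt": trigrams(PT_BR_SAMPLE + " " + PT_PT_SAMPLE),
    "en": trigrams(EN_SAMPLE),
}

# Stage 2 models: one per dialect, consulted only after "pt" was detected.
DIALECT_MODELS = {
    "pt_BR": trigrams(PT_BR_SAMPLE),
    "pt_PT": trigrams(PT_PT_SAMPLE),
}


def best_match(text, models):
    """Return the name of the model most similar to the text."""
    profile = trigrams(text)
    return max(models, key=lambda name: similarity(profile, models[name]))


def detect(text):
    """Detect the language first, then refine Portuguese into a dialect."""
    language = best_match(text, LANGUAGE_MODELS)
    if language != "pt":
        return language
    return best_match(text, DIALECT_MODELS)


if __name__ == "__main__":
    for sentence in ("Antônio pegou o ônibus",
                     "António apanhou o autocarro",
                     "the train arrived"):
        print(sentence, "->", detect(sentence))
```

With models this small the result depends heavily on the sample text, which is
why spelling errors in the corpus (such as "António" in the Brazilian one)
matter: they pull the dialect models toward each other.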

-- 
	Henrique Pinto
	henrique.pinto at kdemail.net

