Textfile classification (encoding, languages etc.)

Malte Starostik malte at kde.org
Thu Sep 25 20:54:31 BST 2003


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thursday 25 September 2003 21:42, Zack Rusin wrote:
> On Thursday 25 September 2003 15:06, Malte Starostik wrote:
> > PS: any comments on making KSpell use libaspell or pspell instead of
> > an external process if available?
>
> Oh, yeah, I'll be rewriting it once I get some more time. Laurent
> wrote kospell which kind of does this but keeps the KSpell api and
> makes creating new backends rather a pain. I like Enchant, but I'm
> still not too keen on the Glib dependency. I like how instead of using
> the ispell process they simply wrote it as a library and are using it.
> We should do the same so that instead of using kprocess we use the
> libraries directly.
> So, we might meet on irc or start a discussion at some point and decide
> whether we want to write a completely new implementation - we have
> enough use cases and after spending too much time with kspell and
> other spell checkers I know what's needed so I'd vote for that. We can
> also use Enchant. The problem with that is that we would have to write
> our frontend to it anyway, which would pretty much end up with #1 but
> with Enchant as the only backend.
> But anyway, what algorithm are you using to detect the languages? Is it
> regexp based or is something more fun? You definitely got my full
> attention.

I didn't know Enchant, looks interesting, provided our frontend to the 
frontend would stay reasonably small.
I've based the implementation on the Lingua::Ident Perl module, which uses
tri- and bigrams. "Based on" means a bit more than a plain Perl-to-C++
translation and a bit less than a complete rewrite. It's damn small but
reliable.
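
As a rough illustration of the n-gram idea only (this is neither the actual
code nor Lingua::Ident's exact algorithm), scoring a text against
per-language bigram/trigram profiles might look like the C++ sketch below.
The names Profile, makeProfile, score and identify, as well as the
probability assigned to unseen n-grams, are made up for this example.

```cpp
#include <cmath>
#include <map>
#include <string>

// Hypothetical type: relative frequency of each byte n-gram in a sample.
using Profile = std::map<std::string, double>;

// Count all bigrams and trigrams in `text` and normalise the counts to
// relative frequencies. Works on bytes, so multi-byte encodings are not
// handled specially.
Profile makeProfile(const std::string &text)
{
    Profile profile;
    double total = 0.0;
    for (std::size_t n = 2; n <= 3; ++n) {
        for (std::size_t i = 0; i + n <= text.size(); ++i) {
            profile[text.substr(i, n)] += 1.0;
            total += 1.0;
        }
    }
    if (total > 0.0) {
        for (auto &entry : profile)
            entry.second /= total;
    }
    return profile;
}

// Sum of log frequencies of the text's n-grams under one language profile.
// N-grams missing from the profile get a small fixed probability
// (an arbitrary assumption for this sketch).
double score(const std::string &text, const Profile &lang)
{
    const double unseen = 1e-6;
    double sum = 0.0;
    for (std::size_t n = 2; n <= 3; ++n) {
        for (std::size_t i = 0; i + n <= text.size(); ++i) {
            Profile::const_iterator it = lang.find(text.substr(i, n));
            sum += std::log(it != lang.end() ? it->second : unseen);
        }
    }
    return sum;
}

// Return the name of the best-scoring profile, or "" if there are none.
std::string identify(const std::string &text,
                     const std::map<std::string, Profile> &langs)
{
    std::string best;
    double bestScore = 0.0;
    bool first = true;
    for (const auto &lang : langs) {
        const double s = score(text, lang.second);
        if (first || s > bestScore) {
            best = lang.first;
            bestScore = s;
            first = false;
        }
    }
    return best;
}
```

Usage would be along the lines of building one profile per language from
sample text with makeProfile() and then calling identify(text, profiles).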

- -Malte
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)

iD8DBQE/c0f6VDF3RdLzx4cRAgq4AJ923CAnhc2Yke13iUXdiEWXLrwtzwCghPRg
lXMjryIthxJ3CQikmznFEyI=
=g1BP
-----END PGP SIGNATURE-----




More information about the kde-core-devel mailing list