Natural language processing tech for the desktop!

Jordi Polo mumismo at gmail.com
Thu Oct 23 09:17:21 BST 2008


Making link-grammar usable for KDE would be a good idea, but it is certainly
not a 1-year project. Also, adding support for more languages is ... not really
interesting.

I was thinking about automatically extracting data, tagging it and adding it
to Nepomuk. But most of the text and web pages I have on my own computer are
pretty random...

Another idea may be to take data from the Konqueror history, the Akregator
history or other data sources and create suggestions, etc. But I am not sure
that the fact that an artist you have a lot of music files of is giving a
concert soon should really appear somewhere in your KDE desktop.

Also, here they are supposed to be interested in dialog management and
topics like that, but anything that resembles the Office paperclip assistant
scares people ...


On Thu, Oct 23, 2008 at 7:26 AM, Alexander Dymo <dymo at ukrpost.ua> wrote:

> > > I don't even think that spell and grammar checking should be separated
> > > very much, since a spell checker should ideally know about the sentence
> > > structure, too.
> >
> > Well, AFAIK (but checking is not my speciality), you can build a really
> > good and fast spell checker with simple statistical techniques and a
> > simple edit distance. For a grammar checker, you need a full
> > natural-language syntax parser, plus other techniques to find the errors
> > and suggest solutions. But it's true that a good grammar checker must also
> > be robust against spelling errors.
>
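
To make the "statistical techniques plus edit distance" idea above concrete, here is a minimal sketch in Python. It is only an illustration, not how any KDE or AbiWord spell checker actually works: the tiny CORPUS string, the suggest() helper and the max_distance threshold are all assumptions made up for this example. Word counts stand in for the statistics, and plain Levenshtein distance stands in for the edit distance.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row of the DP table at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def suggest(word, counts, max_distance=2):
    """Return the closest known word, preferring frequent words on ties.

    Hypothetical helper: returns the word itself if it is known, and None
    if nothing in the vocabulary is within max_distance edits.
    """
    if word in counts:
        return word
    scored = [(edit_distance(word, known), -freq, known)
              for known, freq in counts.items()]
    scored = [s for s in scored if s[0] <= max_distance]
    return min(scored)[2] if scored else None


# Toy corpus, assumed for illustration; a real checker would use a large one.
CORPUS = ("the spell checker should know the sentence structure "
          "the grammar checker needs a parser")
COUNTS = Counter(CORPUS.split())

print(suggest("chekcer", COUNTS))
print(suggest("teh", COUNTS))
```

Scanning the whole vocabulary for every word is the simplest possible design; a real checker would probably generate candidates or index the dictionary instead.
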
> That's why I like the idea of using Link parser
> http://www.abisource.com/projects/link-grammar/
> http://www.link.cs.cmu.edu/link/
>
> It's quite convenient to use. It has both word dictionaries and grammar
> rules, and when you try to build the graph (of links between words), it will
> figure out the morphology and syntax simultaneously.
>
> If the sentence is not correct, it will either leave words as not recognized
> morphologically (spelling error) or it will leave words outside the sentence
> link graph (syntax error). The algorithm to do that is IIRC O(n^3), which is
> great.
>
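
The error classification described above can be sketched roughly as follows. This does not call the real link-grammar library: the diagnose() function, the word list and the hand-written LINKS (pairs of word indices) are all assumptions for illustration, standing in for whatever the parser would actually return.

```python
def diagnose(words, dictionary, links):
    """Classify words the way the paragraph above describes.

    words:      the tokens of one sentence
    dictionary: set of known (lower-case) word forms
    links:      pairs of word indices that are connected in the link graph
    All three are hypothetical inputs, not link-grammar data structures.
    """
    linked = {index for link in links for index in link}
    report = []
    for index, word in enumerate(words):
        if word.lower() not in dictionary:
            # Not recognized morphologically: treat as a spelling error.
            report.append((word, "spelling"))
        elif index not in linked:
            # Known word left outside the link graph: treat as a syntax error.
            report.append((word, "syntax"))
    return report


WORDS = ["the", "cat", "sat", "on", "teh", "mat", "very"]
DICTIONARY = {"the", "cat", "sat", "on", "mat", "very"}
# Hand-written links for the example, not real parser output.
LINKS = [(0, 1), (1, 2), (2, 3), (3, 5)]

print(diagnose(WORDS, DICTIONARY, LINKS))
```

How to turn such a list into a helpful message for the user is exactly the open error-reporting question.
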
> I know AbiWord uses that now, but I don't know how they do error reporting
> for syntax errors (which is quite an interesting question in itself).
>
> The only problem is that only the English grammar is complete atm. There are
> Italian and German grammars, but they don't look mature yet. There's also
> quite a good Russian grammar, but it's unfortunately proprietary.
>
>


-- 
Jordi Polo Carres
NLP laboratory - NAIST
http://www.bahasara.org