calligra-devel Digest, Vol 19, Issue 67

Mon May 14 23:39:47 BST 2012

Hi,
I went through the paper and a few more related articles. As Boemann
already mentioned, many of its features are already being worked on in
link-grammar, which is under active research and development as part of the
OpenCog NLP subsystem.
link-grammar already uses a probabilistic approach:
http://www.cs.cmu.edu/afs/cs.cmu.edu/project/link/pub/www/papers/ps/gram3gram.pdf
However, the APP/EPP and distorted treebank methods described in the paper
promise better error detection, and many shortcomings of link-grammar, such
as its inability to handle quotations and long run-on sentences, are
resolved there.

So my view is that if we want a really good parser, we can use the
link-grammar parser as a base, and we can also get help from the
link-grammar people. There are a few other parsers available that could
help, such as Charniak's maximum-entropy-inspired statistical parser and the
Bikel parser.

The other option is to integrate the existing link-grammar parser into our
grammarchecker plugin. As link-grammar improves, users can update their
installed version of link-grammar accordingly. We can still help to improve
the parser.

I also went through a paper on LanguageTool, which is a rule-based
grammar checker:
http://www.danielnaber.de/languagetool/download/style_and_grammar_checker.pdf
Its advantage is that it is available for many languages. Still, I
think link-grammar is better.

I need to read the paper a few more times to understand it completely.
Waiting for suggestions and opinions.
:)

On Mon, May 14, 2012 at 9:14 AM, Elvis Stansvik <elvstone at gmail.com> wrote:

> On 14 May 2012 12:44, "C. Boemann" <cbo at boemann.dk> wrote:
>
> >
> > On Monday 14 May 2012 12:29:01 matus.uzak at gmail.com wrote:
> > > Hi,
> > >
> > > I don't think that a grammar checker based entirely on a Bayes
> > > classifier is logically sound.
> > >
> > > Simplified:
> > >
> > > In order to detect textual spam, the Bayes classifier is first trained
> > > on examples of spam (the training set).
> > > The classifier quality depends on the training set being
> > > representative enough, on the textual data representation (the input
> > > to the classifier), and on the parameters of the training algorithm.
> > > The trained classifier is then a set S of (mean value, variance) pairs
> > > in input space which represent known spam.
> > > If a previously unknown input falls into the variance range of any of
> > > the members of S, then it's labeled as spam.
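
As a rough illustration of the simplified picture above (and not of a full
naive Bayes classifier), the set S of (mean, variance) pairs can be sketched
in Python. The numeric feature vectors, the Prototype name, and the width
parameter are assumptions made for this sketch; the thread does not say how
text is turned into numbers or how wide the "variance range" is.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence


@dataclass
class Prototype:
    """One member of S: a per-feature mean and variance (hypothetical name)."""
    mean: List[float]
    variance: List[float]


def fit_prototype(samples: Sequence[Sequence[float]]) -> Prototype:
    """Estimate the per-feature mean and (population) variance of a spam cluster."""
    n = len(samples)
    dims = len(samples[0])
    mean = [sum(s[d] for s in samples) / n for d in range(dims)]
    variance = [
        sum((s[d] - mean[d]) ** 2 for s in samples) / n for d in range(dims)
    ]
    return Prototype(mean, variance)


def is_spam(x: Sequence[float], prototypes: Sequence[Prototype],
            width: float = 2.0) -> bool:
    """Label x as spam if it lies within `width` standard deviations of any
    prototype on every feature. `width` is an assumed tuning parameter."""
    for p in prototypes:
        if all(abs(xi - m) <= width * sqrt(v)
               for xi, m, v in zip(x, p.mean, p.variance)):
            return True
    return False
```

The answer depends entirely on how close the input is to the training
examples, which is the point being made: nothing in S encodes the grammar
itself.
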
> > >
> > > A grammar checker should have the language grammar represented
> > > exactly, usually by a formal grammar.  Again, a feasible
> > > representation of the textual data is required. Then you check if a
> > > sentence can be generated by the formal grammar.  The answer is in
> > > {yes, no}.
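
The yes/no membership check described above can be sketched with the CYK
algorithm over a grammar in Chomsky normal form. The toy grammar and lexicon
below are invented for illustration only; they are not taken from
link-grammar, Lightproof, or the paper, and a real grammar would be far
larger.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy grammar in Chomsky normal form.
# Binary rules map a pair of right-hand-side symbols to the possible parents.
BINARY_RULES: Dict[Tuple[str, str], Set[str]] = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}

# Lexical rules map a word to the categories that can produce it.
LEXICON: Dict[str, Set[str]] = {
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "sees": {"V"},
}


def is_grammatical(words: List[str], start: str = "S") -> bool:
    """Return True if `words` can be generated from `start` (CYK recognition)."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds every nonterminal that derives words[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][i] = set(LEXICON.get(word, set()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split point between left and right part
                for left in table[i][k]:
                    for right in table[k + 1][j]:
                        table[i][j] |= BINARY_RULES.get((left, right), set())
    return start in table[0][n - 1]
```

For example, is_grammatical("the dog sees the cat".split()) accepts the
sentence, while a scrambled word order is rejected. The check gives no
indication of where the error is or how severe it is, which is one reason a
purely exact approach is hard to use for warnings.
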
> > >
> > > Lightproof seems to be rule based. And rule based systems have strong
> > > maintainability drawbacks.
> > >
> > > A combination of a rule based system with Bayes sounds promising. That
> > > would enable something like context based grammar checking.
> > >
> > > br,
> > >
> > > -matus uzak
> > The trouble with rules is that it's hard to codify a language grammar in
> > a way that won't give false warnings. Also, as you said, it is hard to
> > maintain, not to mention that it requires manual work to define new
> > languages.
> >
> > Bayes may not be the best match, but something that is adaptive and can
> > learn by giving it a corpus sounds very promising to me.
> >
> > From Elvis Stansvik I got the following link, which I've already passed
> > on to garima:
> >
> > http://doras.dcu.ie/16776/1/jw_binder_2012-01-10.pdf
> >
> > Now this may be what link grammar does already. I just read back, and the
> > link grammar page on the Abisource site does mention tree-bank and
> > statistical, which is what the paper talks about too. There may be
> > differences, but I'm not sure it's worth it for us to do something on our
> > own.
> >
> > So I've maybe changed my mind again and would favour the link grammar.
> >
> > Right now I'm just waiting for garima to reply with some analysis. He was
> > going to read the paper.
>
> That's quite ambitious of him; it's not just a paper but a 200+ page
> dissertation :)
>
> >
> > Boemann
> > _______________________________________________
> > calligra-devel mailing list
> > calligra-devel at kde.org
> > https://mail.kde.org/mailman/listinfo/calligra-devel
>