calligra-devel Digest, Vol 19, Issue 67

Elvis Stansvik elvstone at gmail.com
Mon May 14 14:14:57 BST 2012


On 14 May 2012 12:44, "C. Boemann" <cbo at boemann.dk> wrote:
>
> On Monday 14 May 2012 12:29:01 matus.uzak at gmail.com wrote:
> > Hi,
> >
> > I don't think that a grammar checker based entirely on a Bayes
> > classifier is logically sound.
> >
> > Simplified:
> >
> > In order to detect textual spam, the Bayes classifier is first trained
> > on examples of spam (training set).
> > The classifier quality depends on the training set being
> > representative enough, the textual data representation (input to the
> > classifier)
> > and parameters of the training algorithm.  The trained classifier is then a
> > set S of (mean value, variance) pairs in input space which represent
> > known spam.
> > If a previously unknown input falls into the variance range of any of
> > the members of S, then it's labeled as spam.
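
A minimal sketch of the simplified picture above, for illustration only.
It assumes each message has already been reduced to a single number (the
"input space" is one-dimensional here), that each group of training
examples yields one (mean, variance) pair, and that "falls into the
variance range" means "lies within one standard deviation of the mean".
The names train, is_spam and width are made up for this sketch. A real
naive Bayes classifier would compare class probabilities instead of
testing a range.

```python
from math import sqrt
from statistics import mean, pvariance


def train(spam_groups):
    """Build the set S: one (mean, variance) pair per group of spam examples."""
    return [(mean(group), pvariance(group)) for group in spam_groups]


def is_spam(x, S, width=1.0):
    """Label x as spam if it lies within `width` standard deviations
    of the mean of any member of S (assumed reading of "variance range")."""
    return any(abs(x - m) <= width * sqrt(v) for m, v in S)


# Hypothetical feature values for two clusters of known spam.
groups = [[0.8, 0.9, 1.0], [0.4, 0.5, 0.6]]
S = train(groups)

print(is_spam(0.85, S))  # close to the first cluster
print(is_spam(0.10, S))  # far from both clusters
```

The quality caveats from the mail show up directly: if the groups are
not representative, or the feature is badly chosen, the ranges in S
simply miss new spam.
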
> >
> > A grammar checker should have the language grammar represented
> > exactly, by a formal grammar usually.  Again a feasible representation
> > of the textual data is required. Then you check if a sentence can be
> > generated by the formal grammar.  The answer is in {yes, no}.
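
The yes/no check described above can be sketched with CYK recognition
over a toy context-free grammar in Chomsky normal form. The grammar, the
vocabulary and the function name generates are invented for this
example; a real grammar for a natural language would be far larger, and
this is not how Lightproof or link grammar work internally.

```python
from itertools import product

# Toy grammar in Chomsky normal form (hypothetical, for illustration only).
BINARY = {  # rules of the form A -> B C, indexed by (B, C)
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
LEXICAL = {  # rules of the form A -> word, indexed by word
    "the": {"Det"},
    "a": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "sees": {"V"},
}


def generates(sentence, start="S"):
    """CYK recognition: True if the toy grammar can generate the sentence."""
    words = sentence.lower().split()
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that derive words[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][0] = set(LEXICAL.get(word, ()))
    for span in range(2, n + 1):          # length of the substring
        for i in range(n - span + 1):     # start of the substring
            for split in range(1, span):  # length of the left part
                left = table[i][split - 1]
                right = table[i + split][span - split - 1]
                for b, c in product(left, right):
                    table[i][span - 1] |= BINARY.get((b, c), set())
    return start in table[0][n - 1]


print(generates("the dog sees a cat"))
print(generates("dog the sees cat a"))
```

The answer is strictly yes or no, which is also the weakness mentioned
further down: any construction the grammar author forgot is reported as
an error.
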
> >
> > Lightproof seems to be rule based. And rule based systems have strong
> > maintainability drawbacks.
> >
> > A combination of a rule based system with Bayes sounds promising. That
> > would enable something like context based grammar checking.
> >
> > br,
> >
> > -matus uzak
> The trouble with rules is that it's hard to codify a language grammar in a
> way that won't give false warnings. Also, as you said, it is hard to
> maintain, not to mention that it requires manual work to define new
> languages.
>
> Bayes may not be the best match, but something that is adaptive and can
> learn from a corpus sounds very promising to me.
>
> From Elvis Stansvik I got the following link which I've passed on to garima
> already:
>
> http://doras.dcu.ie/16776/1/jw_binder_2012-01-10.pdf
>
> Now this may be what link grammar does already. I just read back and the
> link grammar page on the AbiSource site does mention tree-bank and
> statistical approaches, which is what the paper talks about too. There may
> be differences, but I'm not sure it's worth it to make us do something on
> our own.
>
> So I've maybe changed my mind again and would favour the link grammar.
>
> Right now I'm just waiting for garima to reply with some analysis. He was
> going to read the paper.

That's quite ambitious of him; it's not just a paper but a 200+ page
dissertation :)

>
> Boemann
> _______________________________________________
> calligra-devel mailing list
> calligra-devel at kde.org
> https://mail.kde.org/mailman/listinfo/calligra-devel