calligra-devel Digest, Vol 19, Issue 67

C. Boemann cbo at boemann.dk
Mon May 14 11:44:14 BST 2012


On Monday 14 May 2012 12:29:01 matus.uzak at gmail.com wrote:
> Hi,
> 
> I don't think that a grammar checker based entirely on a Bayes
> classifier is logically sound.
> 
> Simplified:
> 
> In order to detect textual spam, the Bayes classifier is first trained
> on examples of spam (training set).
> The classifier quality depends on the training set being
> representative enough, the textual data representation (input to the
> classifier),
> and the parameters of the training algorithm.  The trained classifier
> is then a set S of (mean value, variance) pairs in input space which
> represent known spam.
> If a previously unknown input falls into the variance range of any of
> the members of S, then it's labeled as spam.
> 
> A grammar checker should have the language grammar represented
> exactly, usually by a formal grammar.  Again a feasible representation
> of the textual data is required. Then you check if a sentence can be
> generated by the formal grammar.  The answer is in {yes, no}.
> 
> Lightproof seems to be rule-based. And rule-based systems have strong
> maintainability drawbacks.
> 
> A combination of a rule-based system with Bayes sounds promising. That
> would enable something like context-based grammar checking.
> 
> br,
> 
> -matus uzak
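
To make the simplified description quoted above concrete, here is a rough
sketch of it in Python. It follows the quoted wording only: it is not a full
Bayes classifier, and it is not how Lightproof or link grammar work. The
single numeric feature, the grouping of training examples and the tolerance
`k` are all assumptions made for illustration.

```python
import math

def train(groups):
    """Build S: one (mean, variance) pair per group of known-spam feature values.

    `groups` is a hypothetical input: a list of lists of numbers, where each
    inner list holds one numeric feature measured on similar spam examples.
    """
    S = []
    for values in groups:
        mean = sum(values) / len(values)
        variance = sum((x - mean) ** 2 for x in values) / len(values)
        S.append((mean, variance))
    return S

def is_spam(S, x, k=1.0):
    """Label x as spam if it lies within k standard deviations of any member of S."""
    return any(abs(x - mean) <= k * math.sqrt(variance) for mean, variance in S)

# Two made-up groups of spam feature values.
S = train([[1.0, 3.0], [10.0, 14.0]])
print(is_spam(S, 2.5))   # close to the first group
print(is_spam(S, 6.0))   # between the groups
```

The point of the sketch is that the answer depends entirely on how
representative the training groups are, which is the concern raised in the
quoted message.
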
The trouble with rules is that it's hard to codify a language grammar in a way 
that won't give false warnings. Also, as you said, rules are hard to maintain, 
not to mention that defining a new language requires manual work.

Bayes may not be the best match, but something that is adaptive and can learn 
from a corpus sounds very promising to me.
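
As a very rough idea of what "learning from a corpus" could look like, here
is a minimal sketch that counts adjacent word pairs (bigrams) in example
sentences and flags pairs it has never seen. This is only an illustration
under my own assumptions: the whitespace tokenizer, the `<s>`/`</s>` markers,
the `min_count` threshold and the tiny corpus are all made up, and it is not
the method from the paper below or what link grammar does.

```python
from collections import Counter

def tokenize(text):
    # Naive assumption: lowercase and split on whitespace.
    return text.lower().split()

def train_bigrams(corpus_sentences):
    """Count adjacent word pairs, with sentence start/end markers."""
    counts = Counter()
    for sentence in corpus_sentences:
        words = ["<s>"] + tokenize(sentence) + ["</s>"]
        counts.update(zip(words, words[1:]))
    return counts

def suspicious_pairs(counts, sentence, min_count=1):
    """Return word pairs seen fewer than min_count times in the corpus."""
    words = ["<s>"] + tokenize(sentence) + ["</s>"]
    return [pair for pair in zip(words, words[1:]) if counts[pair] < min_count]

corpus = ["the cat sleeps", "the dog sleeps", "a cat runs"]
counts = train_bigrams(corpus)
print(suspicious_pairs(counts, "the cat runs"))
print(suspicious_pairs(counts, "cat the sleeps"))
```

Adding a language here means supplying a corpus rather than writing rules,
but a pair missing from the corpus is not necessarily wrong, so false
warnings remain a problem with a model this simple.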

From Elvis Stansvik I got the following link which I've passed on to garima 
already:

http://doras.dcu.ie/16776/1/jw_binder_2012-01-10.pdf

Now this may be what link grammar does already. I just read back, and the link 
grammar page on the Abisource site does mention tree-bank and statistical 
approaches, which is what the paper talks about too. There may be differences, 
but I'm not sure they are worth doing something on our own.

So I may have changed my mind again and would now favour link grammar.

Right now I'm just waiting for garima to reply with some analysis. He was 
going to read the paper.

Boemann


