calligra-devel Digest, Vol 19, Issue 67

matus.uzak at gmail.com
Mon May 14 11:29:01 BST 2012


Hi,

I don't think that a grammar checker based entirely on a Bayes
classifier is logically sound.

Simplified:

In order to detect textual spam, the Bayes classifier is first trained
on examples of spam (the training set).
The quality of the classifier depends on how representative the training
set is, on the textual data representation (the input to the classifier),
and on the parameters of the training algorithm. The trained classifier is
then a set S of (mean value, variance) pairs in input space which represent
known spam.
If a previously unknown input falls within the variance range of any of
the members of S, it is labeled as spam.
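
To make the simplified description above concrete, here is a minimal sketch of a Gaussian naive Bayes classifier: each class is summarized by a prior and per-feature (mean, variance) pairs learned from the training set, and a new input gets the label under which it is most probable. The two numeric features and all function names are hypothetical. Unlike the spam-only picture above, the sketch also keeps statistics for non-spam ("ham"), which is how such filters are usually trained, and many real spam filters use word-frequency models rather than Gaussians.

```python
import math
from statistics import mean, pvariance

def train(samples_by_class, eps=1e-9):
    """Summarize each class by a prior and per-feature (mean, variance) pairs."""
    total = sum(len(rows) for rows in samples_by_class.values())
    model = {}
    for label, rows in samples_by_class.items():
        columns = zip(*rows)  # one tuple of values per feature
        # eps keeps a feature with identical training values from having variance 0
        stats = [(mean(col), pvariance(col) + eps) for col in columns]
        model[label] = (len(rows) / total, stats)
    return model

def log_gaussian(x, mu, var):
    """Log of the normal density N(x; mu, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def classify(model, x):
    """Pick the label with the highest log posterior, treating features as independent."""
    def score(label):
        prior, stats = model[label]
        return math.log(prior) + sum(
            log_gaussian(xi, mu, var) for xi, (mu, var) in zip(x, stats))
    return max(model, key=score)

# Hypothetical features: (count of words like "free"/"win", share of capital letters)
training = {
    "spam": [(5, 0.40), (7, 0.55), (6, 0.35)],
    "ham": [(0, 0.05), (1, 0.10), (0, 0.08)],
}
model = train(training)
print(classify(model, (6, 0.45)))  # spam
print(classify(model, (0, 0.07)))  # ham
```

Note that the quality of the result depends entirely on how well the chosen features represent the text, which is the point made above about the data representation.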

A grammar checker should have the language grammar represented
exactly, usually by a formal grammar. Again, a feasible representation
of the textual data is required. Then you check whether a sentence can be
generated by the formal grammar. The answer is in {yes, no}.
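
The yes/no question above is the membership problem for a context-free grammar. A minimal sketch of the classic CYK algorithm, assuming a toy grammar in Chomsky normal form (the rules and the tiny lexicon are made up and cover only a handful of sentences, nowhere near real English):

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form.
# Binary rules: (B, C) -> set of A such that A -> B C
BINARY = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
# Lexical rules: word -> set of A such that A -> word
LEXICON = {
    "the": {"Det"}, "a": {"Det"},
    "dog": {"N"}, "cat": {"N"},
    "sees": {"V"}, "chases": {"V"},
}

def cyk_accepts(words, start="S"):
    """Return True iff the grammar can generate the word sequence."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that derive words[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][0] = set(LEXICON.get(w, ()))
    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):     # span start
            for split in range(1, length):  # length of the left part
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for b, c in product(left, right):
                    table[i][length - 1] |= BINARY.get((b, c), set())
    return start in table[0][n - 1]

print(cyk_accepts("the dog chases a cat".split()))  # True
print(cyk_accepts("dog the chases cat a".split()))  # False
```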

Lightproof seems to be rule-based, and rule-based systems have serious
maintainability drawbacks.
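
For comparison, a minimal sketch of what a rule-based checker boils down to: a list of regular expressions, each paired with a message. The rules are hypothetical and do not use Lightproof's actual rule syntax; they mainly show why maintenance gets hard, since every exception needs another hand-written rule.

```python
import re

# Hypothetical rules: each one is a regular expression plus a message,
# and each covers one narrowly defined mistake.
RULES = [
    # Crude on purpose: it also flags "a university" and misses "a hour";
    # fixing either needs yet another rule (the maintainability problem).
    (re.compile(r"\ba\s+[aeiou]\w*", re.IGNORECASE), 'use "an" before a vowel sound'),
    (re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE), "repeated word"),
    (re.compile(r"\bcould of\b", re.IGNORECASE), 'did you mean "could have"?'),
]

def check(text):
    """Return (start, end, message) for every rule match, sorted by position."""
    return sorted((m.start(), m.end(), message)
                  for pattern, message in RULES
                  for m in pattern.finditer(text))

print(check("I could of seen a apple"))
```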

A combination of a rule-based system with Bayes sounds promising. That
would enable something like context-based grammar checking.

br,

-matus uzak

On Sat, May 12, 2012 at 9:33 PM, Garima Joshi <gjoshi0311 at gmail.com> wrote:
> I could not understand the Bayesian spam filter idea. Can you explain it?
>
> On Sat, May 12, 2012 at 6:48 AM, <calligra-devel-request at kde.org> wrote:
>>
>>
>>
>> Today's Topics:
>>
>>   1. Re: Review Request: Make Paragraph Format dialog use
>>      KoStyleThumbnailer (Commit Hook)
>>   2. Re: Review Request: Load and save contour-polygon, and
>>      wrapping around it in Words (Thorsten Zachmann)
>>   3. Re: Grammar checker in CalligraWords. (C. Boemann)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Sat, 12 May 2012 02:14:17 -0000
>> From: "Commit Hook" <null at kde.org>
>> To: "Calligra" <calligra-devel at kde.org>, "Gopalakrishna Bhat"
>>        <gopalakbhat at gmail.com>, "Commit Hook" <null at kde.org>
>> Subject: Re: Review Request: Make Paragraph Format dialog use
>>        KoStyleThumbnailer
>> Message-ID: <20120512021417.9496.55517 at vidsolbach.de>
>> Content-Type: text/plain; charset="utf-8"
>>
>>
>> -----------------------------------------------------------
>> This is an automatically generated e-mail. To reply, visit:
>> http://git.reviewboard.kde.org/r/104919/#review13736
>> -----------------------------------------------------------
>>
>>
>> This review has been submitted with commit
>> 48f0de85e831aaa22f7475ba54a84a76d4201fbd by Gopalakrishna Bhat A to branch
>> master.
>>
>> - Commit Hook
>>
>>
>> On May 11, 2012, 6:57 p.m., Gopalakrishna Bhat wrote:
>> >
>> > -----------------------------------------------------------
>> > This is an automatically generated e-mail. To reply, visit:
>> > http://git.reviewboard.kde.org/r/104919/
>> > -----------------------------------------------------------
>> >
>> > (Updated May 11, 2012, 6:57 p.m.)
>> >
>> >
>> > Review request for Calligra.
>> >
>> >
>> > Description
>> > -------
>> >
>> > This change enables us to use the actual textlayout code to render
>> > previews. This approach also enables us to remove a lot of redundant code
>> > that was there to substitute the layout process.
>> >
>> >
>> > Diffs
>> > -----
>> >
>> >   libs/textlayout/KoStyleThumbnailer.h 42c3c6d
>> >   libs/textlayout/KoStyleThumbnailer.cpp acb836d
>> >   plugins/textshape/dialogs/CharacterGeneral.h bb72b88
>> >   plugins/textshape/dialogs/CharacterGeneral.cpp 4923087
>> >   plugins/textshape/dialogs/FormattingPreview.h 5a67d48
>> >   plugins/textshape/dialogs/FormattingPreview.cpp 6103699
>> >   plugins/textshape/dialogs/ParagraphBulletsNumbers.h 03c5cc9
>> >   plugins/textshape/dialogs/ParagraphBulletsNumbers.cpp 2d365f8
>> >   plugins/textshape/dialogs/ParagraphDecorations.h 0eea7cc
>> >   plugins/textshape/dialogs/ParagraphDecorations.cpp b5c4d94
>> >   plugins/textshape/dialogs/ParagraphDropCaps.h 7c7b071
>> >   plugins/textshape/dialogs/ParagraphDropCaps.cpp 6696254
>> >   plugins/textshape/dialogs/ParagraphGeneral.h 5ef5c86
>> >   plugins/textshape/dialogs/ParagraphGeneral.cpp 8ef5b7f
>> >   plugins/textshape/dialogs/ParagraphIndentSpacing.h 27533e0
>> >   plugins/textshape/dialogs/ParagraphIndentSpacing.cpp 5227d55
>> >   plugins/textshape/dialogs/ParagraphLayout.h be272ee
>> >   plugins/textshape/dialogs/ParagraphLayout.cpp 7b46f00
>> >
>> > Diff: http://git.reviewboard.kde.org/r/104919/diff/
>> >
>> >
>> > Testing
>> > -------
>> >
>> > Tested manually that changing values in the dialog reflects in the
>> > preview.
>> >
>> >
>> > Thanks,
>> >
>> > Gopalakrishna Bhat
>> >
>> >
>>
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Sat, 12 May 2012 04:35:01 -0000
>> From: "Thorsten Zachmann" <t.zachmann at zagge.de>
>> To: "C. Boemann" <cbr at boemann.dk>, "Thorsten Zachmann"
>>        <t.zachmann at zagge.de>,  "Calligra" <calligra-devel at kde.org>, "Matus
>>        Uzak" <matus.uzak at ixonos.com>
>> Subject: Re: Review Request: Load and save contour-polygon, and
>>        wrapping around it in Words
>> Message-ID: <20120512043501.16970.8689 at vidsolbach.de>
>> Content-Type: text/plain; charset="utf-8"
>>
>>
>> -----------------------------------------------------------
>> This is an automatically generated e-mail. To reply, visit:
>> http://git.reviewboard.kde.org/r/104873/#review13737
>> -----------------------------------------------------------
>>
>>
>> The ODF saved for draw:contour is invalid: draw:points contains double
>> values, but only integers are allowed. I guess you can multiply the values
>> by 1000 and also store a correspondingly bigger viewBox to reflect that
>> change. Another idea would be to drop saving of contour-polygon completely
>> and only support saving of contour-path.
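
A minimal sketch of the first suggestion, just to show the arithmetic: multiply every coordinate by 1000, round to integers for draw:points (whitespace-separated "x,y" pairs), and scale svg:viewBox by the same factor so the polygon still maps onto the same svg:width/svg:height. The function name is hypothetical, this is not Calligra's code, and it assumes the points are already in view-box coordinates.

```python
def contour_polygon_attributes(points, view_box, scale=1000):
    """Return (draw:points, svg:viewBox) strings with integer coordinates.

    points   -- list of (x, y) floats in view-box coordinates
    view_box -- (min_x, min_y, width, height)
    scale    -- factor applied to both, so the relative geometry is unchanged
    """
    ints = [(round(x * scale), round(y * scale)) for x, y in points]
    draw_points = " ".join(f"{x},{y}" for x, y in ints)
    # The view box is rounded too; with scale=1000 the error is negligible.
    vb = " ".join(str(round(v * scale)) for v in view_box)
    return draw_points, vb

pts, vb = contour_polygon_attributes(
    [(0.0, 0.0), (10.5, 0.25), (3.1416, 7.0)], (0.0, 0.0, 10.5, 7.0))
print(pts)  # 0,0 10500,250 3142,7000
print(vb)   # 0 0 10500 7000
```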
>>
>>
>> libs/flake/KoShape.cpp
>> <http://git.reviewboard.kde.org/r/104873/#comment10910>
>>
>>    This line can be removed.
>>
>>
>>
>> libs/flake/KoShape.cpp
>> <http://git.reviewboard.kde.org/r/104873/#comment10911>
>>
>>    This will result in invalid ODF if there is more than one
>> clipPathShape. This can happen, e.g., when such a shape is created in
>> Karbon and saved to ODG.
>>
>>
>> - Thorsten Zachmann
>>
>>
>> On May 6, 2012, 5:35 p.m., C. Boemann wrote:
>> >
>> > -----------------------------------------------------------
>> > This is an automatically generated e-mail. To reply, visit:
>> > http://git.reviewboard.kde.org/r/104873/
>> > -----------------------------------------------------------
>> >
>> > (Updated May 6, 2012, 5:35 p.m.)
>> >
>> >
>> > Review request for Calligra.
>> >
>> >
>> > Description
>> > -------
>> >
>> > Load and save the contour-polygon and contour-path elements.
>> > They are stored as KoClipPath in the engine. Previously the engine only
>> > had support for loading and saving KoClipPath in SVG.
>> >
>> > Load and save the frame style attributes that turn contoured text
>> > run-around (wrapping) on and off: wrap-contour and wrap-contour-mode.
>> >
>> > Use this new technology to do tight run-around (a simple change that only
>> > requests what we already had).
>> >
>> > This commit also adds a method to pictureshape to create a tight-fitting
>> > outline path of the image.
>> > It's tested and works, but is not yet used to create a KoClipPath out of
>> > it.
>> >
>> >
>> > Diffs
>> > -----
>> >
>> >   libs/flake/KoPathShape.h d6e7cf1
>> >   libs/flake/KoPathShape.cpp 1196a0b
>> >   libs/flake/KoShape.h 3b2ee21
>> >   libs/flake/KoShape.cpp 40c2be2
>> >   libs/flake/KoShape_p.h a8ef367
>> >   libs/textlayout/KoTextLayoutObstruction.cpp 9711578
>> >   plugins/pictureshape/PictureShape.h 54cc813
>> >   plugins/pictureshape/PictureShape.cpp e7fcd39
>> >
>> > Diff: http://git.reviewboard.kde.org/r/104873/diff/
>> >
>> >
>> > Testing
>> > -------
>> >
>> > I've created such contour clipping in LO, loaded it and seen it work in
>> > Words, and saved it back to LO where it works (though we need to work
>> > around a bug in LO; the workaround is there but not enabled in this diff).
>> >
>> > For the unused method in the picture shape, I had hardwired it during
>> > development. I will put up a new review when I actually use the method, so
>> > you may ignore it for now.
>> >
>> >
>> > Thanks,
>> >
>> > C. Boemann
>> >
>> >
>>
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Sat, 12 May 2012 12:47:57 +0200
>> From: "C. Boemann" <cbo at boemann.dk>
>> To: Calligra Suite developers and users mailing list
>>        <calligra-devel at kde.org>
>> Subject: Re: Grammar checker in CalligraWords.
>> Message-ID: <201205121247.57248.cbo at boemann.dk>
>> Content-Type: Text/Plain;  charset="us-ascii"
>>
>> On Friday 11 May 2012 19:46:03 Garima Joshi wrote:
>> > Hi,
>> > Here are some ideas based on my research regarding available libraries
>> > for grammar checking in Calligra Words. We need to implement a grammar
>> > checking plugin named grammarcheck inside calligra/plugins/textediting.
>> > This plugin will be based on the spell check plugin and will have some
>> > code in common with it, for example the text highlighting code.
>> >
>> > One option is for the grammarcheck plugin to use the link-grammar library
>> > already used by AbiWord, which provides an API to parse sentences,
>> > tokenize them, and provide linkages as a result. This library has been
>> > customized by AbiWord to serve the purpose of grammar checking in
>> > documents. This is a link to the project:
>> > http://www.abisource.com/projects/link-grammar/
>> > Here is some documentation for the API:
>> > http://www.abisource.com/projects/link-grammar/api/index.html
>> >
>> > This documentation, along with the AbiWord source code itself (the part
>> > that integrates the link-grammar parser and checker), can serve as a good
>> > example of how to integrate the library in our grammarcheck plugin.
>> >
>> > http://svn.abisource.com/abiword/trunk/plugins/grammar/linkgrammarwrap/
>> > http://svn.abisource.com/abiword/trunk/plugins/grammar/xp/
>> >
>> > Another option is to use LanguageTool to implement grammarcheck.
>> > http://www.languagetool.org/ <http://www.languagetool.org/usage/>
>> > It is already used as a plugin for OpenOffice.org and LibreOffice.
>> > We will write a wrapper in order to use LanguageTool.
>> > http://www.languagetool.org/development/api/
>> > http://wiki.services.openoffice.org/wiki/Grammar_Checking
>> >
>> > http://cgit.freedesktop.org/libreoffice/core/tree/languagetool
>> >
>> > The implementation details of this proposal will become clearer as I
>> > investigate the source code (mentioned in the links above) further.
>> >
>> > The plugin will work on the basis of a pre-supplied dictionary.
>> > On the usability side, the user can turn the plugin on and off at will,
>> > and highlighted grammar mistakes can be ignored (once, or always). There
>> > may also be an option to auto-detect the language context, which would
>> > determine whether the text currently being written is English and only
>> > then turn on grammar checking.
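
One way to model the "ignore once / ignore always" behaviour, as a hypothetical sketch (the class, method, and rule names are invented): "ignore once" suppresses one specific occurrence, while "ignore always" suppresses the same rule and text everywhere.

```python
class IgnoreList:
    """Hypothetical bookkeeping for 'ignore once' and 'ignore always'."""

    def __init__(self):
        self._once = set()    # (rule_id, text, position): one specific occurrence
        self._always = set()  # (rule_id, text): every occurrence, anywhere

    def ignore_once(self, rule_id, text, position):
        self._once.add((rule_id, text, position))

    def ignore_always(self, rule_id, text):
        self._always.add((rule_id, text))

    def is_ignored(self, rule_id, text, position):
        return ((rule_id, text) in self._always
                or (rule_id, text, position) in self._once)

# Simplification: a real editor would have to update stored positions
# whenever text is inserted or deleted before them.
ignores = IgnoreList()
ignores.ignore_once("repeated-word", "the the", 42)
ignores.ignore_always("a-vs-an", "a apple")
print(ignores.is_ignored("repeated-word", "the the", 42))  # True
print(ignores.is_ignored("repeated-word", "the the", 99))  # False
print(ignores.is_ignored("a-vs-an", "a apple", 7))         # True
```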
>> >
>> > I need suggestions on which library to use for grammar check support in
>> > Calligra Words. Above are the two possible options which I prefer. Any
>> > other libraries or suggestions are welcome.
>> Hi Garima
>>
>> This serves as a very good introduction. However, we need to learn more
>> about each of those alternatives before we can make a decision, and not
>> just from a code point of view but also in terms of how good a job each does.
>>
>> Also, you seem to have missed that LibreOffice 3.5 introduced a new tool
>> based on Lightproof:
>> http://libreoffice.hu/2011/12/08/grammar-checking-in-libreoffice/
>>
>> Another avenue worth investigating is an idea I just got. Why not do
>> something like Bayesian spam filters do: learn from known good grammar, and
>> let the user teach the checker more. With Get Hot New Stuff, you could then
>> download other languages as users make them. This would be a totally new
>> way of doing it. I'm willing to bet on this. If it doesn't work out, fine;
>> if it does, then great. Just let us make sure that the textediting grammar
>> plugin is not too tied to whatever backend we use. Then we can always
>> change our mind later.
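
A minimal sketch of the two points above, with all names hypothetical: an abstract backend interface the textediting plugin could depend on, so backends can be swapped, plus a toy backend that "learns from known good grammar" by remembering which adjacent word pairs occur in accepted text, and that the user can teach more. This is a simple bigram-frequency model, not a full Bayesian classifier; a usable backend would need far more text and a smarter model.

```python
from abc import ABC, abstractmethod
from collections import Counter

class GrammarBackend(ABC):
    """Hypothetical interface between the grammar plugin and any backend
    (link-grammar, LanguageTool, Lightproof, a learned model, ...)."""

    @abstractmethod
    def check(self, sentence):
        """Return a list of human-readable problem descriptions."""

    def learn(self, sentence):
        """Accept user-approved text; backends that cannot learn ignore it."""


class BigramBackend(GrammarBackend):
    """Toy learned backend: flags adjacent word pairs never seen in good text."""

    def __init__(self, good_sentences):
        self.bigrams = Counter()
        for sentence in good_sentences:
            self.learn(sentence)

    @staticmethod
    def _pairs(sentence):
        # <s> and </s> mark sentence boundaries so first/last words are checked too
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        return list(zip(words, words[1:]))

    def learn(self, sentence):
        self.bigrams.update(self._pairs(sentence))

    def check(self, sentence):
        return [f"unusual word pair: {a} {b}"
                for a, b in self._pairs(sentence) if self.bigrams[(a, b)] == 0]


backend = BigramBackend(["the dog chases the cat", "a cat sees a dog"])
print(backend.check("the cat chases the dog"))  # ['unusual word pair: cat chases']
backend.learn("the cat chases the dog")         # the user accepts the sentence
print(backend.check("the cat chases the dog"))  # []
```

The first result is a false positive caused by the tiny training set, which is exactly where letting the user teach the checker comes in.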
>>
>> And doing some frontier work will be really fun.
>>
>> Boemann
>>
>>
>> ------------------------------
>>
>> _______________________________________________
>> calligra-devel mailing list
>> calligra-devel at kde.org
>> https://mail.kde.org/mailman/listinfo/calligra-devel
>>
>>


