[Bug 148211] New: Anti-Spam Wizard creates problematic filters for bogofilter

Ingomar Wesp ingomar at wesp.name
Wed Jul 25 21:53:08 BST 2007


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
         
http://bugs.kde.org/show_bug.cgi?id=148211         
           Summary: Anti-Spam Wizard creates problematic filters for
                    bogofilter
           Product: kmail
           Version: unspecified
          Platform: Ubuntu Packages
        OS/Version: Linux
            Status: UNCONFIRMED
          Severity: normal
          Priority: NOR
         Component: general
        AssignedTo: kdepim-bugs kde org
        ReportedBy: ingomar wesp name


Version:            (using KDE 3.5.7)
Installed from:    Ubuntu Packages
OS:                Linux

The filter setup created by KMail's "Anti-Spam Wizard" for bogofilter might 
lead to severe problems with bogofilter's wordlist when the user applies 
either "Classify as SPAM" or "Classify as NOT SPAM" on messages that have not 
been (automatically) registered before.

Bogofilter's auto-register option "-u" only registers messages that it can 
automatically classify as spam or ham; messages it is unsure about are never 
registered. The manual filters, however, always unregister first: 
"bogofilter -N -s" (manually register as spam) unregisters the message as 
ham, and "bogofilter -S -n" (manually register as ham) unregisters it as 
spam. This decrements the ham or spam count of every token contained in the 
processed message, as well as the corresponding count in the special token 
".MSG_COUNT".

If used on messages that have not been registered before, this may lead to a 
condition where the spam (or ham) count of a token exceeds the spam (or ham) 
message count, which in turn produces odd results in the individual spam or 
ham probabilities for the affected tokens. In extreme cases (the spam value 
of ".MSG_COUNT" is 0), bogofilter will produce a spam probability of "nan" 
because of a floating-point division by zero.
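
To illustrate the counting problem, here is a rough Python sketch of a toy 
wordlist. It is not bogofilter's actual code or data format: the dictionary 
layout, the helper names, the clamping of counts at zero and the simple 
count/message-count ratio are all assumptions made for the example. It only 
shows how unregistering a message that was never registered can leave a 
token's spam count above the spam message count and end in "inf" or "nan".

```python
import math

MSG_COUNT = ".MSG_COUNT"  # special token holding the per-class message counts


def new_wordlist():
    return {MSG_COUNT: {"spam": 0, "ham": 0}}


def register(wordlist, tokens, kind):
    """Toy equivalent of "-s" (kind="spam") or "-n" (kind="ham")."""
    for tok in set(tokens) | {MSG_COUNT}:
        entry = wordlist.setdefault(tok, {"spam": 0, "ham": 0})
        entry[kind] += 1


def unregister(wordlist, tokens, kind):
    """Toy equivalent of "-S" (kind="spam") or "-N" (kind="ham")."""
    for tok in set(tokens) | {MSG_COUNT}:
        entry = wordlist.setdefault(tok, {"spam": 0, "ham": 0})
        # Assumption for this sketch: counts are clamped at zero.
        entry[kind] = max(0, entry[kind] - 1)


def ieee_div(a, b):
    """Divide the way C floating point does: x/0 is inf, 0/0 is nan."""
    if b == 0:
        return math.nan if a == 0 else math.copysign(math.inf, a)
    return a / b


def spam_ratio(wordlist, token):
    """Fraction of spam messages containing the token (simplified)."""
    return ieee_div(wordlist[token]["spam"], wordlist[MSG_COUNT]["spam"])


wl = new_wordlist()
register(wl, ["cheap", "pills"], "spam")  # one properly registered spam

# A message that was never registered is now "classified as NOT SPAM"
# with the equivalent of "-S -n": unregister as spam, register as ham.
unregister(wl, ["meeting", "agenda"], "spam")
register(wl, ["meeting", "agenda"], "ham")

print(wl[MSG_COUNT])              # spam message count has dropped to 0
print(wl["cheap"])                # but "cheap" still has a spam count of 1
print(spam_ratio(wl, "cheap"))    # 1/0
print(spam_ratio(wl, "meeting"))  # 0/0
```

In this model a single manual classification of an unregistered message is 
enough to push the spam message count below the spam count of an unrelated 
token.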

Since it is generally not a good idea to unregister messages that have not 
been registered before, I would suggest changing the generated filters to a 
setup that refrains from both auto-registration and manual de-registration.

As suggested by Matthias Andree on the bogofilter mailing list, I would like 
to propose replacing the current filter setup …

+----------------------+----------------------------------------+-------+
| Filter name          | Action                                 | Auto? |
+----------------------+----------------------------------------+-------+
| Bogofilter Check     | Pipe through    "bogofilter -p -e -u"  | Yes   |
| Classify as SPAM     | Execute command "bogofilter -N -s"     | No    |
| Classify as NOT SPAM | Execute command "bogofilter -S -n"     | No    |
+----------------------+----------------------------------------+-------+

… with something like this:

+----------------------+----------------------------------------+-------+
| Filter name          | Action                                 | Auto? |
+----------------------+----------------------------------------+-------+
| Bogofilter Check     | Pipe through    "bogofilter -p -e"     | Yes   |
| Classify as SPAM     | Execute command "bogofilter -s"        | No    |
| Classify as NOT SPAM | Execute command "bogofilter -n"        | No    |
+----------------------+----------------------------------------+-------+
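
As a small sanity check of why a registration-only setup cannot produce the 
inconsistent counts, here is a Python sketch. It is a toy model, not 
bogofilter's implementation: the Counter layout and the helper names are 
hypothetical, and it assumes each token is counted at most once per 
registered message.

```python
from collections import Counter

MSG_COUNT = ".MSG_COUNT"  # special token holding the message count


def register(counts, tokens):
    """Toy equivalent of a plain "-s" or "-n": increments only."""
    counts[MSG_COUNT] += 1
    for tok in set(tokens):  # assumption: one count per token per message
        counts[tok] += 1


def invariant_holds(counts):
    """No token may be counted in more messages than were registered."""
    return all(n <= counts[MSG_COUNT] for n in counts.values())


spam_counts = Counter()
for message in (["cheap", "pills"], ["cheap", "offer"], ["offer"]):
    register(spam_counts, message)
    assert invariant_holds(spam_counts)

print(dict(spam_counts))
```

Because nothing is ever decremented, every token count stays at or below the 
message count, whatever order the messages are classified in.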

Although SPAM and HAM messages that are correctly classified by bogofilter
are not automatically added to the wordlist, this filter setup works pretty 
well on my system, relying only on the occasional manual classifications.

Not only does it avoid the problems mentioned above, but it also results in a 
massive performance increase when checking messages, since no write access to 
the wordlist is required.


