[Bug 148211] New: Anti-Spam Wizard's creates problematic filters for bogofilter
Ingomar Wesp
ingomar at wesp.name
Wed Jul 25 21:53:08 BST 2007
http://bugs.kde.org/show_bug.cgi?id=148211
Summary: Anti-Spam Wizard's creates problematic filters for bogofilter
Product: kmail
Version: unspecified
Platform: Ubuntu Packages
OS/Version: Linux
Status: UNCONFIRMED
Severity: normal
Priority: NOR
Component: general
AssignedTo: kdepim-bugs kde org
ReportedBy: ingomar wesp name
Version: (using KDE KDE 3.5.7)
Installed from: Ubuntu Packages
OS: Linux
The filter setup created by KMail's "Anti-Spam Wizard" for bogofilter might
lead to severe problems with bogofilter's wordlist when the user applies
either "Classify as SPAM" or "Classify as NOT SPAM" on messages that have not
been (automatically) registered before.
bogofilter's auto-register option "-u" only registers messages that it can
automatically classify as spam or ham. Using "bogofilter -N -s" to manually
register a message as spam (or "bogofilter -n -S" to manually register it as
ham) therefore decrements the ham (or spam) counts of all tokens contained in
the processed message, as well as the ham (or spam) count in the special
token ".MSG_COUNT".
If used on messages that have not been registered before, this may lead to a
condition where the spam (or ham) count of a token exceeds the spam (or ham)
message count, which in turn produces odd results in the individual spam or
ham probabilities for the affected tokens. In extreme cases (spam value for
".MSG_COUNT" is 0), bogofilter will produce a spam probability of "nan"
because of a floating point division by zero.
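The following Python sketch illustrates the failure mode described above. It
is a toy model only, not bogofilter's actual scoring code: the class
ToyWordlist, the helper ieee_div and the simple frequency ratio are all
hypothetical, and clamping counts at zero is an assumption made for the
illustration.

```python
import math
from collections import Counter


def ieee_div(a, b):
    """Divide like C floating point: x/0 gives inf and 0/0 gives nan."""
    if b == 0:
        return math.nan if a == 0 else math.copysign(math.inf, a)
    return a / b


class ToyWordlist:
    """Hypothetical stand-in for a wordlist: per-token counts plus message counts."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.msg_count = Counter()  # plays the role of ".MSG_COUNT"

    def _update(self, kind, message, delta):
        # Each distinct token in the message is counted once per message.
        for token in set(message.split()):
            # Assumption: counts are clamped at zero instead of going negative.
            self.tokens[kind][token] = max(0, self.tokens[kind][token] + delta)
        self.msg_count[kind] = max(0, self.msg_count[kind] + delta)

    def register(self, kind, message):
        self._update(kind, message, +1)

    def unregister(self, kind, message):
        self._update(kind, message, -1)

    def spam_probability(self, token):
        # Simplified ratio of per-message frequencies (an assumption, not
        # the formula bogofilter really uses).
        spam_freq = ieee_div(self.tokens["spam"][token], self.msg_count["spam"])
        ham_freq = ieee_div(self.tokens["ham"][token], self.msg_count["ham"])
        return ieee_div(spam_freq, spam_freq + ham_freq)


wl = ToyWordlist()
wl.register("spam", "cheap pills")   # one spam message registered earlier

# A ham message that was never registered before. The user applies
# "Classify as NOT SPAM", which corresponds to "bogofilter -S -n".
ham_mail = "cheap meeting"
wl.unregister("spam", ham_mail)      # the "-S" part
wl.register("ham", ham_mail)         # the "-n" part

print(wl.tokens["spam"]["pills"], wl.msg_count["spam"])
print(wl.spam_probability("pills"))
```

In this toy model the token "pills" keeps a spam count of 1 while the spam
message count has dropped to 0, so its probability can no longer be computed
and comes out as nan.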
Since it is generally not a good idea to unregister messages that have not
been registered before, I would suggest changing the generated filters to a
setup that refrains from auto-registration and manual de-registration.
As suggested by Matthias Andree on the bogofilter mailing list, I would like
to propose replacing the current filter setup …
+----------------------+----------------------------------------+-------+
| Filter name          | Action                                 | Auto? |
+----------------------+----------------------------------------+-------+
| Bogofilter Check     | Pipe through "bogofilter -p -e -u"     | Yes   |
| Classify as SPAM     | Execute command "bogofilter -N -s"     | No    |
| Classify as NOT SPAM | Execute command "bogofilter -S -n"     | No    |
+----------------------+----------------------------------------+-------+
… with something like this:
+----------------------+----------------------------------------+-------+
| Filter name          | Action                                 | Auto? |
+----------------------+----------------------------------------+-------+
| Bogofilter Check     | Pipe through "bogofilter -p -e"        | Yes   |
| Classify as SPAM     | Execute command "bogofilter -s"        | No    |
| Classify as NOT SPAM | Execute command "bogofilter -n"        | No    |
+----------------------+----------------------------------------+-------+
Although SPAM and HAM messages that are correctly classified by bogofilter
are not automatically added to the wordlist, this filter setup works pretty
well on my system, relying only on the occasional manual classifications.
Not only does it avoid the problems mentioned above, but it also results in a
massive performance increase when checking messages, since no write access to
the wordlist is required.