[Kde-pim] CRM114 antispam score display (was: Re: branches/KDE/3.5/kdepim/kmail)

Martin Steigerwald Martin at lichtvoll.de
Sun Jul 8 18:07:56 BST 2007


On Sunday 03 June 2007 Ingo Klöcker wrote:

Hi Ingo, Andreas, KMail and KDEPIM developers,

> On Saturday 02 June 2007 01:18, Martin Steigerwald wrote:
[...]
> > On Monday 28 May 2007 Ingo Klöcker wrote:
[...]
> > Headers look like this:
> >
> > ---------------------------------------------------------------------
> > martin at shambala:~/Mail> grep -ir "X-CRM114-Status:" * | cut -d":" -f3,4 | grep SPAM
> > X-CRM114-Status: SPAM  ( -43.62  )
> > X-CRM114-Status: SPAM  ( -17.78  )
> > X-CRM114-Status: SPAM  ( -61.96  )
> > X-CRM114-Status: SPAM  ( -15.03  )
> >
> > martin at shambala:~/Mail> grep -ir "X-CRM114-Status:" * | cut -d":" -f3,4 | grep GOOD | head -10
> > X-CRM114-Status: GOOD (  11.09  )
> > X-CRM114-Status: GOOD ( 304.35  )
> > X-CRM114-Status: GOOD (  81.34  )
[...]
> > martin at shambala:~/Mail> grep -ir "X-CRM114-Status:" * | cut -d":" -f3,4 | grep UNSURE | head -10
> > X-CRM114-Status: UNSURE (  -1.80  )
> > X-CRM114-Status: UNSURE (   3.46  )
> > X-CRM114-Status: UNSURE (   3.68  )
> > X-CRM114-Status: UNSURE (   9.94  )
[...]

> It seems we have to introduce yet another score type since with CRM114
> spam has large negative scores while ham has large positive scores.

Well, yes. Maybe something general, where you can specify the complete 
score range and the necessary thresholds, would be suitable:

ScoreRange=-400,400
ScoreUnsureThreshold=-10
ScoreGoodThreshold=10

Or just one range for each of those?
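To make the idea a bit more concrete, here is a small Python sketch of how 
such settings could be evaluated against the header values above. It only 
models the proposal: the key names are not existing KMail options, the 
thresholds of -10 and 10 are assumed because CRM114's real ones are still 
unknown, and parse_crm114_status() and classify() are hypothetical helpers. 
ScoreRange would only matter for scaling a bar, so it is left out here.

```python
import re

# Assumed values mirroring the proposed (hypothetical) keys
# ScoreUnsureThreshold and ScoreGoodThreshold; not confirmed CRM114 values.
UNSURE_THRESHOLD = -10.0
GOOD_THRESHOLD = 10.0

# Matches header values such as "SPAM  ( -43.62  )" or "GOOD ( 304.35  )".
STATUS_RE = re.compile(r"^\s*(SPAM|UNSURE|GOOD)\s*\(\s*(-?\d+(?:\.\d+)?)\s*\)")


def parse_crm114_status(value):
    """Return (label, score) from an X-CRM114-Status value, or None."""
    match = STATUS_RE.match(value)
    if match is None:
        return None
    return match.group(1), float(match.group(2))


def classify(score):
    """Large negative CRM114 scores mean spam, large positive ones ham."""
    if score < UNSURE_THRESHOLD:
        return "SPAM"
    if score < GOOD_THRESHOLD:
        return "UNSURE"
    return "GOOD"
```

With these assumed thresholds, the example values from the grep output 
above (-15.03, -1.80, 9.94 and 11.09) come out as SPAM, UNSURE, UNSURE and 
GOOD, which agrees with what CRM114 itself wrote into the headers.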

> > From what I understand, I at least need to know the exact threshold at
> > which CRM114 classifies a mail as SPAM?
>
> Yes.

I will ask on the crm114-general mailing list about that. CRM114 does not 
seem to specify the threshold in its headers, and the thresholds may vary 
depending on the classifier one uses. Maybe it would be good if CRM114 put 
the thresholds for SPAM and UNSURE into the headers somehow.

> > Ingo, Andreas, what about mails classified as UNSURE? Does the spam
> > score display in KMail support those?
>
> Well, I guess for scores corresponding to UNSURE the color bar should
> be partially filled. For ham the color bar should be empty and for spam
> it should be completely filled.

Actually, I do not fully understand how the spam score display works...

> > I have holidays in the next two weeks and will have limited internet
> > access next week, but after that I would really like to take the time
> > to put together suitable spam score display configuration statements
> > to finally complete the CRM114 configuration for KMail...

... well, after facing the difference between theory and practice, I 
managed to set up at least a minimal spam score display for CRM114. For now 
I just put in a boolean filter[1]:

ScoreName=CRM114
ScoreHeader=X-CRM114-Status
ScoreType=Bool
ScoreValueRegexp=SPAM
ScoreThresholdRegexp=
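
As far as I understand it, the Bool score type reduces the header to an 
all-or-nothing score, depending on whether ScoreValueRegexp matches. The 
following Python sketch models that understanding; it is not KMail's actual 
code, and bool_score() as well as the 0.0/1.0 scale are hypothetical.

```python
import re
from email import message_from_string

# Models the configuration above (ScoreHeader / ScoreValueRegexp).
# This is an assumed model of the Bool score type, not KMail code.
SCORE_HEADER = "X-CRM114-Status"
SCORE_VALUE_REGEXP = re.compile(r"SPAM")


def bool_score(raw_message):
    """Return 1.0 if the score header matches the regexp, else 0.0."""
    msg = message_from_string(raw_message)
    value = msg.get(SCORE_HEADER, "")  # header lookup is case-insensitive
    return 1.0 if SCORE_VALUE_REGEXP.search(value) else 0.0
```

Since neither "UNSURE" nor "GOOD" contains the string "SPAM", both end up 
at 0.0, which is why unsure mails currently look just like ham.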

But as far as I understand, that's the best that works out of the box for 
now. At least KMail distinguishes between spam and ham/unsure mails in the 
spam score display.

But there is something I do not understand yet: when a mail is spam, I get 
a color gradient from green over yellow to red. Is that the correct 
behaviour? I wonder why there is green in it at all when it's spam. When a 
mail is unsure or ham, I get a blank box. I would have expected something 
green here ;-). For some mails that were flagged by SpamAssassin I got a 
partially filled box with a partial color gradient, for example the 
gradient only up to yellow. I would have interpreted this as UNSURE.

So how does this actually work? Maybe it should be rethought a bit; I do 
not think it's very intuitive. I would suggest the following:

- a red (SPAM) / yellow (UNSURE) / green (HAM) box for a boolean / 
triplean ;-)

- a red (SPAM) / yellow (UNSURE) / green (HAM) bar that displays the 
amount of spamicity, unsurecity or hamicity. Hmmm, but this might be 
confusing as well. Need to think about this a bit more.

Anyway, supporting unsure mails in the spam score display requires touching 
some C++ code, as does supporting a new score type for the CRM114 score 
range. I have not dug into this yet. My last C programming experience was 
years ago, and that wasn't C++, although it did use an object-oriented GUI 
framework. Well, let's see. If I manage to find some more time for this, I 
will have a look at the source code of the antispam support and see whether 
I can make sense of it.

If someone wants to help with the C++ part, I would gladly appreciate it. 
And if I have questions while looking at the source, I will find someone to 
ask ;-).

[1] http://websvn.kde.org/trunk/KDE/kdepim/kmail/kmail.antispamrc?r1=675120&r2=685300

Regards,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7
_______________________________________________
kde-pim mailing list
kde-pim at kde.org
https://mail.kde.org/mailman/listinfo/kde-pim
kde-pim home page at http://pim.kde.org/

