[kdepim-users] Filter on language
Henk van Velden
henk.vanvelden at xs4all.nl
Sat Nov 17 19:16:03 GMT 2012
On Saturday 17 November 2012 19:48:32 Martin Steigerwald wrote:
> Am Donnerstag, 15. November 2012 schrieb ianseeks:
> > Hi
>
> Hi Ian,
>
> > I'm getting loads of spam from Japanese sites and I'm now bored of
> > updating my junk filters every day.
> >
> > Is there a way I can filter out emails that are using Asian language
> > fonts?
>
> I am not aware of something like this. But the encoding might be in the
> mail headers that you can view with the V key. You can filter for anything
> in there. Maybe there is also something else. Hmmm, I scanned some of the
> spam in foreign charsets that CRM114 has sorted into my spam folder,
> and it does not seem to have any helpful headers.
>
> Thus I can only imagine running it through an external program that
> detects encoding, or a small script calling such a program and then
> decides whether spam or not.
>
> Anyway, I recommend something more generic – at least if you are running
> your own mail server: policyd-weight. It removes most spam at SMTP level
> by some tests and asking a set of blacklists.
>
> On the client I suggest CRM114. I wrote an article on how to integrate it,
> but have not tested this with KDEPIM 2 yet. Tell me if you are interested
> and I will see whether this article has been translated to English and
> possibly provide a link.
>
> What's the advantage of CRM114 or another self-learning spam filter? You do
> not have to create your own spam filter rules every day.
>
> From the tons of spam to my mail address each day, I only see 0-10 in
> unsure folder. There are more in the local spam folder, but I only scan
> subject lines quickly to make sure CRM114 had no false positives, which it
> didn't recently.
>
> In fact, policyd-weight and CRM114 make it possible to actually read my
> mail. Otherwise I would have to search it in a sea of spam first.
>
> CRM114 can be used client side, even stand-alone. I use it client side,
> but still with POP3. Heck, this works so well, and I only ever read my mail
> on this laptop, that I might continue using POP3.
>
> Both need some time to grasp the concepts and set them up, but IMHO it's
> really worth it. I do not have a single hand-crafted spam filter rule at
> all. So I do not have to do anything except give CRM114 a little training
> when the next spam wave comes from somewhere other than Japan. Actually I
> hardly ever notice any spam waves. CRM114 learns quickly, efficiently and
> also forgets as needed. All with just two mmap()ed files of about 12 MiB
> each.
>
> And this setup has been working for years already, without any major
> changes.
>
> With everybody and every provider doing this, there would probably not be
> a market for spammers anymore. That's the hope of some CRM114 developers.
>
> That said, there may be other spam filters that are just as efficient, like
> dspam or newer spamassassin versions that I think the Zimbra at work uses.
> CRM114 isn't even only a spam filter, it can classify any text.
>
> Ciao,
As the only character encoding that is (going to be) used by modern systems is
UTF-8 encoded Unicode, filtering on encoding (like the several ISO 8859
versions) is not going to work.
@ianseeks: though I understand what you want, talking about Asian fonts will
not help in understanding the problem. A font is only a set of small pictures
(glyphs) that make a character code visible to a human being. What matters is
the Unicode code points you get in the mail. As these are allocated in blocks
(e.g. Japanese Hiragana is U+3040 - U+309F), a filter could rate a mail as
spam when many of its characters fall within specific ranges. But (like all
spam filtering) it is not an exact science.
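
The range idea above could be sketched roughly as follows in Python. This is
only an illustration under stated assumptions: the function names, the choice
of blocks (Hiragana, Katakana, CJK Unified Ideographs) and the 0.3 threshold
are made up for the example and are not part of KMail or any existing filter.
The text is assumed to be already decoded to a Unicode string.

```python
# Hypothetical block list: (first code point, last code point).
CJK_RANGES = [
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
]


def cjk_ratio(text):
    """Return the fraction of non-whitespace characters inside CJK_RANGES."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    hits = sum(
        1
        for ch in chars
        if any(lo <= ord(ch) <= hi for lo, hi in CJK_RANGES)
    )
    return hits / len(chars)


def looks_like_cjk_mail(text, threshold=0.3):
    """Flag a mail whose share of CJK characters reaches the threshold.

    The threshold is an arbitrary assumption and would need tuning.
    """
    return cjk_ratio(text) >= threshold


if __name__ == "__main__":
    print(looks_like_cjk_mail("Hello, this is plain English."))
    print(looks_like_cjk_mail("こんにちは world"))
```

A script like this could be called from a mail filter that pipes the message
through an external program, but a simple ratio will also flag legitimate
Japanese or Chinese mail, so it only suits someone who never expects any.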
--
Met vriendelijke groet,
Henk van Velden
_______________________________________________
KDE PIM users mailing list
Subscription management: https://mail.kde.org/mailman/listinfo/kdepim-users