[kdepim-users] Filter on language

ianseeks ianseeks at dsl.pipex.com
Mon Nov 19 16:52:53 GMT 2012


On Saturday 17 Nov 2012 20:16:03 Henk van Velden wrote:
> On Saturday 17 November 2012 19:48:32 Martin Steigerwald wrote:
> > Am Donnerstag, 15. November 2012 schrieb ianseeks:
> > > Hi
> > 
> > Hi Ian,
> > 
> > > I'm getting loads of spam from Japanese sites and I'm now tired of
> > > updating my junk filters every day.
> > > 
> > > Is there a way I can filter out emails that are using Asian language
> > > fonts?
> > 
> > I am not aware of something like this. But the encoding might be in the
> > mail headers, which you can view with the V key. You can filter for
> > anything in there. Maybe there is also something else. Hmmm, I scanned
> > some of the spam with foreign charsets that CRM114 has sorted into my spam
> > folder, and it does not seem to have any helpful headers.
> > 
> > Thus I can only imagine running the mail through an external program that
> > detects the encoding, or a small script that calls such a program and then
> > decides whether the mail is spam or not.
> > 
> > Anyway, I recommend something more generic – at least if you are running
> > your own mail server: policyd-weight. It removes most spam at SMTP level
> > by running some tests and querying a set of blacklists.
> > 
> > On the client I suggest CRM114. I wrote an article on how to integrate it,
> > but have not tested this with KDEPIM 2 yet. Tell me if you are interested
> > and I will see whether this article has been translated to English and
> > possibly provide a link.
> > 
> > What's the advantage of CRM114 or another self-learning spam filter? You do
> > not have to create your own spam filter rules every day.
> > 
> > From the tons of spam sent to my mail address each day, I only see 0-10 in
> > the unsure folder. There are more in the local spam folder, but I only scan
> > subject lines quickly to make sure CRM114 had no false positives, which it
> > didn't recently.
> > 
> > In fact, policyd-weight and CRM114 make it possible to actually read my
> > mail. Otherwise I would have to search it in a sea of spam first.
> > 
> > CRM114 can be used client side, even stand-alone. I use it client side,
> > but still with POP3. Heck, this works so well, and I only ever read my mail
> > on this laptop, that I might continue using POP3.
> > 
> > Both need some time to grasp the concepts and set them up, but IMHO it's
> > really worth it. I have not a single hand-crafted spam filter rule at all.
> > So I do not have to do anything except give CRM114 a little training when
> > the next spam wave comes from somewhere other than Japan. Actually I hardly
> > ever notice any spam waves. CRM114 learns quickly, efficiently and also
> > forgets as needed. All with just two mmap()ed files of about 12 MiB each.
> > 
> > And this setup has worked for years already, without any major changes.
> > 
> > If everybody and every provider did this, there would probably not be
> > a market for spammers anymore. That's the hope of some CRM114 developers.
> > 
> > That said, there may be other spam filters that are as efficient, like
> > dspam or newer SpamAssassin versions, which I think the Zimbra at work
> > uses. CRM114 isn't even only a spam filter; it can classify any text.
> > 
> > Ciao,
> 
> As the only character encoding that is (going to be) used by modern systems
> is UTF-8 encoded Unicode, filtering on encoding (like the several ISO 8859
> versions) is not going to work.
> 
> @ianseeks: though I understand what you want, talking about Asian fonts will
> not help in understanding the problem. A font is only a set of small pictures
> that make visible to a human being what a character code stands for. It is
> about the Unicode code points you get in the mail. As these come in blocks
> (e.g. Japanese Hiragana is U+3040 to U+309F), a filter could decide that many
> characters within specific ranges rate a mail as spam.
> But (like all spam filtering) it is not an exact science.
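
The range-based idea quoted above can be sketched in a few lines of Python.
This is only a minimal illustration, not something KMail or CRM114 provides:
the function names, the choice of three Unicode blocks and the 0.3 threshold
are all assumptions made for the example, and the text is assumed to be
already decoded to Unicode.

```python
# Assumed set of Unicode blocks that are typical for Japanese text.
JAPANESE_RANGES = (
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (kanji, shared with Chinese)
)


def japanese_ratio(text):
    """Return the fraction of non-whitespace characters in JAPANESE_RANGES."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hits = sum(
        1
        for c in chars
        if any(lo <= ord(c) <= hi for lo, hi in JAPANESE_RANGES)
    )
    return hits / len(chars)


def looks_japanese(text, threshold=0.3):
    """Hypothetical rule: flag text whose ratio reaches the threshold."""
    return japanese_ratio(text) >= threshold


if __name__ == "__main__":
    print(looks_japanese("お互いのアドレスや番号で"))
    print(looks_japanese("Hello, this is plain English text."))
```

Hiragana is used for grammatical particles, so it shows up in nearly any
Japanese sentence; counting characters in that block is more robust than
searching for one particular word. The threshold would need tuning against
real mail, since legitimate messages can contain Japanese text too.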

Thanks for the explanation. I was wondering if there was a common word used in
Japanese (like "the" in English) I could search for. I get text like this "ないの
で連.絡先を送りますね〓お互いのアドレスや番号で直接連 絡を取り合いませんか?〓 " and then a web address.  I've got 3 
filters set up with parts of the web address, which worked for a while, but
now they are changing the web address regularly.
regards

Ian 
_______________________________________________
KDE PIM users mailing list
Subscription management: https://mail.kde.org/mailman/listinfo/kdepim-users