What's with bogofilter and spam

Anne Wilson cannewilson at googlemail.com
Thu Jun 4 12:31:32 UTC 2009


On Thursday 04 June 2009 13:21:19 Jonathan Kamens wrote:
> On 06/04/2009 07:53 AM, Rodd Clarkson wrote:
> > Sadly, I'm not feeling like being manual about this, and I guess that I
> > just expect my mail client to work well with the spam software and do it
> > for me.  After all, my mail client has a great collection of ham and
> > spam so if I can do something like it manually, then surely it can't be
> > hard for the spam software to do it without me having to think about it.
> >
> > bogofilter used to work well, and I'm hoping that it can once again be
> > the great spam filter it was, fast and accurate.
> >    
> I can't speak specifically to the integration of bogofilter into 
> evolution, since I use a completely home-grown bogofilter integration, 
> which, as shown here <http://stuff.mit.edu/%7Ejik/#spam>, successfully 
> blocks thousands of spam messages and viruses per day.
> 
> However, I do want to reiterate what Anne said.  She's right that the 
> spammers are getting smarter.  They're /always/ getting smarter -- it's 
> a constant battle for the anti-spammers to keep up with the new ideas 
> that the spammers come up with.  Therefore, what worked well enough 
> before is no longer good enough.
> 
> For bogofilter to be most effective, here's what needs to happen:
> 
>    1. Incoming email needs to be divided into three categories -- ham,
>       spam, and unsure -- not just into ham and spam.
>    2. Ham and spam messages need to be added to the bogofilter
>       database automatically after they are categorized.
>    3. Unsure messages need to be categorized by the user and then added
>       to the bogofilter database as either ham or spam, depending on the
>       user's categorization.
>    4. Incorrectly classified messages need to be reclassified when they
>       are detected, e.g., a spam message incorrectly classified as ham
>       needs to first be removed from the database as ham and then added
>       to the database as spam.
>    5. Bogofilter needs to be tuned periodically using a large collection
>       of known-ham and known-spam messages.
>    6. The bogofilter database needs to be pruned periodically, i.e.,
>       words that haven't been seen in any incoming email in a while (I
>       personally use 180 days as my threshold) need to be removed,
>       preferably before tuning.
> 
> All of these are important, but the first four are by far the most 
> important.  If the evolution integration doesn't use tristate 
> classification, or if it doesn't make it easy for you to identify and 
> classify unsure messages and reclassify incorrectly classified ones, 
> then it is inevitable that over time, bogofilter's ability to detect 
> spam will degrade.
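Steps 1 to 4 above can be sketched in Python around the bogofilter command line. This is a minimal sketch, not the Evolution integration or the home-grown setup mentioned above. It assumes bogofilter is installed and configured for ternary (ham/unsure/spam) classification, relies on bogofilter's exit status (0 spam, 1 ham, 2 unsure, 3 error), uses `-u` so that messages classified as ham or spam are registered automatically, and uses `-s`, `-n`, `-S` and `-N` to add or remove registrations. The helper names (`classify_and_register`, `apply_user_verdict` and so on) are hypothetical.

```python
import subprocess

# bogofilter exit statuses when run without -e:
# 0 = spam, 1 = ham, 2 = unsure (ternary mode only), 3 = error.
EXIT_TO_CATEGORY = {0: "spam", 1: "ham", 2: "unsure"}

# -s/-n register a message as spam/ham; -S/-N remove an earlier
# spam/ham registration.
RECLASSIFY_FLAGS = {
    ("ham", "spam"): ["-N", "-s"],   # misfiled as ham: unregister ham, add as spam
    ("spam", "ham"): ["-S", "-n"],   # misfiled as spam: unregister spam, add as ham
    ("unsure", "spam"): ["-s"],      # -u never registered unsure mail: just add
    ("unsure", "ham"): ["-n"],
}


def category_from_exit(code: int) -> str:
    """Map a bogofilter exit status to 'spam', 'ham' or 'unsure'."""
    if code not in EXIT_TO_CATEGORY:
        raise RuntimeError(f"bogofilter failed with exit status {code}")
    return EXIT_TO_CATEGORY[code]


def reclassify_flags(old: str, new: str) -> list[str]:
    """Flags that move a message from its old category to the user's verdict."""
    if old == new:
        return []
    return RECLASSIFY_FLAGS[(old, new)]


def classify_and_register(message: bytes) -> str:
    """Steps 1 and 2: classify a message; -u registers it if ham or spam."""
    result = subprocess.run(["bogofilter", "-u"], input=message)
    return category_from_exit(result.returncode)


def apply_user_verdict(message: bytes, old: str, new: str) -> None:
    """Steps 3 and 4: record the user's verdict on an unsure or misfiled message."""
    flags = reclassify_flags(old, new)
    if not flags:
        return
    result = subprocess.run(["bogofilter", *flags], input=message)
    if result.returncode == 3:
        raise RuntimeError("bogofilter could not update its database")
```

Keeping the flag table separate from the subprocess calls makes the reclassification rules easy to check by eye: a message that was wrongly auto-registered must be unregistered before it is added under its correct category, while an unsure message only needs to be added.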
> 
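Step 6 of the list above (pruning words that have not been seen for a while) can be illustrated on a toy in-memory word list. This is a hypothetical sketch: the `WordStats` record and the `prune` function are illustrative only and do not reflect bogofilter's on-disk database format. On a real installation this kind of maintenance is done with the bogoutil utility that ships with bogofilter. The 180-day default mirrors the threshold quoted above.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class WordStats:
    """Hypothetical per-token record: counts plus the date it was last seen."""
    spam_count: int
    ham_count: int
    last_seen: date


def prune(wordlist: dict[str, WordStats], today: date,
          max_age_days: int = 180) -> dict[str, WordStats]:
    """Return a new word list without tokens unseen for more than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return {word: stats for word, stats in wordlist.items()
            if stats.last_seen >= cutoff}


if __name__ == "__main__":
    today = date(2009, 6, 4)
    words = {
        "mortgage": WordStats(40, 0, date(2009, 5, 30)),
        "meeting": WordStats(0, 25, date(2009, 6, 1)),
        "stale": WordStats(3, 1, date(2008, 9, 1)),  # more than 180 days old
    }
    print(sorted(prune(words, today)))  # prints ['meeting', 'mortgage']
```

Pruning before tuning, as suggested above, means the tuning run only sees tokens that still occur in current mail.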
For those who run their own IMAP server and use procmail, the following page 
explains how to set up bogofilter within procmail so that it automatically 
files mail into ham, spam and unsure folders.  It makes life simple :-)

http://userbase.kde.org/KMail/FAQs_Hints_and_Tips#Spam_filtering_on_an_IMAP_server
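The procmail recipe itself is on the UserBase page; the three-way filing it performs can be sketched in Python with the standard `email` module. This sketch assumes the message has already passed through `bogofilter -p` (passthrough mode), which adds an `X-Bogosity` header, and that the header value starts with `Spam`, `Ham` or `Unsure` as in recent bogofilter versions. The folder names and the `folder_for` function are hypothetical, not taken from the UserBase page.

```python
from email import message_from_string

# Hypothetical folder names; adjust to match your IMAP layout.
FOLDERS = {"spam": "Spam", "ham": "INBOX", "unsure": "Unsure"}


def folder_for(raw_message: str) -> str:
    """Pick a folder from the X-Bogosity header added by `bogofilter -p`."""
    msg = message_from_string(raw_message)
    bogosity = msg.get("X-Bogosity", "")
    # Assumed header shape: "Spam, tests=bogofilter, spamicity=0.999..."
    verdict = bogosity.split(",", 1)[0].strip().lower()
    # A missing or unrecognised header goes to Unsure so the user decides.
    return FOLDERS.get(verdict, FOLDERS["unsure"])


if __name__ == "__main__":
    sample = ("X-Bogosity: Spam, tests=bogofilter, spamicity=0.999\n"
              "Subject: cheap pills\n\nbody\n")
    print(folder_for(sample))  # prints Spam
```

Sending anything without a recognisable verdict to the unsure folder keeps it in front of the user, which is where step 3 of the list above expects such mail to end up.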

Anne
-- 
New to KDE4? - get help from http://userbase.kde.org
Just found a cool new feature?  Add it to UserBase


More information about the fedora-test-list mailing list