spamassassin/user_prefs

Nigel Wade nmw at ion.le.ac.uk
Mon Mar 22 15:12:42 UTC 2004


Charles Howse wrote:
> Hi,
> 
> While reading another thread, I remembered I had no custom preferences for 
> spamassassin, and decided to create some.
> 
> I use the default settings for starting spamassassin at boot, and the 
> following filters in KMail:
> 1. In KMail menus, select Settings->Configure Filters
> 2. Create a new filter with filter criteria:
>     <any header> matches regular expression .
>     (the regular expression is just the character "." meaning 
>     "any character")
>     and filter action:
>     pipe through spamc
>     Uncheck the box "stop processing if this filter matches"
> 3. Add a second filter below the one created in step 2, with criteria:
>     <any header> contains X-Spam-Flag: YES
>     and action:
>     move to folder trash
>     (or whatever you want to do with your spam)
>     check the "stop processing..." box
> 
> These filters are working fine, except for those HTML spams that show 
> nothing but random words in the body when viewed in text mode.
> 
> I was just wondering if anyone would like to share some _generic_ preferences 
> for ~/.spamassassin/user_prefs, or comment.


The way to catch those is with Bayesian filtering. You need to teach the 
Bayesian filter with sufficient messages so that it learns what is spam and 
what is not (at least 1000 of each is a good rule of thumb for best accuracy).

The random words don't have a significant effect on the Bayesian score, 
because the filter uses only the words that are most like spam and most 
unlike spam to determine the overall "spamminess" of the message.
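
As a rough illustration of that point, here is a minimal Python sketch of a 
Graham-style Bayesian scorer that combines only the most extreme tokens. It is 
not SpamAssassin's code: SA's tokenizer and combining formula differ in detail, 
and the function names, the tiny corpora, the 0.01/0.99 clamp and the 
n_extreme=5 cutoff are all assumptions made up for this example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Return the set of lowercase word tokens in a message body."""
    return set(re.findall(r"[a-z0-9$'-]+", text.lower()))

def train(spam_msgs, ham_msgs):
    """Estimate a per-token spam probability from two training corpora."""
    spam_counts = Counter(t for m in spam_msgs for t in tokenize(m))
    ham_counts = Counter(t for m in ham_msgs for t in tokenize(m))
    probs = {}
    for tok in set(spam_counts) | set(ham_counts):
        s = spam_counts[tok] / len(spam_msgs)   # fraction of spam containing tok
        h = ham_counts[tok] / len(ham_msgs)     # fraction of ham containing tok
        # Clamp so that no single token counts as absolute proof (assumed bounds).
        probs[tok] = min(0.99, max(0.01, s / (s + h)))
    return probs

def spam_probability(text, probs, n_extreme=5, unknown=0.5):
    """Combine only the n_extreme tokens farthest from the neutral value 0.5."""
    scored = [probs.get(t, unknown) for t in tokenize(text)]
    extreme = sorted(scored, key=lambda p: abs(p - 0.5), reverse=True)[:n_extreme]
    spam_like = math.prod(extreme)
    ham_like = math.prod(1 - p for p in extreme)
    return spam_like / (spam_like + ham_like)

# Toy corpora (hypothetical); real training needs far more messages.
spam = ["buy cheap viagra now", "cheap pills buy now online",
        "viagra pills cheap online"]
ham = ["meeting agenda for monday", "kernel update for fedora list",
       "agenda for the fedora meeting"]
probs = train(spam, ham)

plain = spam_probability("buy cheap viagra", probs)
padded = spam_probability("buy cheap viagra zebra lantern quixotic marble", probs)
legit = spam_probability("agenda for the fedora meeting", probs)
print(plain, padded, legit)
```

In this toy setup the padding words were never seen in training, so they sit at 
the neutral 0.5 and leave the score unchanged. With a large real corpus, random 
dictionary words tend to land near the neutral value in the same way, so they 
rarely displace the strongly spam-like tokens.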

When it's trained well it's very good. I recently installed SA on our mail 
server, trained with about 5000 spam and 3000 ham. Out of the last 3000+ 
messages I've received I've got no false positives and it's only failed to 
identify 2 spams.

But if you are going to do spam filtering in the mail client, why not use 
Mozilla/Thunderbird? It has Bayesian filtering built in, and it's pretty good 
once it's been taught. It's much easier to teach than SA for a single user: 
a single mouse click is all that's required for each message.

-- 
Nigel Wade




