Spam on the Rise. Any new tips?

Thu Sep 7 00:21:21 UTC 2006

From: "Gene Heskett" <gene.heskett at verizon.net>

> On Wednesday 06 September 2006 03:27, jdow wrote:
>>From: "Thom Paine" <painethom at gmail.com>
>>
>>> I've been noticing that the config I had been using for about the past
>>> year is slowly becoming less effective against spam.
>>>
>>> I'm currently using half a dozen or so BLs along with SpamAssassin.
>>
>>SARE is a wonderful rule source. http://www.rulesemporium.com/ is your
>>friend. Go to the "rules" page. Read the description of the rule sets
>>carefully, and select those which meet your needs.
>>
>>If you are REALLY desperate, have a poorly trained BAYES, and have
>>a lot of image spam you MIGHT check out the spamassassin-users list
>>archives for "FuzzyOCR". It is HIGHLY experimental and more than
>>moderately effective at this point.
>>
>>http://www.rulesemporium.com/programs/sa-stats.txt is a good script
>>for assessing the effectiveness of your various rules. A well-trained
>>BAYES will leave you with BAYES_99 catching about 60% or more of all
>>spam and 0.04% or so of ham. BAYES_00 will be pretty much the
>>reverse. If you have that, raise the BAYES_99 score until you see it
>>developing false hits or it reaches a score equal to your threshold. (I
>>use the default of 5 with a LOT of SARE rule sets. Perhaps a couple of
>>spams a week get through out of 25000 emails a month, and virtually
>>no hams get mismarked.)
>>
>>{^_^}
>
> I can pretty much confirm the effectiveness of that, Joanne.  Such a lashup
> using SARE gets 99.99% of the spam, with perhaps 10 falsely id'd hams a
> week, with about the same amount of traffic.  I train sa-learn with the
> messages it misfires on whenever that happens, so it's self-healing.
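
The effectiveness check and BAYES_99 tuning in the quoted advice come down to simple counting. A minimal Python sketch, assuming a hand-labeled corpus where each message records which rules fired, the total score from all other rules, and whether it is really spam. It is not the sa-stats.txt script; the `Message` class and both functions are hypothetical illustrations of the arithmetic and of the "raise the score until false hits appear or it reaches the threshold" loop.

```python
from dataclasses import dataclass


@dataclass
class Message:
    rules_hit: set      # names of the rules that fired on this message
    other_score: float  # total score from every rule except the one being tuned
    is_spam: bool       # hand-verified label


def hit_rates(corpus, rule):
    """Return (% of spam that hit `rule`, % of ham that hit `rule`)."""
    spam = [m for m in corpus if m.is_spam]
    ham = [m for m in corpus if not m.is_spam]

    def pct(group):
        hits = sum(rule in m.rules_hit for m in group)
        return 100.0 * hits / len(group) if group else 0.0

    return pct(spam), pct(ham)


def raise_score(corpus, rule, start, threshold=5.0, step=0.1):
    """Raise `rule`'s score in `step` increments, stopping before any ham
    that hits it would reach `threshold`, or once the score equals it."""
    score = start
    while score < threshold:
        candidate = min(score + step, threshold)
        if any(not m.is_spam and rule in m.rules_hit
               and m.other_score + candidate >= threshold
               for m in corpus):
            break  # the next step would start producing false hits
        score = candidate
    return score


# Toy corpus, made up for illustration only.
corpus = [
    Message({"BAYES_99"}, 2.0, True),
    Message({"BAYES_99"}, 0.5, True),
    Message({"BAYES_00"}, -1.0, False),
    Message({"BAYES_99"}, 1.0, False),
]
print(hit_rates(corpus, "BAYES_99"))
print(raise_score(corpus, "BAYES_99", start=3.5, step=0.25))
```

In this toy corpus one ham hits BAYES_99, so the loop stops at 3.75, one step before that ham would reach 5.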

I bet those falsely flagged hams are from mailing lists that are
"open" the way the LKML is "open" to outside postings. I have a fix
for it. {^_-} It's nasty and has not been "mass tested", because the
only suitable "mass" of messages to test with is so specialized that none
of the people doing the tests can use it.

First off tune the BAYES as above.

Then generate a rule that tests for the mailing lists that have this
problem. I broke this down into two steps because of the lists I am on. I
have individual rules for each list that detect it by features the list
manager provides. The LKML/FreeBSD/Fedora rule is thus (mind the wraps):
===8<---
header JD_SENDER_RELAY  Sender =~ /(linux-kernel-owner\@vger\.kernel\.org|owner-freebsd-questions\@freebsd\.org|owner-freebsd-stable\@freebsd\.org|fedora-list-bounces\@redhat\.com)/
describe JD_SENDER_RELAY  Good list with Sender header
score JD_SENDER_RELAY     -1.5

===8<---
Note the blanket modest "good guy" scoring. It is important.
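
A SpamAssassin `header` test is a Perl regular expression searched against the header's value, so the pattern can be sanity-checked outside SpamAssassin. A rough Python sketch, assuming Python's `re` handles this particular pattern the same way Perl does (it does for the escaped `@` and `.` used here; other Perl regex features can differ). The sample Sender values are made up.

```python
import re

# The JD_SENDER_RELAY pattern, split across lines for readability.
# Python's re, like Perl, treats "\@" as a literal "@".
SENDER_RELAY = re.compile(
    r"(linux-kernel-owner\@vger\.kernel\.org"
    r"|owner-freebsd-questions\@freebsd\.org"
    r"|owner-freebsd-stable\@freebsd\.org"
    r"|fedora-list-bounces\@redhat\.com)"
)


def sender_relay_hits(sender: str) -> bool:
    """Unanchored search, like `Sender =~ /.../` in a header rule."""
    return SENDER_RELAY.search(sender) is not None


for sender in (
    "linux-kernel-owner@vger.kernel.org",
    "owner-freebsd-stable@freebsd.org",
    "owner-freebsd-current@freebsd.org",   # not one of the listed lists
    "linux-kernel-owner@vgerXkernel.org",  # escaped dots only match "."
):
    print(sender, sender_relay_hits(sender))
```

Escaping the dots keeps the match tight: an unescaped `.` would accept any character in those positions.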
And SLE was detectable a different way:
===8<---
header   JD_SUBJ_RELAY  Subject=~ /\[SLE\]/
describe JD_SUBJ_RELAY  Good list has bracketed tag
score    JD_SUBJ_RELAY  -1.5
===8<---
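
The backslashes in `/\[SLE\]/` matter: without them `[SLE]` is a character class matching any single S, L or E, so almost every subject would collect the -1.5 bonus. A small Python sketch of the difference, assuming Python's `re` behaves like Perl for these two simple patterns; the subjects are made up.

```python
import re

TAG = re.compile(r"\[SLE\]")       # as in JD_SUBJ_RELAY: literal "[SLE]"
CHAR_CLASS = re.compile(r"[SLE]")  # same text with the brackets unescaped


def subj_hits(subject: str, pattern: "re.Pattern[str]" = TAG) -> bool:
    """Unanchored, case-sensitive search over the Subject value."""
    return pattern.search(subject) is not None


print(subj_hits("[SLE] YaST update question"))    # tagged list post
print(subj_hits("Spam on the Rise"))              # no tag
print(subj_hits("Spam on the Rise", CHAR_CLASS))  # a capital "S" is enough
```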

Then I put the two together:
===8<---
meta __JD_RELAY         (JD_SENDER_RELAY || JD_SUBJ_RELAY)
===8<---
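
The combination can be modeled as a boolean OR, plus the rule that names starting with a double underscore are evaluated for use in metas but left out of the total. This is a simplified Python model of that convention, not SpamAssassin's code; rule hits are passed in as booleans and the scores are the ones from the rules above.

```python
# Scores from the rules above. __JD_RELAY is listed only to show that,
# whatever its nominal score, a "__" name keeps it out of the total.
SCORES = {
    "JD_SENDER_RELAY": -1.5,
    "JD_SUBJ_RELAY": -1.5,
    "__JD_RELAY": 1.0,
}


def evaluate(hits):
    """Add the __JD_RELAY meta result to `hits`, then total the scores of
    the rules that fired, skipping names that start with "__"."""
    results = dict(hits)
    results["__JD_RELAY"] = (results.get("JD_SENDER_RELAY", False)
                             or results.get("JD_SUBJ_RELAY", False))
    total = sum(SCORES[name] for name, fired in results.items()
                if fired and not name.startswith("__"))
    return results, total


print(evaluate({"JD_SENDER_RELAY": True, "JD_SUBJ_RELAY": False}))
```

A message that matches both the Sender and the Subject rule gets -3.0, since each relay rule is scored on its own.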

So now __JD_RELAY exists (and gets a default score of 1) but does not
contribute any score to the final results. I'm ready for some BAYES
tweaking rules:
===8<---
meta      JD_LO_BAYES_LKML      ( JD_LO_BAYES && __JD_RELAY )
describe  JD_LO_BAYES_LKML      LKML unlikely spam
score     JD_LO_BAYES_LKML      -1.0

meta      JD_VLO_BAYES_LKML     ( JD_VLO_BAYES && __JD_RELAY )
describe  JD_VLO_BAYES_LKML     LKML very unlikely spam
score     JD_VLO_BAYES_LKML     -3.1

meta      JD_VHI_BAYES_LKML     ( JD_VHI_BAYES && __JD_RELAY )
describe  JD_VHI_BAYES_LKML     LKML very likely spam
score     JD_VHI_BAYES_LKML     2.0

meta      JD_HI_BAYES_LKML      ( JD_HI_BAYES && __JD_RELAY )
describe  JD_HI_BAYES_LKML      LKML likely spam
score     JD_HI_BAYES_LKML      3.8
===8<---
Note that the rules above increase the score of high BAYES scores
and DECREASE the score of low BAYES scores. This has eliminated
false LKML hits except when some bozoid BCCs the LKML. (I will need
to modify the LKML detection rule if once every couple of weeks starts
to bother me.)
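
The net effect on a list post can be tabulated from the scores given. A Python sketch, assuming the four JD_*_BAYES rules (whose definitions and own scores are not shown in this post) are mutually exclusive bands, and that exactly one relay rule fired. Only the points added by the relay rule and the *_LKML metas are counted; `list_adjustment` is a hypothetical helper.

```python
from typing import Optional

# Extra points the *_LKML metas add. The JD_*_BAYES rules themselves are
# not shown in the post, so their own scores are deliberately left out.
LKML_BAYES_ADJUST = {
    "JD_VLO_BAYES": -3.1,
    "JD_LO_BAYES": -1.0,
    "JD_HI_BAYES": 3.8,
    "JD_VHI_BAYES": 2.0,
}
RELAY_BONUS = -1.5  # assumes exactly one of JD_SENDER_RELAY/JD_SUBJ_RELAY fired


def list_adjustment(bayes_band: Optional[str], via_list: bool) -> float:
    """Net points from the list rules for one message (sketch)."""
    if not via_list:
        return 0.0  # __JD_RELAY false: none of the *_LKML metas can fire
    return RELAY_BONUS + LKML_BAYES_ADJUST.get(bayes_band, 0.0)


for band in ("JD_VLO_BAYES", "JD_LO_BAYES", None, "JD_HI_BAYES", "JD_VHI_BAYES"):
    print(band, round(list_adjustment(band, via_list=True), 2))
```

On a list post, a very low Bayes band nets -4.6 from these rules, while a high band still nets +2.3 after the -1.5 list bonus.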

And I have some side rules for annoyances:
===8<---
meta JD_EMPTY_LKML_SUBJ (__JD_RELAY && MISSING_SUBJECT)
describe JD_EMPTY_LKML_SUBJ LKML seems to strip body and subject for spams
score JD_EMPTY_LKML_SUBJ 3.1

meta JD_LKML_NC_SPAM            (__JD_RELAY && JD_LKML_NO_CONTENT)
describe JD_LKML_NC_SPAM        LKML and no content
score JD_LKML_NC_SPAM           3.1
#1.29

meta JD_LKML_EMPTY_ALL          (__JD_RELAY && MISSING_SUBJECT && JD_LKML_NO_CONTENT)
describe JD_LKML_EMPTY_ALL      Empty LKML spam
score   JD_LKML_EMPTY_ALL       6.5
===8<---
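
Because the side rules overlap, they stack. A Python sketch of how the three metas add up, treating each `&&` meta as "all of these rules fired". JD_LKML_NO_CONTENT is not defined in the post, so it is taken here as a given input, and the own scores of MISSING_SUBJECT and the relay rules are not included; `side_rule_points` is a hypothetical helper.

```python
# Side-rule metas from above: name -> (score, rules that must all fire).
SIDE_RULES = {
    "JD_EMPTY_LKML_SUBJ": (3.1, {"__JD_RELAY", "MISSING_SUBJECT"}),
    "JD_LKML_NC_SPAM": (3.1, {"__JD_RELAY", "JD_LKML_NO_CONTENT"}),
    "JD_LKML_EMPTY_ALL": (6.5, {"__JD_RELAY", "MISSING_SUBJECT",
                                "JD_LKML_NO_CONTENT"}),
}


def side_rule_points(fired: set) -> float:
    """Sum the metas whose && conditions are all present in `fired`."""
    return sum(score for score, needs in SIDE_RULES.values()
               if needs <= fired)


empty_list_spam = {"__JD_RELAY", "MISSING_SUBJECT", "JD_LKML_NO_CONTENT"}
print(round(side_rule_points(empty_list_spam), 2))
```

A list post with neither subject nor content collects 3.1 + 3.1 + 6.5 = 12.7 from these three rules alone, which stays well above the default threshold of 5 even after the -1.5 list bonus.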

{^_^}   Joanne, "Spam? SPAM? I don't need no fscking SPAM!"