[antispam-f] Two-word spam

  • From: Harriet Bazley <lists@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
  • To: antispam@xxxxxxxxxxxxx
  • Date: Wed, 04 Apr 2007 01:40:54 +0100

I was receiving a lot of stock market spams, downloaded as headers-only
(on the grounds that the From and Reply-To were the same address, which
is often but not always an indicator of spam:   it probably indicates
nothing more definite than the defaults on a certain e-mail client...),
which all looked the same, i.e. the start of the body text was very
similar.   However, I wasn't getting anywhere trying to block them on
the first line of the body, since this kept changing from day to day.


In the end, I noticed that the Message-Ids were somewhat distinctive,
obviously dictionary-generated:   the 'domain' consists of a valid
dictionary word prefixed and suffixed by random single letters, e.g. 
<09bb01c77506$3cff67b0$2710c5b0@pgeminatep>
<083301c77507$3ebc99c0$4826c9d0@kliftedc>
<03ff01c77404$3eda10e0$3429d3d0@aethnica>
<01dd01c77301$3eeb22c0$0165c5c0@umachinerye>

Unfortunately, while this 'footprint' is easy to spot by the human eye,
it isn't very easy to write a rule to block, and I was getting tired of
manually marking them for deletion.   The easy thing to check for is a
Message-Id which has no full stops after the '@', i.e. not a full
domain, but that isn't safe either: '@localhost' is the obvious case,
but I came across other valid examples.
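
For anyone who wants to try this check outside AntiSpam, here is a rough
Python sketch of the "no full stop after the '@'" test.   The function
name has_bare_domain and the second and third sample Message-Ids are
invented for illustration;  only the first sample comes from the spams
quoted above.

```python
def has_bare_domain(message_id: str) -> bool:
    """Return True when no full stop appears from the first '@' onwards."""
    # find() gives -1 when there is no '@', so start becomes 0 and the
    # whole string is searched in that case.
    start = message_id.find("@") + 1
    return "." not in message_id[start:]


# One of the spam Message-Ids quoted above.
print(has_bare_domain("<083301c77507$3ebc99c0$4826c9d0@kliftedc>"))
# Hypothetical legitimate Message-Id with a single-word domain.
print(has_bare_domain("<20070404004054.12345@localhost>"))
# Hypothetical Message-Id with an ordinary full domain.
print(has_bare_domain("<abc123@mail.example.com>"))
```

The '@localhost' sample is flagged just like the spam, which is why this
test is not safe to use on its own.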

So in the end, I've been checking for three things:   a single-word
domain, a Subject header consisting of "Re:" plus just two words (again,
obviously dictionary-generated to the naked eye), and - just to make
sure - the old 'twodollars' check, inspecting characters 14, 23 and 32
for matches with '$', '$' and '@' respectively.   (Again, this
particular trait isn't reliable enough to use as a check on its own, but
it is... suggestive in combination with other Rules.)
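
To make the character positions concrete, here is a small Python sketch
of the same position test.   The name two_dollars is made up for the
example, and the only point to watch is that Python counts from 0 where
BASIC's MID$ counts from 1.

```python
def two_dollars(message_id: str) -> bool:
    """Return True for '$' at characters 14 and 23 and '@' at character 32."""
    # Characters 14, 23 and 32 (counting from 1) are indices 13, 22 and 31.
    if len(message_id) < 32:
        return False
    return (message_id[13] == "$"
            and message_id[22] == "$"
            and message_id[31] == "@")


# '<' + 12 characters, '$', 8 characters, '$', 8 characters, '@', ...
print(two_dollars("<09bb01c77506$3cff67b0$2710c5b0@pgeminatep>"))
# Hypothetical Message-Id that does not follow the layout.
print(two_dollars("<abc123@mail.example.com>"))
```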


Delete Message-Id: @ twodollars
And Message-Id: @ nodomain
And Subject: @ twowords

This seems to be picking out these spams and only these spams to date;
obviously, the two UserTests operating on the Message-Id could be
combined, but since I use the @twodollars test elsewhere I haven't
attempted it.
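
As a sketch of what the combined version might look like, the following
Python function applies all three conditions in one place and folds the
two Message-Id tests together.   Several things here are assumptions
rather than anything taken from AntiSpam:  the function name, the sample
subjects, the lower-casing of the subject before comparing it with
"re: ", and searching for the full stop only after the '@' at character
32 rather than after the first '@' in the string.

```python
def looks_like_stock_spam(message_id: str, subject: str) -> bool:
    """Apply the twodollars, nodomain and twowords conditions together."""
    # twodollars: '$', '$' and '@' at characters 14, 23 and 32 (1-based).
    if len(message_id) < 32:
        return False
    if message_id[13] != "$" or message_id[22] != "$" or message_id[31] != "@":
        return False
    # nodomain: the '@' is known to be at index 31, so look after it.
    if "." in message_id[32:]:
        return False
    # twowords: "re: " followed by text containing exactly one space.
    lowered = subject.lower()  # assumption: case should not matter
    if not lowered.startswith("re: "):
        return False
    return lowered[4:].count(" ") == 1


spam_id = "<01dd01c77301$3eeb22c0$0165c5c0@umachinerye>"
print(looks_like_stock_spam(spam_id, "Re: purple lantern"))
print(looks_like_stock_spam(spam_id, "Re: your order has shipped"))
print(looks_like_stock_spam("<abc123@mail.example.com>", "Re: purple lantern"))
```

Since each condition can only reject a message, the order of the checks
does not affect the result;  the cheap position test simply goes first.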

~~~~~~~~~~~~~~~



DEF FN@twodollars
REM look for '$' at characters 14 and 23 and '@' at character 32
IF MID$(d$,14,1)="$" AND MID$(d$,23,1)="$" AND MID$(d$,32,1)="@" THEN =TRUE
=FALSE


DEF FN@nodomain
LOCAL ptr%
REM check for Message-Ids in the form <080c01c77005$3eea39b0$0502b4d0@lvacuumk>
ptr%=INSTR(d$,"@")
IF INSTR(d$,".",ptr%)=0 THEN =TRUE
=FALSE

DEF FN@twowords
LOCAL subject$,space%
subject$=d$
REM these spams always start 'Re'
IF LEFT$(subject$,4)<>"re: " THEN =FALSE
REM find space after first word
space%=INSTR(subject$," ",5)
IF space%=0 THEN =FALSE
REM if there are only two words, there should be no more spaces left
space%=INSTR(subject$," ",space%+1)
IF space%=0 THEN =TRUE
=FALSE


-- 
H. Bazley

Everyone is entitled to one really BIG mistake
