[antispam-f] Re: missing headers

  • From: Richard Porter <ricp@xxxxxxxxxxxxxxxx>
  • To: antispam@xxxxxxxxxxxxx
  • Date: Wed, 12 Sep 2007 21:38:36 +0100

On 12 Sep 2007 Jeremy Nicoll - freelists wrote:

> Richard Porter <ricp@xxxxxxxxxxxxxxxx> wrote:

>> On 11 Sep 2007 Jeremy Nicoll - freelists wrote:
>> 
>>> Richard Porter <ricp@xxxxxxxxxxxxxxxx> wrote:
>> 
>>>> This is a strange one. I've been receiving some spam messages along
>>>> the same lines, which were not marked by PlusNet. All got defaulted by
>>>> AntiSpam and marked as Spam by SpamStamp. However the last one had no
>>>> headers except those added by AntiSpam and SpamStamp:
>> 
>>> What about the message immediately before this one in the logs - has
>>> it had this one's headers in its body text?
>> 
>> There were two messages in the batch. The first one was from the
>> messenger list and was entirely normal.

> Entirely normal - it looked entirely normal in the AS log AND when you
> looked at it inside your mail client?  It didn't eg have headers from msg 2
> listed as part of its body, under its own sig?

The preceding message was received normally in Messenger with no 
spurious headers or text.

>> 
>>>> ----------------------------------------------------------------------
>>>> X-AntiSpam-Date: Tue, 11 Sep 2007 00:45:57 +0100
>>>> X-AntiSpam-Action: default
>>>> We are looking for a highly motivated professional, with skill of
>>>> working with people. The position is home-based. We offer a part-time
>>>> position with flexible working hours. And we would be happy to
>>>> consider a full-time job share applicant.
>>>> 
>>>> The right person will have good consultation and interpersonal skills
>>>> and some knowledge of advertising. Candidates must be able to keep on
>>>> focused and motivated when working alone.
>>>> Spam: Yes
>>>> SpamScore: 100.00%
>> 
>>> It's odd that these two added headers are after the body.  I'd have
>>> thought that the code that added these would have looked for the
>>> expected blank line between headers & body, and if it did, why didn't it
>>> add these three or four lines earlier?  Is the gap between "applicant."
>>> and "The right" not actually a blank line?
>> 
>> I think this confirms that the short message was sent by AntiSpam to
>> SpamStamp,

> As far as I know AS doesn't 'send' messages to SpamStamp, rather SpamStamp
> reads the whole file that AS prepared.

I think you know what I mean.

>> which added its own headers at the end as it hadn't
>> detected any headers to put them after.

> You'd need to ask JJvdG just how clever SpamStamp is when it is trying to
> decide where to add its headers.  For example, does it actually look for
> headers at all, or merely for the blank line that should follow them?

>> In the AntiSpam ISP log there was a blank line at the end of the headers.

> What headers?  In the example you posted you showed:

>  ----------------------------------------------------------------------
>  X-AntiSpam-Date: Tue, 11 Sep 2007 00:45:57 +0100
>  X-AntiSpam-Action: default
>  We are looking for a highly motivated professional, with skill of
>  working with people. The position is home-based. We offer a part-time
>  position with flexible working hours. And we would be happy to
>  consider a full-time job share applicant.

>  The right person will have good consultation and interpersonal skills
>  and some knowledge of advertising. Candidates must be able to keep on
>  focused and motivated when working alone.
>  Spam: Yes
>  SpamScore: 100.00%
>  ---------------------------

> - which does not show a blank line after the X-AntiSpam headers, and does
> show SpamStamp's headers added after lines that clearly aren't headers.

I said "in the AntiSpam ISP log". The above quote is the message as 
received in Messenger, which is what caused me to raise the problem. I 
think I explained that the full headers were shown on the AS ISP log.

>> There wasn't a duplicate complete message at this time.
>> 
>> It looks as though the incoming message contained some very long lines
>> which were split every 250 characters. If I rejoin these lines then I
>> do get the ten body lines as requested. The gap you refer to is a
>> blank line.

> I wonder if, although the log shows a blank line, it wasn't present in the
> download file passed to SpamStamp.  A blank line in an email is expected to
> have a certain surrounding set of delimiters and just possibly it didn't
> arrive at SpamStamp in the correct manner.

As far as I know there isn't any backup of the input to SpamStamp, so 
I have no way of checking this (unless you know differently).

> That's especially possible if the AS limitation (caused by BASIC not
> handling long strings) somehow mangled an embedded blank line.

> Remember the log shows data interpreted by AS while it is testing a mail,
> before deciding to delete or download a mail.  Data downloaded isn't logged
> (well, it was in one of my versions, byte by byte, when I was looking for
> problems in download logic, but I dunno if Frank's code has that option.
> And even if it does, such logging generates so much output that unless you
> have a repeatable problem one tends to leave the option off.)

I have the logging level set to 'both' but the debug log off.

>>>> ----------------------------------------------------------------------
>>>> 
>>>> The backup file in MsgServe is identical except for the #! rmail
>>>> 0000530 line. The first part of the message body is also missing.
>> 
>>> Missing from where?  How do you know what is missing?  Do you actually
>>> have a copy (from somewhere else?) of what the whole message should look
>>> like? If so wouldn't it be better to share it?
>> 
>> 1. Missing from the message that was delivered.

> That's still ambiguous.  Do you mean the message logged by AS when it tested
> the message, or the message collected by AS from your ISP before it went to
> SpamStamp, or what came out of SpamStamp, or what you saw in your email
> client (with or without full headers displayed)?

Missing from the message delivered to MessengerPro. Where it went 
missing is the unknown. The headers and body text were ok when tested 
by AntiSpam. They may or may not have been correct when the whole 
message was returned.

>> 2. Because it's present in the AntiSpam ISP log.

> If the whole message is in the AS log you might be able to extract a copy
> from the log and put it in a temporary file, and then pass that to SpamStamp
> and see what it makes of it.  That is, see if you can prove an error in
> SpamStamp, or perhaps in your email client's display of what SpamStamp
> produces.

That's a possibility. I don't know if the end of the message was 
complete but I can certainly test it.

>> 3. I have a copy of the message that was debatched in the MsgServe
>> backup directory.

> Could you zip up the whole of the relevant logs, and the whole of the backup
> file etc and send them to me?  I find it hard to picture EXACTLY what is
> where from your descriptions.

I think you have all of the relevant bits of the log and the whole of 
the backup file. The only bits I chopped out were as marked.

>> I do not have a copy of the incoming raw message
>> that should have been returned from the server.

> Pity; I used to keep 20 generations of backups of incoming files
> automatically so that, provided I recognised there'd been a problem in the
> last 20 sets of files processed, I could examine them.  But that was a
> hangover from when I wrote my version of AS, and nowadays I don't keep
> anything like as much.

SS is set to keep 10 backup copies but I can't find where it stores 
them. The backup directory in !MsgServe.SpamStamp contains four files 
from 2006.

>> It is of course
>> entirely possible that the server returned the complete headers plus
>> ten body lines the first time and then only sent a few lines of text
>> when it was asked to return the whole message.

> I think that's pretty unlikely.  It's more likely that some aspect of very
> long lines plus, perhaps, a badly constructed email, got mishandled.


>> 4. It might have been but they are quite long. However since you ask
>> I'll share the relevant ones with you, starting with the end of the
>> first message:
>> 
>> Message 1 accepted on rule 25:
>> ACCEPT To:      = *messenger-l*
>> To: messenger-l@xxxxxxxxx
>> From: "Andrew Weston" <aw29009@xxxxxxxxx>
>> Subject: Re: Hermes mailing list
>>> RETR 1
>> < +OK 5058 octets follow.
>>> DELE 1
>> < +OK Deleted.
>>> LIST 2
>> < +OK 2 5685
>>> TOP 2 10
>> < +OK headers follow.
>> < X-Daemon-Classification: INNOCENT
>> < Envelope-to: ricp@xxxxxxxxxxxxxxxx
>> < Delivery-date: Mon, 10 Sep 2007 15:32:59 +0000
>> 
>> [snip tracking headers]
>> 
>> < Message-ID: <000901c7f3bf$05800796$78a6a7a5@vbmusj>
>> < From: "jamal tien-fu" <cassius@xxxxxxxxxxx>
>> < To: <ricp@xxxxxxxxxxxxx>
>> < Subject: Serious business in a sphere of financial services. (no
>> investment reqired)
>> < Date: Mon, 10 Sep 2007 13:43:53 +0000
>> < MIME-Version: 1.0
>> < Content-Type: text/plain;
>> <  format=flowed;
>> <  charset="iso-8859-1";
>> <  reply-type=original
>> < Content-Transfer-Encoding: 7bit
>> < X-Priority: 3
>> < X-MSMail-Priority: Normal
>> < X-Mailer: Microsoft Outlook Express 6.00.3790.2663
>> < X-MimeOLE: Produced By Microsoft MimeOLE V6.00.3790.2757
>> < X-orpheus-MailScanner: Found to be clean
>> < X-orpheus-MailScanner-SpamCheck: not spam, SpamAssassin (not cached,
>> <  score=2.8, required 5, RAZOR2_CF_RANGE_51_100 0.50,
>> <  RAZOR2_CF_RANGE_E4_51_100 1.50, RAZOR2_CHECK 0.50,
>> <  SARE_WEOFFER 0.30)
>> < X-orpheus-MailScanner-SpamScore: 2
>> < X-MailScanner-From: cassius@xxxxxxxxxxx
>> < X-PN-Spam-Filtered: by PlusNet MXCore (v3.00)
>> < X-DSPAM-Result: Innocent
>> < X-DSPAM-Processed: Mon Sep 10 16:33:00 2007
>> < X-DSPAM-Confidence: 0.5401
>> < X-DSPAM-Improbability: 1 in 118 chance of being spam
>> < X-DSPAM-Probability: 0.0000
>> < X-DSPAM-Factors: 27,
>> 
>> [snip spam factors]
>> 
>> <
>> < Electronics: Building Chips in 3-D Dr. Krishna Saraswat, Electronic
>> Engineering; Dr. Chris Chidsey, Chemistry

> You've snipped, or maybe not, the exact detail of what happened at the end
> of the headers, which is what matters. Why did you do that?

I've only snipped the spam factors - 27 of them. There were no further 
headers.

> What's on the line immediately above the "> < Electronics:" one, and what's
> on the line immediately above that?  It matters byte by byte, eg whether
> there are LF and CR bytes there, not just what you see visibly in an editor.

The last factor and start of body is (I've retyped the tab):

<  [09]molecules+and, 0.99000
<
< Electronics: Building Chips in 3-D Dr. Krishna Saraswat, Electronic 
Engineering; Dr. Chris Chidsey, Chemistry

There were no CRs which would be seen as [0d] in !Edit.
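
For anyone wanting to check this byte by byte rather than by eye, here is a rough Python sketch. It is not anything AntiSpam or SpamStamp does; the function names are invented for illustration. It reports the line ending of each line in a raw file and finds the first truly empty line, which is where the headers ought to end.

```python
def classify_line_endings(raw):
    """Return a list of (text, ending) pairs for a raw bytes object.

    ending is 'CRLF', 'LF', 'CR', or '' for a final unterminated line.
    """
    result = []
    for line in raw.splitlines(keepends=True):
        if line.endswith(b"\r\n"):
            result.append((line[:-2], "CRLF"))
        elif line.endswith(b"\n"):
            result.append((line[:-1], "LF"))
        elif line.endswith(b"\r"):
            result.append((line[:-1], "CR"))
        else:
            result.append((line, ""))
    return result


def first_blank_line(raw):
    """Index of the first completely empty line, or None if there is none.

    A line holding only a space or a tab does not count as empty.
    """
    for index, (text, _ending) in enumerate(classify_line_endings(raw)):
        if text == b"":
            return index
    return None
```

Run over a saved copy of the download file, this would show whether the separator is a genuinely empty line and whether any CR bytes are present.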

> I can't guess whether SpamStamp would think that the line before
> "Electronics:" is the blank line at the end of the headers. Or perhaps that
> "Electronics:" is itself a header line - as it starts with one
> colon-terminated word.

As I said, the line before "Electronics" is a blank line.

> Possibly AS or SpamStamp have thought that the Electronics: line was a
> header AND have thought that the immediately following text was part of that
> header.

I can't see why it should have done that.

>> <
>> < Hello,
>> < First and Primarily, we would kindly like to express our deep
>> greetings to you and your relatives and wish you all good condition
>> and happiness and more success in business. Our International
>> Corporation is looking for new employees on different vaca
>> < ncies. We are by now for a long time in the market and now we
>> recruit human resources to occupation from home.
>> <
>> < Our Corporation Main center is located in United Kingdom with
>> branches all over the world. Our supreme desire now is to expand our
>> business level to more countries, so we are advertising here in hope
>> of cooperating with you all. We highly appreciate
>> < honest and ingenious employers. You do not need to spend any sum of
>> money and we do not ask you to provide us with your bank account
>> number! We are occupied in totally officially authorized activity and
>> working in our company you can reach career gro
>> < wth at a permanent job.
>> <

> Seeing as the lines just above are clearly split mid-word, this looks like
> logging has suffered from the long-lines problem.

That figures. So AS did get the requisite ten lines then.
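
To illustrate the rejoining, here is a small Python sketch. The 250-character width is what I observed in the log; the rule that a chunk of exactly that length continues on the next line is my assumption, not something taken from the AntiSpam source, and it would misjoin a genuine line that happened to be exactly 250 characters long.

```python
WIDTH = 250  # split width seen in the log; an assumption, not a documented value


def split_for_log(line, width=WIDTH):
    """Split one logical line into chunks of at most `width` characters."""
    if line == "":
        return [""]
    return [line[i:i + width] for i in range(0, len(line), width)]


def rejoin(chunks, width=WIDTH):
    """Rebuild logical lines, treating a full-width chunk as 'continued'."""
    lines = []
    pending = None
    for chunk in chunks:
        pending = chunk if pending is None else pending + chunk
        if len(chunk) < width:
            # A short chunk ends the logical line.
            lines.append(pending)
            pending = None
    if pending is not None:
        # The data ended on a full-width chunk.
        lines.append(pending)
    return lines
```

Counting the result of rejoin() rather than the raw log lines is how I arrive at ten body lines.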

> But download of a file shouldn't have the problem because AS should be
> taking incoming data byte by byte and transferring it to the download file.




>> 
>>>> but more than the ten requested body lines.
>> 
>>> Well, what's in the more-than-ten lines?
>> 
>> The text of the spam - see above. What it actually says is irrelevant.
>> 
>>>> I don't know where else to look, but it seems that AntiSpam has chopped
>>>> the message because its headers are present.
>> 
>>> You mean, perhaps, that: it seems because its headers are present that
>>> AntiSpam has chopped ...  (slightly different implication...).
>> 
>> I mean because AntiSpam's own headers are present then it looks like
>> the damage had been done before the message left AntiSpam - possibly
>> before it was received by AntiSpam.

> More ambiguity - I'm sure you know what you think you mean but I don't know
> with certainty what that is...

AntiSpam prepends its own headers to the received message. At that 
point it doesn't want to know what was in the message, and mustn't 
insert a blank line after

X-AntiSpam-Action: default

because it is expecting more headers to follow.

It seems inconceivable that SpamStamp should chop all the other 
headers and some of the body, but not AntiSpam's headers. This is why 
I've concluded that something happened at the server or in AntiSpam or 
in the file system.
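
A minimal Python sketch of that reasoning, with invented function names and no claim to match the real AntiSpam code: prepending headers without a blank line is harmless when the original headers follow, but if the message that follows has already lost its headers, the body text runs straight on from X-AntiSpam-Action and anything splitting at the first blank line will treat it as header lines.

```python
def prepend_headers(raw_message, extra_headers):
    """Put extra header lines in front of a raw message.

    No blank line is added, because the message's own headers
    are expected to follow immediately.
    """
    block = b"".join(h.encode("ascii") + b"\n" for h in extra_headers)
    return block + raw_message


def split_at_first_blank_line(raw_message):
    """Return (header_lines, body_lines), splitting at the first empty line.

    If there is no empty line, everything is returned as header lines.
    """
    lines = raw_message.splitlines()
    if b"" in lines:
        i = lines.index(b"")
        return lines[:i], lines[i + 1:]
    return lines, []
```

Feeding it the truncated text puts the first body paragraph in the header part, which matches what was delivered to Messenger.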

>> Would AntiSpam report an error if it was told that 5685 octets
>> followed and only 530 were returned?

> Early versions wouldn't have.  I doubt newer ones do.  Here the number of
> octets I actually receive is greater than the number reported by the ISP's
> mail server because (using VRPC) Windows's AV software gets into the midst
> of all this and the message that comes in from the ISP gets stuff added to
> it by the AV software before AS sees it.
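
The sort of check I have in mind could look like the following Python sketch. It is hypothetical, not a description of what any version of AntiSpam does, and the function names are invented. It takes the size from the server's reply to LIST n (such as "+OK 2 5685") and compares it with the number of octets actually received. Given what you say about AV software adding data, it only reports the difference and leaves it to the caller to decide whether a shortfall is an error.

```python
import re


def parse_list_size(response):
    """Extract the octet count from a '+OK <msg> <octets>' reply, or None."""
    match = re.match(r"\+OK (\d+) (\d+)", response)
    return int(match.group(2)) if match else None


def compare_size(expected, received):
    """Return ('short' | 'long' | 'equal', difference in octets)."""
    if received < expected:
        return ("short", expected - received)
    if received > expected:
        return ("long", received - expected)
    return ("equal", 0)
```

For the message in question, an expected 5685 octets against 530 received would be reported as short by 5155.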

-- 
 _
|_|. _   Richard Porter               http://www.minijem.plus.com/
|\_||_                                mailto:ricp@xxxxxxxxxxxxxxxx
