[sanesecurity] Re: ham testing

From: Michael Orlitzky <michael@xxxxxxxxxxxx>
To: sanesecurity@xxxxxxxxxxxxx
Date: Tue, 14 Jul 2009 13:27:59 -0400

Tom Shaw wrote:

Steve and Bill,
Personally I think "ham" testing will not add as much "safety" as is being asserted.
First, your ham and my ham are vastly different, as are those of others on the list. Further, ham for a European user is different from ham for an Asian user or a South American user, etc.
OK, ham testing theoretically could have detected "acebook.com", but I have friends and clients who do not have facebook.com in their ham because they wash their ham every 14 days, and so would never have detected the problem prior to a facebook message appearing. Further, I expect that the next FP to happen will not be in whatever "ham" set you are testing against, which might make ham testing intrinsically problematic.

I'm going to have to disagree here, too. Estimates of "safety" aside, testing the signatures against a known-good corpus undoubtedly provides /some/ safety, and I would argue that it does so at little to no extra cost.

In general, I'm much more concerned about false positives than I am about false negatives. Testing against known ham provides a safety net against false positives, at only a comparable risk of false negatives (in the case where we ignore some signatures temporarily). Given that I weigh the false positives much more heavily, I see that as an overall benefit.

True, users will differ with regard to what they consider "good" mail, but I don't think that can be used as an argument against testing here: if your users don't have any mail that would trigger "acebook.com," then the "acebook.com" signature is less likely to be a false positive for your users. Not impossible, but less likely. And in that case, you're right back where you started; no harm done. But for others (for whom "acebook.com" is a false positive), damage would have been prevented.
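As a rough illustration of the kind of "ham test" being discussed, here is a minimal Python sketch. It assumes signatures can be treated as plain case-insensitive substrings and that the ham corpus is just a list of message strings; real signature formats are more involved, and the function names (ham_test, split_signatures) are hypothetical, not part of any existing script.

```python
def ham_test(signatures, ham_messages):
    """Map each signature that matches known-good mail to the indexes
    of the ham messages it matched.

    Assumption: a signature is a plain substring, matched
    case-insensitively. Real signatures are not this simple.
    """
    lowered = [msg.lower() for msg in ham_messages]
    hits = {}
    for sig in signatures:
        matched = [i for i, msg in enumerate(lowered) if sig.lower() in msg]
        if matched:
            hits[sig] = matched
    return hits


def split_signatures(signatures, ham_messages):
    """Split signatures into (safe, held), where 'held' are the ones
    that hit ham and should be ignored temporarily and reported."""
    hits = ham_test(signatures, ham_messages)
    safe = [s for s in signatures if s not in hits]
    held = [s for s in signatures if s in hits]
    return safe, held
```

If a site's ham contains no facebook.com mail, "acebook.com" simply lands in the safe list and nothing changes for that site; if it does, the signature is held back before it can reject anything.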

IMHO you are better off checking for small sigs (which would have detected the "com" problem) and washing against large whitelists (which we already do).
I can provide you and Bill my script to check signatures against URIBL whitelists as well as bondedsender and many others and cache results if you want. It currently is in PHP but could easily be used as is or recoded.
These queryable DBs are much more comprehensive than someone's (or a group of someones') ham. Further, the sheer effort to maintain a comprehensive, world-aware ham database seems like a tall order.
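The two checks proposed above (reject very short signatures, and wash candidates against whitelists) could look roughly like the following Python sketch. This is not the PHP script mentioned above: the whitelist here is a local list of domains standing in for queryable services, and MIN_SIG_LENGTH and all function names are hypothetical values chosen for illustration.

```python
MIN_SIG_LENGTH = 8  # hypothetical cutoff; tune to taste


def is_too_short(sig, min_length=MIN_SIG_LENGTH):
    """A very short signature such as 'com' matches far too much mail."""
    return len(sig.strip()) < min_length


def hits_whitelist(sig, whitelist):
    """True if the signature is a substring of any whitelisted domain.

    Assumption: the whitelist is a local iterable of domain names,
    standing in for lookups against external whitelist services.
    """
    s = sig.lower()
    return any(s in domain.lower() for domain in whitelist)


def screen_signature(sig, whitelist, min_length=MIN_SIG_LENGTH):
    """Return a reason to reject the signature, or None if it passes."""
    if is_too_short(sig, min_length):
        return "too short"
    if hits_whitelist(sig, whitelist):
        return "whitelisted"
    return None
```

With "facebook.com" on the whitelist, "com" is rejected as too short and "acebook.com" is rejected because it is contained in a whitelisted domain.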

If admins are alerted when the "ham test" fails, they will be able to report false positives quicker, improving the overall quality of the database as a result.

I don't think anyone is suggesting we maintain a world-wide database of known-good mail. On the contrary, as you mentioned, there would be less benefit to me testing against someone else's mail. In this respect, I think that the ability to test one's own mail (e.g. via the update script) could be more effective than pre-screening the signatures before they are entered into the database.

Of course, having to do the work oneself significantly reduces the number of people who are willing to put forth the effort. Fortunately, the two are not mutually exclusive.

I would suggest trying filtering on small sigs and checking these world-aware whitelists as a first start before taking on the task of ham maintenance. In our experience, adding these checks to the winnow dynamic process makes a big difference.
If after that more checking seems to be in order, someone can start to build a comprehensive ham DB.
I would also like to query how many folks are using these dynamic sigs without scoring. The reason I ask is that it has been reiterated over and over again to use them as part of scoring. We score and did not experience rejections of com or acebook.com. Maybe the solution is to ask for scoring or have users reconsider their scores - after all, this is what you have to do in any scoring-based system.

These are all good ideas. However, they aren't mutually exclusive either. It is entirely possible that maximum accuracy will be attained by some combination of the methods being discussed. For example, if someone is using the signatures without scoring; sure, he can improve his accuracy by implementing scoring. But, could he improve it further by implementing scoring *and* the "ham test?" Probably.
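To make the combination concrete, here is a small Python sketch of scoring plus a ham test. It assumes each signature carries a numeric weight and a message is rejected once its total reaches a threshold; the weights, the threshold, and the function names are all made up for illustration and do not describe any particular scanner's configuration.

```python
def message_score(message, weights):
    """Sum the weights of every signature found in the message.

    Assumption: signatures are plain substrings, matched
    case-insensitively, and 'weights' maps signature -> score.
    """
    text = message.lower()
    return sum(w for sig, w in weights.items() if sig.lower() in text)


def is_rejected(message, weights, threshold):
    """Reject only when the combined score reaches the threshold."""
    return message_score(message, weights) >= threshold


def drop_ham_hitters(weights, ham_messages):
    """Remove signatures that match any known-good message."""
    ham = [m.lower() for m in ham_messages]
    return {
        sig: w
        for sig, w in weights.items()
        if not any(sig.lower() in m for m in ham)
    }
```

With a weight of 3.0 and a threshold of 5.0, a lone "acebook.com" hit does not reject a message on its own, which is the benefit of scoring. Running the ham test as well removes the bad signature's contribution entirely, so it can no longer push a borderline message over the threshold.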

References:
- [sanesecurity] ham testing
  - From: Steve Basford
- [sanesecurity] Re: ham testing
  - From: Tom Shaw
