> You do realise that the link below is not actually a Bayesian technique :-)

Yup, while implementing the math I noticed Paul Graham was dividing apples by oranges. I went for a less ad hoc approach. But there's still a lot of ad hoc stuff, like why use the top 15 words? Why 15?

> Have a read of
> http://radio.weblogs.com/0101454/stories/2002/09/16/spamDetection.html

Thanks. I came across that yesterday and bookmarked it, but didn't have time to read it in detail. Incidentally, I found it while searching for databases of spam messages. There are lots of databases of e-mail addresses, DNS names and IP addresses, but none of actual full messages. Anyone know of one?

Since this isn't really OpenBeOS stuff, please send any replies to the bedevtalk@xxxxxxxxxxxxx mailing list.

- Alex