In article <4e819f7af7freelists@xxxxxxxxxxxxxxxx>,
   Martin <freelists@xxxxxxxxxxxxxxxx> wrote:

> Now that AS v1.59a8 has RegEx support, I wondered if anyone was
> interested in a little application I wrote to help test Regular
> Expressions in a standalone way before committing them to a Rules
> file? (... which I needed when helping to add RegEx support to AS!)
> It is currently in Beta status, but seems reasonably stable now, and
> I would be interested in any comments before I formally release it.
> In anyone would like a copy, please email me direct.

<brain dump on>

I would like a copy, but for the next few days I'll have no opportunity
to play with it.

For a long time I had Pluto's regex support in use, but after a while I
turned a lot of my regex rules off. Why? Because I wasn't completely
convinced that they made filter checks faster. The suspected reason will
also affect people using regex in AntiSpam (unless there's an option in
the way that AS calls regex that defeats this).

The problem is that when a person sees a pattern like

   A.*B.*

(meaning an A followed by any number of characters, followed by a B,
followed by other stuff) and considers whether that would match a string
like

   A fkhfjg fgfj ghfkgfjgjfg kjgfgfgjfgjfjgfjgfg B z

they can see at a glance that the answer is yes. But if the test string
was

   A B A B A lkgdgf gfjglfgklfgfkgkfgfkgfgkfkgfgfgklfgjfg A B lfjglk B

a regex will first decide that the string matches the initial A B, then
have another look and decide it matches A B A B, then have another look,
and so on until it sees that almost the entire string matches. That's
because it looks for the longest matching subexpression. If the whole
regular expression is complicated it's just possible that the regex
module will take a long time to find the longest possible match rather
than simply say "yes, there's a match" and not try to find a better one.
For both Pluto's use and AS's use, users don't care what the matching
expression is, just that something matched...
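To make the difference between "longest match" and "any match" concrete,
here is a minimal sketch in Python. It is not the regex module that Pluto
or AS uses, and that module's matching rules may well differ. The
shortened test string and the helper names matched_span and is_match are
made up for illustration.

```python
import re

# Shortened version of the awkward test string (illustrative only).
TEXT = "A B A B A lkgdgf A B lfjglk B"

# Greedy: ".*" grabs as much as it can, so the match runs to the last B.
greedy = re.compile(r"A.*B")
# Non-greedy: ".*?" stops at the first B that completes a match.
lazy = re.compile(r"A.*?B")


def matched_span(pattern, text):
    """Return the (start, end) offsets of the first match, or None."""
    m = pattern.search(text)
    return m.span() if m else None


def is_match(pattern, text):
    """Yes/no answer only, which is all a filter rule needs."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    print("greedy span:", matched_span(greedy, TEXT))
    print("lazy span:  ", matched_span(lazy, TEXT))
    print("both match: ", is_match(greedy, TEXT) and is_match(lazy, TEXT))
```

Both patterns give the same yes/no answer; only the extent of the matched
text differs. A trailing ".*" adds nothing to a yes/no test, so it is left
off here.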

OTOH the test strings being compared with a regular expression should be
short (both in Pluto and here), so you'd wonder why that would really
matter - surely it wouldn't take much longer to find a long subexpression
than a short one. Well, I dunno.

A problem (I thought) in the implementation of regex in Pluto was that
there was no way to trace the use of regex by Pluto, and in particular no
way to timestamp such traces. I found that once in a while a debatch of
data would be incredibly slow. I thought it might (in Pluto's case,
probably not applicable here) happen when a "body" regex test examined,
character by character, the encoded representation of an attached jpeg
or other binary file... then a test looking for, say, a munged drug name
might have many hundreds of thousands of characters of picture data to
look at. But I couldn't prove where the time was being wasted.

Pluto (I think) compiled regex tests once and then applied them. Another
possibility is that once in a while a compiled expression got corrupted,
so it meant something else, and the regex pattern-match code then set off
on a wild goose chase. Or maybe it set off with an ok pattern but missed
the end of the search string and tried to match all of memory. Whatever
the cause, I'd suggest caution!

Also (to Frank): are you setting things up so that regex tests look at
the pure incoming headers, or at headers that have already been folded to
all upper or all lower case? If the stuff being tested is all one case or
the other, is there anything to stop people using the extended regex
stuff that says, say, match an "a" or an "A"? Obviously that's a waste of
effort. Certainly a pattern explicitly containing eg "[aA]" would be a
waste.
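The case-folding point can be sketched in Python as below. This says
nothing about how AS or Pluto actually handle case; the sample header,
the assumption that text is folded to lower case before testing, and the
helper name hits are all made up for illustration.

```python
import re

# Made-up sample header, for illustration only.
RAW_HEADER = "Subject: Special OFFER Inside"

# Assumption: the filter folds the text to lower case before testing it.
folded = RAW_HEADER.lower()

plain = re.compile(r"offer")
# The same word with every letter written as an explicit two-case class.
bracketed = re.compile(r"[oO][fF][fF][eE][rR]")
# Alternative for unfolded text: let the regex engine ignore case.
ignore_case = re.compile(r"offer", re.IGNORECASE)


def hits(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    print("plain on folded:      ", hits(plain, folded))
    print("bracketed on folded:  ", hits(bracketed, folded))
    print("plain on raw:         ", hits(plain, RAW_HEADER))
    print("ignore_case on raw:   ", hits(ignore_case, RAW_HEADER))
    print("pattern lengths:      ", len(plain.pattern), len(bracketed.pattern))
```

On folded text the plain and bracketed patterns give the same answer, so
the bracketed form buys nothing, and it is four times as long.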

I think that with Pluto, JSD set things up so that whatever a user coded
as a pattern was invisibly extended that way, and that caused a problem
too: a user could define a - say - 55 character search pattern which
became much longer than that when Pluto expanded it - and that overflowed
the buffer JSD had allowed for the user-supplied pattern to live in.

I'm sure you already know that mixed case patterns need to be supported.

--
Jeremy C B Nicoll, Edinburgh, Scotland - my opinions are my own.