On Tue, Oct 14, 2014 at 11:13:02AM +0200, Axel Dörfler wrote:
> On 14.10.2014 11:07, pulkomandy@xxxxxxxxxxxxx wrote:
> >+ "0.251 [0:15] (\"<?xml\" | \"<\\000x\\000m\\000l\")"
>
> That doesn't look right: '<' has no prefix (what is \\000 anyway?), and
> there is no '?'?

I forgot the ?, added it in hrev48019.

The rule is a bit tricky, but it works: \\000 is unescaped once by the rdef
parser to \000, and then a second time by the MIME sniffing rule parser
into a null byte. The first character has no NUL prefix, and the last one
has no NUL suffix. This allows the rule to match UTF-16 regardless of
endianness: both "\0<\0?\0x\0m\0l" (big endian) and "<\0?\0x\0m\0l\0"
(little endian) are recognized. The [0:15] range also lets the match skip
the byte order mark, if there is one.

Note that the xhtml rule already uses a similar format and works fine. I
don't think there is currently a more readable way to express this.

--
Adrien.
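The matching behaviour explained in this reply can be sketched in C++. This is a minimal illustration, not Haiku's sniffer code: `MatchesInRange`, `XmlRuleMatches` and `EncodeUtf16` are hypothetical names, and the sketch assumes that [0:15] means the pattern may start at any byte offset from 0 to 15 inclusive, with exact byte comparison.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not Haiku's sniffer API): true if `pattern` occurs in
// `data` starting at some byte offset in [rangeStart, rangeEnd], inclusive.
bool MatchesInRange(const std::string& data, const std::string& pattern,
    std::size_t rangeStart, std::size_t rangeEnd)
{
    for (std::size_t offset = rangeStart; offset <= rangeEnd; offset++) {
        if (offset + pattern.size() > data.size())
            break;
        if (data.compare(offset, pattern.size(), pattern) == 0)
            return true;
    }
    return false;
}

// Both alternatives of the rule, after \\000 has been unescaped twice.
// The UTF-16 pattern has no NUL before '<' and none after 'l', so it is a
// substring of the encoded text in both byte orders.
bool XmlRuleMatches(const std::string& data)
{
    static const std::string kAscii("<?xml");
    static const std::string kUtf16("<\0?\0x\0m\0l", 9);
    return MatchesInRange(data, kAscii, 0, 15)
        || MatchesInRange(data, kUtf16, 0, 15);
}

// Test helper: encode ASCII text as UTF-16 in either byte order,
// optionally preceded by a byte order mark.
std::string EncodeUtf16(const std::string& ascii, bool bigEndian, bool withBom)
{
    std::string out;
    if (withBom)
        out += bigEndian ? "\xFE\xFF" : "\xFF\xFE";
    for (char c : ascii) {
        if (bigEndian) {
            out += '\0';
            out += c;
        } else {
            out += c;
            out += '\0';
        }
    }
    return out;
}
```

With a byte order mark, the UTF-16 pattern is found at offset 2 (little endian) or 3 (big endian), both well inside the range; the leading or trailing NUL that the pattern omits is simply left unmatched.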