Ingo Weinhold <bonefish@xxxxxxxxxxxxxxx> wrote:
> On 2006-06-09 at 13:13:35 [+0200], Axel Dörfler <axeld@xxxxxxxxxxxxxxxx>
> wrote:
> > I would add special rule semantics for this, ie. a "text" rule and an
> > "ascii" rule where the former would accept UTF-8 and the latter plain
> > ASCII only, maybe even with a method to specify the minimal
> > congruence.
> I don't quite understand what you mean. I would simply take ascmagic.c,
> adjust it (to C++, parameters/return types of the identification
> function, strip things I don't need) and return the type it finds.

I would have enlarged the sniffer rule language for something like
this:
0.5 [0:511] text

With add-ons, it's probably not a good idea to do it like this.
I am also not fond of the idea of having any real add-ons in the
registrar, especially third party ones (crashing that one is a very bad
idea).

> > If you have a look at BSD's "file", the text magic happens in
> > ascmagic.c - it looks very reasonable to me, and could even identify
> > the charset for StyledEdit (at least in a basic way that should be
> > enough for the Western world).
> I intend to incorporate that code into the text sniffer add-on
> directly, stripping as much of the character set stuff as possible.
> But we can certainly provide a library function (e.g. in
> libtextencoding) that guesses the character encoding/set of a given
> buffer.

Sounds like a good idea, too.

Bye,
   Axel.
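
For illustration, the "ascii" versus "text" distinction discussed above could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the code from ascmagic.c and not an existing registrar or libtextencoding function: the names TextKind and GuessTextKind are hypothetical, the set of accepted control characters is a simplification, and overlong or surrogate UTF-8 sequences are not rejected. A multi-byte sequence cut off at the end of the buffer is accepted, since a sniffer would typically only look at a prefix of the file (such as the [0:511] range in the example rule).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result type; not part of any existing Haiku API.
enum class TextKind { Ascii, Utf8, NotText };

// Hypothetical helper: classify the first `length` bytes of `buffer`.
// An empty buffer is reported as Ascii.
TextKind
GuessTextKind(const uint8_t* buffer, size_t length)
{
	bool sawMultiByte = false;
	size_t i = 0;

	while (i < length) {
		uint8_t byte = buffer[i];

		if (byte < 0x80) {
			// Simplification: only tab, LF, FF and CR are accepted as
			// control characters; everything else below 0x20 and DEL
			// is treated as a sign of binary data.
			bool allowedControl = byte == '\t' || byte == '\n'
				|| byte == '\f' || byte == '\r';
			if ((byte < 0x20 && !allowedControl) || byte == 0x7f)
				return TextKind::NotText;
			i++;
			continue;
		}

		// Number of continuation bytes implied by the lead byte.
		size_t extra;
		if (byte >= 0xC2 && byte <= 0xDF)
			extra = 1;
		else if (byte >= 0xE0 && byte <= 0xEF)
			extra = 2;
		else if (byte >= 0xF0 && byte <= 0xF4)
			extra = 3;
		else
			return TextKind::NotText;

		for (size_t k = 1; k <= extra; k++) {
			if (i + k >= length) {
				// Sequence runs past the sniffed window; accept it.
				return TextKind::Utf8;
			}
			// Continuation bytes must look like 10xxxxxx.
			if ((buffer[i + k] & 0xC0) != 0x80)
				return TextKind::NotText;
		}

		sawMultiByte = true;
		i += extra + 1;
	}

	return sawMultiByte ? TextKind::Utf8 : TextKind::Ascii;
}
```

Under these assumptions an "ascii" rule would match only TextKind::Ascii, while a "text" rule would match both Ascii and Utf8. Other encodings (Latin-1, for example) fall into NotText here, which is where a fuller character set guess along the lines of ascmagic.c would come in.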