On 2006-06-09 at 23:10:01 [+0200], François Revol <revol@xxxxxxx> wrote:
> > > If you have a look at BSD's "file", the text magic happens in
> > > ascmagic.c - it looks very reasonable to me, and could even identify
> > > the charset for StyledEdit (at least in a basic way that should be
> > > enough for the Western world).
> >
> > I intend to incorporate that code into the text sniffer add-on
> > directly, stripping as much of the character set stuff as possible.
> > But we can certainly provide a library function (e.g. in
> > libtextencoding) that guesses the character encoding/set of a given
> > buffer.
>
> Emacs does that ... but in lisp :)
>
> More seriously, instead of creating yet another kind of add-on...
> I've been thinking for some time about reusing the translators as a
> means to index meta data from file content, by adding a SniffAndIndex()
> method or so...
> that would fill in attribs from ID3 for example... (not exactly the best
> example but well... or get image w & h from the file). I guess we could
> use that for MIME sniffing as well.
> I think translators are probably the best suited to know what is inside
> the files they support. Also when converting files with them they would
> just have to call their SniffAndIndex method to put the meta data in
> right away. That'd save ppl from adding it manually, something we should
> have done long ago; other OSes already do.
> There might be a perf issue though if we load every translator each
> time...

I think so. I believe the main problem with this approach is that it is
overkill to load translators, some of them quite heavy-weight, for a task
as simple as identifying a file, which doesn't even need to be *that*
exact. So the MIME sniffer rules, plus light-weight add-ons for the
exceptional cases, aren't too bad an idea IMHO.

CU, Ingo
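
To make the "library function that guesses the character encoding/set of a given buffer" idea a bit more concrete, here is a rough sketch in Python. It is only an illustration of the general approach (check for a byte-order mark, reject control bytes, then try ASCII, UTF-8 and finally a Latin-1 fallback). The name guess_text_encoding and its return values are made up for this example; they are not the libtextencoding API and not a port of ascmagic.c.

```python
from typing import Optional

# Control bytes that commonly appear in plain text (bell, backspace, tab,
# newline, vertical tab, form feed, carriage return, escape). This set is
# an assumption chosen for the sketch.
TEXT_CONTROLS = {0x07, 0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x1B}


def guess_text_encoding(buffer: bytes) -> Optional[str]:
    """Return a rough encoding guess, or None if the buffer looks binary.

    Hypothetical helper: the name and the returned labels are illustrative.
    """
    if not buffer:
        return None

    # A byte-order mark is the cheapest hint available.
    if buffer.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if buffer.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"

    # Other control bytes (including DEL) suggest binary data.
    for byte in buffer:
        if byte == 0x7F or (byte < 0x20 and byte not in TEXT_CONTROLS):
            return None

    # Pure 7-bit data needs no further checks.
    if all(byte < 0x80 for byte in buffer):
        return "us-ascii"

    # Random 8-bit data is very unlikely to be valid UTF-8. Note that a
    # buffer cut off in the middle of a multi-byte sequence fails here.
    try:
        buffer.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # Bytes 0x80-0x9F are control codes in ISO-8859-1, so treat a buffer
    # that contains them as unknown rather than guessing.
    if any(0x80 <= byte <= 0x9F for byte in buffer):
        return None
    return "iso-8859-1"
```

A check like this only needs the first few hundred bytes of a file, which is part of why it stays cheap compared with loading a full translator.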