[openbeos] Re: Identifying Text Files

  • From: Ingo Weinhold <bonefish@xxxxxxxxxxxxxxx>
  • To: openbeos@xxxxxxxxxxxxx
  • Date: Sat, 10 Jun 2006 17:58:33 +0200

On 2006-06-09 at 23:10:01 [+0200], François Revol <revol@xxxxxxx> wrote:
> > > If you have a look at BSD's "file", the text magic happens in
> > > ascmagic.c - it looks very reasonable to me, and could even identify
> > > the charset for StyledEdit (at least in a basic way that should be
> > > enough for the Western world).
> > 
> > I intend to incorporate that code into the text sniffer add-on directly,
> > stripping as much of the character set stuff as possible. But we can
> > certainly provide a library function (e.g. in libtextencoding) that guesses
> > the character encoding/set of a given buffer.
> > 
> 
> Emacs does that ... but in lisp :)
> 
> More seriously, instead of creating yet another kind of add-on...
> I've been thinking for some time about reusing the translators as a
> means to index metadata from file content, by adding a SniffAndIndex()
> method or so...
> that would fill in attributes from ID3, for example... (not exactly the
> best example but well... or get image width & height from the file). I
> guess we could use that for MIME sniffing as well.
> I think translators are probably the best suited to know what is inside
> the files they support. Also, when converting files with them, they would
> just have to call their SniffAndIndex method to put the metadata in place
> right away. That'd save people from adding it manually, something we
> should have done long ago; other OSes already do.
> There might be a perf issue though if we load every translator each
> time...

I think so. I believe the main problem with this approach is that using
translators, some of which are quite heavy-weight, is overkill for a task
as simple as identifying a file, which doesn't even need to be *that*
exact. So the MIME sniffer rules, plus light-weight add-ons for exceptional
cases, aren't too bad an idea IMHO.
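
To illustrate how little work such a light-weight check needs, here is a
minimal sketch in Python. It is only loosely inspired by the byte
classification idea behind "file"'s text detection and is not a port of
ascmagic.c. The function name guess_text_encoding, the set of tolerated
control bytes, and the three encoding labels are assumptions made for this
example, not part of any existing API.

```python
from typing import Optional

# Control bytes tolerated in plain text: BEL, BS, TAB, LF, FF, CR, ESC.
# (Assumed set for this sketch; a real sniffer may choose differently.)
_TEXT_CONTROLS = frozenset({7, 8, 9, 10, 12, 13, 27})


def _is_text_byte(b: int) -> bool:
    """True for printable ASCII and the tolerated control bytes."""
    return b in _TEXT_CONTROLS or 0x20 <= b <= 0x7E


def guess_text_encoding(buf: bytes) -> Optional[str]:
    """Return a rough encoding guess, or None if buf does not look like text."""
    if not buf:
        return None  # nothing to judge

    # Pure 7-bit text: every byte is printable or a tolerated control byte.
    if all(_is_text_byte(b) for b in buf):
        return "us-ascii"

    # Next, try UTF-8: the buffer must decode cleanly, and its ASCII-range
    # bytes must still be text bytes (so NUL and friends disqualify it).
    try:
        buf.decode("utf-8")
    except UnicodeDecodeError:
        pass
    else:
        if all(_is_text_byte(b) or b >= 0x80 for b in buf):
            return "utf-8"
        return None

    # Fall back to an 8-bit Latin guess: high bytes must be in 0xA0-0xFF,
    # since 0x80-0x9F are control codes in ISO-8859-1.
    if all(_is_text_byte(b) or b >= 0xA0 for b in buf):
        return "iso-8859-1"

    return None
```

A sniffer would call this on the first block read from a file and treat
None as "not text". One known limitation of the sketch: a buffer cut off in
the middle of a multi-byte UTF-8 sequence fails the UTF-8 check and may be
reported as iso-8859-1 or as not text.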

CU, Ingo
