[openbeos] Re: Identifying Text Files

  • From: "Axel Dörfler" <axeld@xxxxxxxxxxxxxxxx>
  • To: openbeos@xxxxxxxxxxxxx
  • Date: Fri, 09 Jun 2006 23:57:24 +0200 CEST

Ingo Weinhold <bonefish@xxxxxxxxxxxxxxx> wrote:
> On 2006-06-09 at 13:13:35 [+0200], Axel Dörfler <axeld@xxxxxxxxxxxxxxxx>
> wrote:
> > I would add special rule semantics for this, i.e. a "text" rule and an
> > "ascii" rule where the former would accept UTF-8 and the latter plain
> > ASCII only, maybe even with a method to specify the minimal congruence.
> I don't quite understand what you mean. I would simply take ascmagic.c,
> adjust it (to C++, parameters/return types of the identification function,
> strip things I don't need) and return the type it finds.

I would have extended the sniffer rule language to allow something like
this:
0.5 [0:511] text
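
To make the "minimal congruence" idea a bit more concrete, here is a rough
sketch of how an "ascii" rule could be evaluated over a byte range such as
[0:511]. As far as I know, the leading 0.5 in the existing rule syntax is the
rule's priority, so the congruence threshold is shown as a separate parameter.
The names AsciiCongruence() and MatchesAsciiRule() are made up for
illustration and are not part of any existing sniffer API; a "text" rule would
additionally have to accept UTF-8 sequences.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: printable ASCII plus the common whitespace controls.
static bool IsAsciiTextByte(uint8_t byte)
{
    return (byte >= 0x20 && byte <= 0x7e)
        || byte == '\t' || byte == '\n' || byte == '\r' || byte == '\f';
}

// Hypothetical: fraction of the bytes in [from, to] (inclusive, clamped to
// the buffer) that look like plain ASCII text. Returns 0.0 for empty input.
double AsciiCongruence(const uint8_t* buffer, size_t length, size_t from,
    size_t to)
{
    if (buffer == nullptr || length == 0 || from >= length || from > to)
        return 0.0;

    size_t end = std::min(to, length - 1);
    size_t matches = 0;
    for (size_t i = from; i <= end; i++) {
        if (IsAsciiTextByte(buffer[i]))
            matches++;
    }
    return double(matches) / double(end - from + 1);
}

// Hypothetical: the rule matches if at least minCongruence of the bytes in
// the range are ASCII text.
bool MatchesAsciiRule(const uint8_t* buffer, size_t length, size_t from,
    size_t to, double minCongruence)
{
    return AsciiCongruence(buffer, length, from, to) >= minCongruence;
}
```

A congruence of 1.0 would demand that every byte in the range is text, while a
lower value would tolerate the odd stray byte.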

With add-ons, it's probably not a good idea to do it like this. I am
also not fond of the idea of having any real add-ons in the registrar,
especially third-party ones (crashing the registrar is a very bad idea).

> > If you have a look at BSD's "file", the text magic happens in
> > ascmagic.c - it looks very reasonable to me, and could even identify
> > the charset for StyledEdit (at least in a basic way that should be
> > enough for the Western world).
> I intend to incorporate that code into the text sniffer add-on directly,
> stripping as much of the character set stuff as possible. But we can
> certainly provide a library function (e.g. in libtextencoding) that guesses
> the character encoding/set of a given buffer.

Sounds like a good idea, too.
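
Such a function could look roughly like the sketch below. It is only loosely
modeled on the general idea (plain ASCII first, then UTF-8, then an 8-bit
Western fallback) and is not a port of ascmagic.c. The enum and the name
GuessTextEncoding() are hypothetical, and the UTF-8 check is simplified: it
does not reject every overlong or surrogate form, and a multi-byte sequence
cut off at the end of the buffer counts as invalid.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result type for the guess.
enum TextEncodingGuess { kNotText, kAscii, kUtf8, kLatin1 };

// Simplified structural UTF-8 check: valid lead bytes followed by the right
// number of continuation bytes (10xxxxxx).
static bool IsValidUtf8(const uint8_t* buffer, size_t length)
{
    size_t i = 0;
    while (i < length) {
        uint8_t lead = buffer[i];
        size_t extra;
        if (lead < 0x80)
            extra = 0;
        else if (lead >= 0xc2 && lead <= 0xdf)
            extra = 1;
        else if (lead >= 0xe0 && lead <= 0xef)
            extra = 2;
        else if (lead >= 0xf0 && lead <= 0xf4)
            extra = 3;
        else
            return false;

        if (extra > length - i - 1)
            return false;   // sequence runs past the end of the buffer
        for (size_t k = 1; k <= extra; k++) {
            if ((buffer[i + k] & 0xc0) != 0x80)
                return false;
        }
        i += extra + 1;
    }
    return true;
}

// Hypothetical library function: guess the encoding of a buffer.
TextEncodingGuess GuessTextEncoding(const uint8_t* buffer, size_t length)
{
    if (buffer == nullptr || length == 0)
        return kNotText;

    bool sawHighBit = false;
    bool sawC1Control = false;
    for (size_t i = 0; i < length; i++) {
        uint8_t byte = buffer[i];
        bool allowedControl = byte == '\t' || byte == '\n' || byte == '\r'
            || byte == '\f' || byte == 0x1b;
        if ((byte < 0x20 && !allowedControl) || byte == 0x7f)
            return kNotText;        // binary-looking control character
        if (byte >= 0x80) {
            sawHighBit = true;
            if (byte < 0xa0)
                sawC1Control = true;    // not printable in ISO 8859-1
        }
    }

    if (!sawHighBit)
        return kAscii;
    if (IsValidUtf8(buffer, length))
        return kUtf8;
    return sawC1Control ? kNotText : kLatin1;
}
```

StyledEdit could call something like this on the first few hundred bytes of a
file and fall back to asking the user when the result is kNotText.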

Bye,
   Axel.

