[openbeos] Re: Identifying Text Files

  • From: "Axel Dörfler" <axeld@xxxxxxxxxxxxxxxx>
  • To: openbeos@xxxxxxxxxxxxx
  • Date: Fri, 09 Jun 2006 13:13:35 +0200 CEST

Ingo Weinhold <bonefish@xxxxxxxxxxxxxxx> wrote:
> since BeOS seems to have built-in support for recognizing files as text
> files, we want to have the same. I'm about to implement that, missing is
> basically the algorithm deciding whether (or with what probability) a
> buffer of bytes actually contains text.
>
> A simple but maybe a bit ignorant approach would be to check whether the
> buffer contains valid UTF-8 characters only (or more than, say, 95%). But
> maybe someone has better ideas...

I would add special rule semantics for this, i.e. a "text" rule and an 
"ascii" rule, where the former would accept UTF-8 and the latter plain 
ASCII only, maybe even with a way to specify the minimum fraction of 
the buffer that has to match.
If you have a look at BSD's "file", the text magic happens in 
ascmagic.c - it looks very reasonable to me, and could even identify 
the charset for StyledEdit (at least in a basic way that should be 
enough for the Western world).
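
A minimal sketch of how such a "text" rule and "ascii" rule with a
minimum match fraction could look, written in Python for brevity. All
names here (ascii_ratio, utf8_ratio, matches_text_rule,
matches_ascii_rule, the 0.95 default, and the set of accepted control
characters) are assumptions made for illustration; none of them comes
from BeOS, the planned implementation, or ascmagic.c.

```python
# Hypothetical helpers; the names and thresholds are illustrative only.

# ASCII bytes counted as text: tab, LF, FF, CR, and printable 0x20..0x7E.
ASCII_TEXT = frozenset(b"\t\n\f\r") | frozenset(range(0x20, 0x7F))


def ascii_ratio(buf: bytes) -> float:
    """Fraction of bytes in buf that are plain ASCII text bytes."""
    if not buf:
        return 0.0
    return sum(b in ASCII_TEXT for b in buf) / len(buf)


def _sequence_length(lead: int) -> int:
    """Expected UTF-8 sequence length for a lead byte, or 0 if invalid."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0


def utf8_ratio(buf: bytes) -> float:
    """Fraction of bytes in buf that belong to valid UTF-8 text characters."""
    if not buf:
        return 0.0
    good = 0
    i = 0
    while i < len(buf):
        length = _sequence_length(buf[i])
        if length == 1:
            ok = buf[i] in ASCII_TEXT
        elif length > 1:
            try:
                # Strict decoding rejects overlong forms, surrogates,
                # bad continuation bytes, and truncated sequences.
                buf[i:i + length].decode("utf-8")
                ok = True
            except UnicodeDecodeError:
                ok = False
        else:
            ok = False
        if ok:
            good += length
            i += length
        else:
            i += 1  # skip one offending byte and resynchronize
    return good / len(buf)


def matches_text_rule(buf: bytes, min_ratio: float = 0.95) -> bool:
    """'text' rule: enough of the buffer is valid UTF-8 text."""
    return utf8_ratio(buf) >= min_ratio


def matches_ascii_rule(buf: bytes, min_ratio: float = 0.95) -> bool:
    """'ascii' rule: enough of the buffer is plain ASCII text."""
    return ascii_ratio(buf) >= min_ratio
```

With these definitions, a UTF-8 buffer such as "Grüße, Welt\n" would
match the "text" rule but not the "ascii" rule at the default
threshold, while the same string encoded as Latin-1 would match
neither. Detecting such legacy charsets is the part that would need
something closer to what ascmagic.c does.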

Bye,
   Axel.

