[openbeos] Re: Identifying Text Files
- From: Ingo Weinhold <bonefish@xxxxxxxxxxxxxxx>
- To: openbeos@xxxxxxxxxxxxx
- Date: Fri, 09 Jun 2006 21:14:47 +0200
On 2006-06-09 at 13:13:35 [+0200], Axel Dörfler <axeld@xxxxxxxxxxxxxxxx> wrote:
> Ingo Weinhold <bonefish@xxxxxxxxxxxxxxx> wrote:
> > since BeOS seems to have built-in support for recognizing files as text
> > files, we want to have the same. I'm about to implement that; what's
> > missing is basically the algorithm deciding whether (or with what
> > probability) a buffer of bytes actually contains text.
> >
> > A simple but maybe a bit ignorant approach would be to check whether the
> > buffer contains valid UTF-8 characters only (or more than, say, 95%).
> > But maybe someone has better ideas...
>
> I would add special rule semantics for this, i.e. a "text" rule and an
> "ascii" rule, where the former would accept UTF-8 and the latter plain
> ASCII only, maybe even with a method to specify the minimal congruence.
I don't quite understand what you mean. I would simply take ascmagic.c,
adapt it (port it to C++, change the parameters and return types of the
identification function, strip the parts I don't need) and return the type
it finds.
> If you have a look at BSD's "file", the text magic happens in
> ascmagic.c - it looks very reasonable to me, and could even identify
> the charset for StyledEdit (at least in a basic way that should be
> enough for the Western world).
I intend to incorporate that code into the text sniffer add-on directly,
stripping as much of the character set stuff as possible. But we can
certainly provide a library function (e.g. in libtextencoding) that guesses
the character encoding/set of a given buffer.
CU, Ingo