Ingo Weinhold <bonefish@xxxxxxxxxxxxxxx> wrote:
> since BeOS seems to have built-in support for recognizing files as text
> files, we want to have the same. I'm about to implement that, missing is
> basically the algorithm deciding whether (or with what probability) a
> buffer of bytes actually contains text.
>
> A simple but maybe a bit ignorant approach would be to check whether the
> buffer contains valid UTF-8 characters only (or more than, say, 95%). But
> maybe someone has better ideas...

I would add special rule semantics for this, i.e. a "text" rule and an
"ascii" rule, where the former would accept UTF-8 and the latter plain
ASCII only, maybe even with a method to specify the minimal congruence.

If you have a look at BSD's "file", the text magic happens in ascmagic.c -
it looks very reasonable to me, and could even identify the charset for
StyledEdit (at least in a basic way that should be enough for the Western
world).

Bye,
   Axel.
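
To make the "text" and "ascii" rules with a minimal congruence a bit more concrete, here is a rough sketch in Python. It is not the algorithm from ascmagic.c and not actual sniffer code. The names (ascii_score, utf8_score, matches_rule, min_congruence) and the set of accepted control characters are assumptions made up for illustration. The score is simply the fraction of bytes in the buffer that belong to printable ASCII or to a well-formed UTF-8 sequence.

```python
# Hypothetical sketch: score a byte buffer as "ascii" or "text" (UTF-8).
# All names and thresholds here are illustrative assumptions.

# Control characters assumed acceptable in text: tab, LF, form feed, CR.
TEXT_CONTROLS = {0x09, 0x0A, 0x0C, 0x0D}


def _is_text_ascii(byte: int) -> bool:
    """True for printable ASCII and the few accepted control characters."""
    return byte in TEXT_CONTROLS or 0x20 <= byte <= 0x7E


def ascii_score(buf: bytes) -> float:
    """Fraction of bytes that are plain ASCII text (0.0 for an empty buffer)."""
    if not buf:
        return 0.0
    return sum(1 for b in buf if _is_text_ascii(b)) / len(buf)


def utf8_score(buf: bytes) -> float:
    """Fraction of bytes that are ASCII text or part of valid UTF-8 sequences."""
    if not buf:
        return 0.0
    good = 0
    i = 0
    while i < len(buf):
        b = buf[i]
        if b < 0x80:
            if _is_text_ascii(b):
                good += 1
            i += 1
            continue
        # Expected sequence length from the lead byte; other values
        # (continuation bytes, 0xC0, 0xC1, 0xF5..0xFF) cannot start a sequence.
        if 0xC2 <= b <= 0xDF:
            length = 2
        elif 0xE0 <= b <= 0xEF:
            length = 3
        elif 0xF0 <= b <= 0xF4:
            length = 4
        else:
            i += 1
            continue
        try:
            # Strict decoding rejects overlong forms, surrogates and
            # sequences cut off by the end of the buffer.
            buf[i:i + length].decode("utf-8")
        except UnicodeDecodeError:
            i += 1
            continue
        good += length
        i += length
    return good / len(buf)


def matches_rule(buf: bytes, rule: str, min_congruence: float = 0.95) -> bool:
    """Apply the hypothetical "ascii" or "text" rule with a minimal congruence."""
    if rule == "ascii":
        score = ascii_score(buf)
    elif rule == "text":
        score = utf8_score(buf)
    else:
        raise ValueError("unknown rule: " + rule)
    return score >= min_congruence
```

With this sketch, matches_rule("Grüße\n".encode("utf-8"), "text") accepts the buffer while the "ascii" rule rejects it, and the same string encoded as Latin-1 fails the "text" rule as well. Note that a multi-byte character cut off at the end of the buffer counts against the score, so a real implementation would probably want to tolerate a truncated tail, and charset identification would need more than this.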