[haiku-development] UTF-8 BOM and StyledEdit...

  • From: "François Revol" <revol@xxxxxxx>
  • To: haiku-development@xxxxxxxxxxxxx
  • Date: Fri, 21 Dec 2007 01:26:20 +0100 CET

I just opened a UTF-8 text file made on Windows XP in Zeta's
StyledEdit...
yes, Notepad can do that... well, as always it puts its own crap in.
It displayed an unknown-char box for the first character, and when I
looked it up, it turns out Notepad starts Unicode files with a BOM character.
Cf. http://en.wikipedia.org/wiki/Byte_Order_Mark

I checked our own StyledEdit and it also tries to show this "zero-width
no-break space" char...

For UTF-8 it doesn't really carry any byte order info, but it still
hints at the encoding, so maybe we should use it as well for
interoperability... maybe later handle it in MIME sniffing to add a
be:encoding attribute.
At least we should handle files that have one so this char isn't shown.
I'm not sure how to do that: either BTextEdit/View should account for
it, or maybe the font layer itself?
Or StyledEdit could strip it when reading a file, but that's not the
cleanest way IMO. Maybe handle it in the STXT or text Translators?
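
To make the "strip it when reading" option concrete, here is a rough
sketch in plain C++. In UTF-8 the BOM (U+FEFF) is the three bytes
EF BB BF, so detecting it only takes a look at the start of the buffer.
Utf8BomLength() and StripUtf8Bom() are made-up names for illustration,
not existing Haiku or StyledEdit functions, and std::string only stands
in for whatever buffer the real loader uses.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the Haiku API): returns how many
// leading bytes of the buffer are taken up by a UTF-8 BOM, 0 or 3.
static std::size_t
Utf8BomLength(const char* data, std::size_t length)
{
	// U+FEFF encoded as UTF-8
	static const unsigned char kBom[3] = { 0xEF, 0xBB, 0xBF };

	if (length < 3)
		return 0;

	for (std::size_t i = 0; i < 3; i++) {
		if (static_cast<unsigned char>(data[i]) != kBom[i])
			return 0;
	}
	return 3;
}

// Hypothetical helper: returns a copy of the text without a leading BOM.
// Text that doesn't start with a complete BOM is returned unchanged.
static std::string
StripUtf8Bom(const std::string& text)
{
	return text.substr(Utf8BomLength(text.data(), text.size()));
}
```

The same check could in principle feed the sniffing idea: a non-zero
result from Utf8BomLength() is a hint that the file is UTF-8. A caller
that wants interoperability would probably also need to remember that a
BOM was present, so it can be written back on save; that part is not
shown here.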

François.
