[haiku-development] Re: BString and UTF-8

  • From: Michael Bridgers <mibrid@xxxxxxx>
  • To: haiku-development@xxxxxxxxxxxxx
  • Date: Fri, 02 Dec 2011 20:44:09 -0500



On 12/2/2011 11:08 AM, Axel Dörfler wrote:

These all sound like very welcome changes, but this one cannot be done
in a backward compatible manner, unfortunately.
As long as all *Chars() methods deal with invalid UTF-8 correctly (as
they should), there is no security risk either. I would just add a
method IsValidUTF8() or something to that degree which you can use to
test it.

Bye,
Axel.



I'm certain that it CAN be done in a backward-compatible manner.

And the *Chars() methods, by definition, can't deal correctly with invalid UTF-8 strings. What should a "replace char" method do when, for example, a sequence has one lead byte followed by 9 trailing bytes? Nothing meaningful can be done with it.

If you look at how ICU handles things, it doesn't allow invalid strings. If you try to create a UnicodeString from invalid input, it replaces the invalid sequences with the replacement character, U+FFFD. Other systems that use Unicode do similar things.
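To make the replacement idea concrete, here is a minimal sketch of sanitizing a byte string so that the result is always valid UTF-8. It is not ICU's code and not my BString patch: SanitizeUTF8Sketch() and ValidSequenceLength() are hypothetical names, std::string stands in for BString, and the policy of emitting one U+FFFD per offending byte is an assumption (real libraries may group invalid bytes differently).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: returns the length (1 to 4) of the well-formed UTF-8
// sequence starting at p[0], or 0 if the bytes there do not form one.
static size_t ValidSequenceLength(const uint8_t* p, size_t available)
{
	if (available == 0)
		return 0;

	uint8_t lead = p[0];
	if (lead < 0x80)
		return 1;	// plain ASCII

	size_t length;
	uint32_t codePoint;
	uint32_t minimum;	// smallest code point allowed for this length
	if ((lead & 0xE0) == 0xC0) {
		length = 2; codePoint = lead & 0x1F; minimum = 0x80;
	} else if ((lead & 0xF0) == 0xE0) {
		length = 3; codePoint = lead & 0x0F; minimum = 0x800;
	} else if ((lead & 0xF8) == 0xF0) {
		length = 4; codePoint = lead & 0x07; minimum = 0x10000;
	} else
		return 0;	// stray trailing byte or impossible lead byte

	if (available < length)
		return 0;	// truncated sequence

	for (size_t k = 1; k < length; k++) {
		if ((p[k] & 0xC0) != 0x80)
			return 0;	// not a trailing byte
		codePoint = (codePoint << 6) | (p[k] & 0x3F);
	}

	// Reject overlong forms, surrogates, and values past U+10FFFF.
	if (codePoint < minimum || codePoint > 0x10FFFF
		|| (codePoint >= 0xD800 && codePoint <= 0xDFFF))
		return 0;

	return length;
}

// Hypothetical sanitizer: copies valid sequences through unchanged and
// replaces every byte that cannot start a valid sequence with U+FFFD.
std::string SanitizeUTF8Sketch(const std::string& input)
{
	static const char kReplacement[] = "\xEF\xBF\xBD";	// U+FFFD in UTF-8
	const uint8_t* bytes = reinterpret_cast<const uint8_t*>(input.data());

	std::string output;
	size_t i = 0;
	while (i < input.size()) {
		size_t length = ValidSequenceLength(bytes + i, input.size() - i);
		if (length == 0) {
			output += kReplacement;
			i++;
		} else {
			output.append(input, i, length);
			i += length;
		}
	}
	return output;
}
```

Under this policy, the "one lead byte and 9 trailing bytes" case from above (0xC3 followed by nine 0x80 bytes) comes out as one valid two-byte character followed by eight replacement characters, so any later per-character operation has a well-defined character count to work with.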

I know that some security exploits have passed executable code as a string as a means to breach the security of an OS. A paper presented at the "Internationalization and Unicode Conference" in 2002 discussed security considerations with UTF-8 (http://unicode.org/iuc/iuc22/a323.html).

If you will give my changes a chance, I think you will see that everything I'm doing will have a positive effect on the BString class and Haiku.

Also, I'm not a committer. Someone will have to verify what I'm doing before it will be committed. I'm not going to submit something that will break things, because I know it will never make it into the source tree.

And yes, I already have a static IsValidUTF8() method on BString.
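For anyone who wants to see what such a check involves, here is a rough standalone sketch. It is not the method from my patch: IsValidUTF8Sketch() is a hypothetical free function with an assumed (pointer, length) signature, and it only follows the well-formedness rules of the Unicode standard (correct trailing bytes, no overlong forms, no surrogates, nothing above U+10FFFF).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical free function, not the BString method mentioned above.
// Returns true only if the whole buffer is well-formed UTF-8.
bool IsValidUTF8Sketch(const char* bytes, size_t length)
{
	const uint8_t* p = reinterpret_cast<const uint8_t*>(bytes);
	size_t i = 0;

	while (i < length) {
		uint8_t lead = p[i];
		if (lead < 0x80) {
			i++;	// plain ASCII
			continue;
		}

		size_t trailing;
		uint32_t codePoint;
		uint32_t minimum;	// smallest code point allowed for this length
		if ((lead & 0xE0) == 0xC0) {
			trailing = 1; codePoint = lead & 0x1F; minimum = 0x80;
		} else if ((lead & 0xF0) == 0xE0) {
			trailing = 2; codePoint = lead & 0x0F; minimum = 0x800;
		} else if ((lead & 0xF8) == 0xF0) {
			trailing = 3; codePoint = lead & 0x07; minimum = 0x10000;
		} else
			return false;	// stray trailing byte or impossible lead byte

		if (length - i - 1 < trailing)
			return false;	// sequence is cut off at the end of the buffer

		for (size_t k = 1; k <= trailing; k++) {
			if ((p[i + k] & 0xC0) != 0x80)
				return false;	// expected a trailing byte
			codePoint = (codePoint << 6) | (p[i + k] & 0x3F);
		}

		if (codePoint < minimum)
			return false;	// overlong encoding
		if (codePoint > 0x10FFFF)
			return false;	// outside the Unicode range
		if (codePoint >= 0xD800 && codePoint <= 0xDFFF)
			return false;	// UTF-16 surrogate, not a character

		i += trailing + 1;
	}
	return true;
}
```

The overlong check is the part that matters for security: a sequence such as 0xC0 0xAF decodes to '/' if you skip it, which is how filters that only look for the ASCII byte get bypassed.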

Michael
