On 12/2/2011 11:08 AM, Axel Dörfler wrote:
This all sounds like very welcomed changes, but this one cannot be done in a backward compatible manner, unfortunately. As long as all *Chars() methods deal with invalid UTF-8 correctly (as they should), there is no security risk either. I would just add a method IsValidUTF8() or something to that degree which you can use to test it. Bye, Axel.
I'm certain that it CAN be done in a backward compatible manner. And the *Chars() methods, by definition, can't deal correctly with invalid UTF-8 strings. What should a "replace char" method do when, for example, a sequence has one lead byte followed by 9 trailing bytes? There is no way that anything meaningful could be done.
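To make the "one lead byte and 9 trailing bytes" case concrete, here is a rough sketch in plain C++. It is not the BString code; ExpectedLength() and TrailingBytesAfter() are made-up helper names used only for illustration. The lead byte of a UTF-8 sequence announces how long the sequence should be, so any extra trailing bytes belong to no character at all.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration only; not part of BString.

// Number of bytes a UTF-8 sequence should occupy, judged by its first byte.
// Returns 0 if the byte cannot start a sequence.
std::size_t ExpectedLength(unsigned char lead)
{
    if (lead <= 0x7F) return 1;                  // ASCII
    if (lead >= 0xC2 && lead <= 0xDF) return 2;  // two-byte sequence
    if (lead >= 0xE0 && lead <= 0xEF) return 3;  // three-byte sequence
    if (lead >= 0xF0 && lead <= 0xF4) return 4;  // four-byte sequence
    return 0;  // trailing byte (0x80-0xBF) or a byte never used in UTF-8
}

// Number of trailing bytes (bit pattern 10xxxxxx) directly following pos.
std::size_t TrailingBytesAfter(const std::string& s, std::size_t pos)
{
    std::size_t count = 0;
    for (std::size_t i = pos + 1; i < s.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        if ((b & 0xC0) != 0x80)
            break;
        ++count;
    }
    return count;
}
```

For a string made of the lead byte 0xE2 followed by nine 0x82 bytes, ExpectedLength() reports 3 while TrailingBytesAfter() reports 9. The seven surplus bytes are not characters, so a method that counts or replaces "chars" has no well-defined answer for them.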
If you look at how ICU handles things, it doesn't allow invalid strings. If you try to create a UnicodeString with invalid input, it replaces the invalid sequences with the replacement character, U+FFFD. Other systems that use Unicode do similar things.
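A minimal sketch of that kind of replacement, written in plain C++ rather than taken from ICU or from my BString changes. WellFormedLength() and ReplaceInvalidUTF8() are hypothetical names, and emitting one U+FFFD per offending byte is an assumption of this sketch; ICU and other libraries may group a bad run differently.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: length (1-4) of the well-formed UTF-8 sequence that
// starts at s[i], or 0 if the bytes there are ill-formed.
static std::size_t WellFormedLength(const std::string& s, std::size_t i)
{
    const unsigned char b0 = static_cast<unsigned char>(s[i]);
    if (b0 <= 0x7F)
        return 1;

    std::size_t len;
    unsigned char lo = 0x80, hi = 0xBF;  // allowed range for the second byte
    if (b0 >= 0xC2 && b0 <= 0xDF) {
        len = 2;
    } else if (b0 >= 0xE0 && b0 <= 0xEF) {
        len = 3;
        if (b0 == 0xE0) lo = 0xA0;  // excludes overlong encodings
        if (b0 == 0xED) hi = 0x9F;  // excludes UTF-16 surrogates
    } else if (b0 >= 0xF0 && b0 <= 0xF4) {
        len = 4;
        if (b0 == 0xF0) lo = 0x90;  // excludes overlong encodings
        if (b0 == 0xF4) hi = 0x8F;  // excludes values above U+10FFFF
    } else {
        return 0;  // stray trailing byte, or a byte never valid in UTF-8
    }
    if (i + len > s.size())
        return 0;  // sequence is cut off by the end of the string

    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char b = static_cast<unsigned char>(s[i + k]);
        const unsigned char low = (k == 1) ? lo : 0x80;
        const unsigned char high = (k == 1) ? hi : 0xBF;
        if (b < low || b > high)
            return 0;
    }
    return len;
}

// Copies the input, substituting U+FFFD for every byte that does not begin
// a well-formed sequence. The result is always valid UTF-8.
std::string ReplaceInvalidUTF8(const std::string& in)
{
    static const char kReplacement[] = "\xEF\xBF\xBD";  // U+FFFD in UTF-8
    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        const std::size_t len = WellFormedLength(in, i);
        if (len == 0) {
            out += kReplacement;
            ++i;  // skip one bad byte and try again at the next one
        } else {
            out.append(in, i, len);
            i += len;
        }
    }
    return out;
}
```

Once a string has passed through something like this at construction time, every later per-character operation can assume well-formed input, which is the point of not allowing invalid strings in the first place.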
I know that some security exploits have passed executable code inside a string as a means of breaching the security of an OS. At the "Internationalization and Unicode Conference" in 2002, a paper was presented on security considerations with UTF-8. (http://unicode.org/iuc/iuc22/a323.html)
If you will give my changes a chance, I think you will see that everything I'm doing will have a positive effect on the BString class and Haiku.
Also, I'm not a committer. Someone will have to verify what I'm doing before it will be committed. I'm not going to submit something that will break things, because I know it will never make it into the source tree.
And yes, I already have a static IsValidUTF8() method on the BString.
Michael