[haiku-development] Re: BString and UTF-8

  • From: Oliver Tappe <zooey@xxxxxxxxxxxxxxx>
  • To: haiku-development@xxxxxxxxxxxxx
  • Date: Sun, 04 Dec 2011 00:22:45 +0100

On 2011-12-03 at 20:22:11 [+0100], pete.goodeve@xxxxxxxxxxxx wrote:
> On Sat, Dec 03, 2011 at 01:50:20PM +0100, Oliver Tappe wrote:
[ ... ]
> > 
> > The BString functions that deal with character indices rely on the string
> > to contain UTF-8 characters, too.
>  
> Hunh?  Where do you get that from?  The indexing methods (ByteAt() and
> the [] operator) both explicitly return char.  I can see nothing in the
> current BString API that has any concept of multibyte codes.

Hm, let me think, where did I get this from? Ah, right, String.h. If you 
had taken care to look, you would have noticed that it declares several 
...Chars() methods that take character-based lengths and indices as 
parameters. Those support (and expect) UTF-8 encoding.
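
To illustrate the difference between byte indices and character indices, here is a minimal sketch in plain C++ on top of std::string. It is not the BString implementation; the helper names (CountUtf8Chars, ByteOffsetOfChar, TruncateUtf8Chars) are made up for this example, and the code assumes the input is well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration only; not part of the BString API.

// True for UTF-8 continuation bytes (bit pattern 10xxxxxx).
static bool IsContinuationByte(unsigned char byte)
{
    return (byte & 0xC0) == 0x80;
}

// Counts characters (code points) by counting every byte that starts a
// sequence. Assumes well-formed UTF-8.
std::size_t CountUtf8Chars(const std::string& text)
{
    std::size_t count = 0;
    for (unsigned char byte : text) {
        if (!IsContinuationByte(byte))
            count++;
    }
    return count;
}

// Converts a character index into a byte offset. Returns text.size() if
// charIndex lies at or past the end of the string.
std::size_t ByteOffsetOfChar(const std::string& text, std::size_t charIndex)
{
    for (std::size_t offset = 0; offset < text.size(); offset++) {
        if (IsContinuationByte(text[offset]))
            continue;
        if (charIndex == 0)
            return offset;
        charIndex--;
    }
    return text.size();
}

// Keeps the first numChars characters without splitting a multibyte
// sequence, which a byte-based truncation could do.
std::string TruncateUtf8Chars(const std::string& text, std::size_t numChars)
{
    return text.substr(0, ByteOffsetOfChar(text, numChars));
}
```

With the string "h\xC3\xA9llo" (five characters, six bytes), character index 2 maps to byte offset 3, and truncating to two characters keeps three bytes. Truncating to two bytes instead would cut the two-byte sequence in half.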
 
> I think it should be possible to add "CodePoint(int32 index)" et al methods,
> but it would be extremely unwise to try to revamp the current assumptions.
> When a BString is intended for display (as in your TruncateString()), sure
> it should be UTF-8, but it's intended as a general utility object, and
> you can't anticipate what someone might want to use it for.

Whatever the original intentions were, as it is currently, BString is an 
unfortunate mixture of byte buffer and character string. Anyway, the point is 
a bit moot, since we can't do much about it until after R1.

cheers,
        Oliver
