[haiku-development] Re: BString and UTF-8

  • From: Oliver Tappe <zooey@xxxxxxxxxxxxxxx>
  • To: haiku-development@xxxxxxxxxxxxx
  • Date: Sun, 04 Dec 2011 14:16:40 +0100

On 2011-12-04 at 02:25:23 [+0100], David Given <dg@xxxxxxxxxxx> wrote:
> On 03/12/11 23:22, Oliver Tappe wrote:
> [...]
> > Hm, let me think, where did I get this from? Ah, right, String.h. If you
> > would have taken care to look, you would have noticed that it declares
> > several ...Chars() methods that take character based lengths and indices
> > as parameters. Those support (and expect) UTF-8 encoding.
> 
> Just to clarify: by 'characters', you mean code points, right? They're
> not quite the same thing --- the closest Unicode equivalent to a
> character is the grapheme cluster, which can actually be made out of
> multiple code points.

Yes, that's a very good point: BString's ...Chars() methods do indeed deal
with code points, i.e. whatever is encoded by a single UTF-8 byte sequence.
So the naming of those methods is somewhat misleading if one expects, for
instance, to be able to draw each of the "characters" returned by CharAt().
Even the CharAt() variant that can pass out a multibyte "character" only
passes out the next code point, not a whole grapheme cluster.
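
To illustrate the difference, here is a minimal sketch in plain C++. It
deliberately does not use BString: CountCodePoints() and CodePointAt() are
hypothetical helpers written for this example, and they assume well-formed
UTF-8 input without validating it. They only show what counting and
indexing by code point means.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of BString: counts code points by
// counting every byte that is not a UTF-8 continuation byte (10xxxxxx).
// Assumes well-formed UTF-8; no validation is performed.
std::size_t CountCodePoints(const std::string& utf8)
{
	std::size_t count = 0;
	for (unsigned char byte : utf8) {
		if ((byte & 0xC0) != 0x80)
			++count;
	}
	return count;
}

// Hypothetical helper, not part of BString: returns the bytes of the
// code point at the given code point index, or an empty string if the
// index is out of range.
std::string CodePointAt(const std::string& utf8, std::size_t index)
{
	std::size_t current = 0;
	std::size_t i = 0;
	while (i < utf8.size()) {
		// A code point starts here; skip its continuation bytes.
		std::size_t start = i++;
		while (i < utf8.size()
			&& (static_cast<unsigned char>(utf8[i]) & 0xC0) == 0x80) {
			++i;
		}
		if (current == index)
			return utf8.substr(start, i - start);
		++current;
	}
	return std::string();
}
```

Given the string "e\xCC\x81" (the letter "e" followed by U+0301 COMBINING
ACUTE ACCENT), CountCodePoints() reports two code points in three bytes,
although a reader sees a single accented letter. CodePointAt() with index 0
yields just "e", and index 1 yields the bare combining accent, which is not
something you could sensibly draw on its own.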

cheers,
        Oliver
