On 2011-12-04 at 02:25:23 [+0100], David Given <dg@xxxxxxxxxxx> wrote:
> On 03/12/11 23:22, Oliver Tappe wrote:
> [...]
> > Hm, let me think, where did I get this from? Ah, right, String.h. If you
> > would have taken care to look, you would have noticed that it declares
> > several ...Chars() methods that take character based lengths and indices
> > as parameters. Those support (and expect) UTF-8 encoding.
>
> Just to clarify: by 'characters', you mean code points, right? They're
> not quite the same thing --- the closest Unicode equivalent to a
> character is the grapheme cluster, which can actually be made out of
> multiple code points.

Yes, that's a very good point: BString's ...Chars() methods do indeed deal
with code points, i.e. the values encoded by a single UTF-8 sequence, and
not with grapheme clusters.

So the naming of those BString methods is somewhat misleading if one
expects, for instance, to be able to draw each of the "characters" returned
by CharAt(). Even the CharAt() variant that is capable of passing out a
multibyte "character" only passes out the next code point.

cheers,
	Oliver
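
To make the code point vs. grapheme cluster distinction concrete, here is a minimal sketch in plain C++17. It does not use BString; count_code_points and kDecomposed are hypothetical names invented for this illustration. It assumes well-formed UTF-8 input and counts code points by skipping continuation bytes. The string "e" followed by U+0301 is drawn as a single accented letter, yet it consists of two code points stored in three bytes:

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of BString): counts Unicode code points in a
// UTF-8 string by counting every byte that is NOT a continuation byte.
// Continuation bytes have the bit pattern 10xxxxxx.
// Assumes well-formed UTF-8; no validation is performed.
constexpr std::size_t count_code_points(std::string_view utf8)
{
    std::size_t count = 0;
    for (char c : utf8) {
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT. A renderer shows this as one
// user-perceived character (one grapheme cluster).
constexpr std::string_view kDecomposed = "e\xCC\x81";

static_assert(kDecomposed.size() == 3);              // three bytes
static_assert(count_code_points(kDecomposed) == 2);  // two code points
```

An accessor that works per code point would hand back "e" and the combining accent as two separate items here, which is why such a result cannot always be drawn on its own as the character the user sees. Splitting text into grapheme clusters needs the Unicode segmentation rules and is beyond this sketch.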