[haiku-appserver] Re: moreUTF8.h
- From: "Axel Dörfler" <axeld@xxxxxxxxxxxxxxxx>
- To: haiku-appserver@xxxxxxxxxxxxx
- Date: Wed, 15 Jun 2005 20:58:17 +0200 CEST
"Stephan Assmus" <superstippi@xxxxxx> wrote:
> static inline bool
> IsInsideGlyph(uchar ch)
> {
>     return (ch & 0xC0) == 0x80;
> }
>
> This code returns true for the following pattern, right?
>
> 10?? ????
Exactly. Note that this pattern only matches the subsequent (continuation)
bytes of a character, not the first one.
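
To illustrate the point, here is a minimal sketch of the test applied to each byte of a string. The `uchar` typedef and the helper `count_continuation_bytes()` are assumptions made for this example and are not part of moreUTF8.h as quoted:

```c
#include <stdbool.h>

/* Assumed stand-in for the platform's uchar type. */
typedef unsigned char uchar;

/* True for UTF-8 continuation bytes, i.e. the bit pattern 10?? ????. */
static inline bool
IsInsideGlyph(uchar ch)
{
	return (ch & 0xC0) == 0x80;
}

/* Hypothetical helper: counts the continuation bytes in a
 * NUL-terminated string. */
static int
count_continuation_bytes(const char *text)
{
	int count = 0;
	for (; *text != '\0'; text++) {
		if (IsInsideGlyph((uchar)*text))
			count++;
	}
	return count;
}
```

For the three-byte sequence E2 82 AC (the euro sign), the first byte 0xE2 is 1110 0010 and fails the test, while 0x82 and 0xAC both start with 10 and pass it.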
> This code...
>
> const char *ptr = text;
>
> do {
>     ptr++;
> } while (IsInsideGlyph(*ptr));
>
> return ptr - text;
>
> ...increments the ptr once, then tests for IsInsideGlyph. Which will
> return true in case only the first high bit is set. So how does this
> work for three byte glyphs?
>
> A three byte glyph looks like this (correct me if I'm wrong):
>
> 1110 ????
> 110? ????
> 10?? ????
That's not correct. For bytes inside the glyph, the top two bits are always
10, and only the other 6 bits are used for character data. The leading bits
of the first byte (110, 1110, ...) determine the length of the character.
So the code looks okay, AFAICT.
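
As a rough check of that claim, the sketch below wraps the quoted loop in a function and compares it with the length implied by the first byte. The function names `char_length_by_scan()` and `char_length_from_lead()` and the `uchar` typedef are assumptions made for this example; only the loop body and `IsInsideGlyph()` come from the quoted code:

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed stand-in for the platform's uchar type. */
typedef unsigned char uchar;

static inline bool
IsInsideGlyph(uchar ch)
{
	return (ch & 0xC0) == 0x80;
}

/* Length in bytes of the character starting at text, found by skipping
 * the first byte and then every continuation byte (10?? ????).
 * Assumes text points at the start of a character, not at the
 * terminating NUL. */
static size_t
char_length_by_scan(const char *text)
{
	const char *ptr = text;

	do {
		ptr++;
	} while (IsInsideGlyph((uchar)*ptr));

	return (size_t)(ptr - text);
}

/* Length implied by the leading bits of the first byte.
 * Returns 0 for a continuation byte or an invalid first byte. */
static size_t
char_length_from_lead(uchar lead)
{
	if ((lead & 0x80) == 0x00)
		return 1;	/* 0??? ???? */
	if ((lead & 0xE0) == 0xC0)
		return 2;	/* 110? ???? */
	if ((lead & 0xF0) == 0xE0)
		return 3;	/* 1110 ???? */
	if ((lead & 0xF8) == 0xF0)
		return 4;	/* 1111 0??? */
	return 0;
}
```

For well-formed input both functions should agree, because a three-byte character is laid out as 1110 ????, 10?? ????, 10?? ????, so the loop keeps advancing over both trailing bytes.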
> So when IsInsideGlyph tests the second byte, it would return false,
> no?
> Which means moreUTF8.h only works for 2 byte glyphs. Can someone
> confirm? If my observation is correct, I'm going to fix the problem
> with count_utf8_bytes() that I introduced in my last commit. If there
> is a better way, speak up! :-)
Unless I am wrong, there is no need to do this :-)
Bye,
Axel.
- Follow-Ups:
- [haiku-appserver] Re: moreUTF8.h
- From: Stephan Assmus
- References:
- [haiku-appserver] moreUTF8.h
- From: Stephan Assmus