[haiku-appserver] moreUTF8.h

  • From: "Stephan Assmus" <superstippi@xxxxxx>
  • To: haiku-appserver@xxxxxxxxxxxxx
  • Date: Wed, 15 Jun 2005 19:39:55 +0200 CEST


static inline bool
IsInsideGlyph(uchar ch)
{
        return (ch & 0xC0) == 0x80;
}

This code returns true for the following pattern, right?

10?? ????
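
To double-check the mask, here is a minimal standalone sketch. The uchar typedef is an assumption standing in for the BeOS type, and the byte values mentioned afterwards are only illustrative samples.

```c
#include <stdbool.h>

/* Assumption: stand-in for the BeOS uchar typedef. */
typedef unsigned char uchar;

/* True when the top two bits of ch are "10", i.e. ch matches 10?? ????. */
static inline bool
IsInsideGlyph(uchar ch)
{
        /* 0xC0 keeps the top two bits; 0x80 is the bit pattern 10. */
        return (ch & 0xC0) == 0x80;
}
```

With this, every value from 0x80 through 0xBF gives true, while 0x41 ('A'), 0xC3 (110? ????) and 0xE2 (1110 ????) give false.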

This code...

        const char *ptr = text;

        do {
                ptr++;
        } while (IsInsideGlyph(*ptr));
        return ptr - text;

...increments ptr once, then tests IsInsideGlyph, which returns true 
only when the top bit is set and the next bit is clear. So how does 
this work for three-byte glyphs?

A three-byte glyph looks like this (correct me if I'm wrong):

1110 ????
110? ????
10?? ????

So when IsInsideGlyph tests the second byte, it would return false, no? 
Which means moreUTF8.h only works for two-byte glyphs. Can someone 
confirm? If my observation is correct, I'm going to fix the problem 
with count_utf8_bytes() that I introduced in my last commit. If there 
is a better way, speak up! :-)
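
One way to check this is to run the loop on a real three-byte sequence. The following is a minimal sketch, not the moreUTF8.h code itself. It assumes the standard UTF-8 layout (RFC 3629), in which the lead byte of a three-byte glyph is 1110 ???? and both bytes after it are 10?? ????. Under that assumption the loop keeps advancing over every trailing byte. glyph_length is a hypothetical name, the uchar typedef is a stand-in, and the ptr++ loop body is inferred from the description of the loop.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: stand-in for the BeOS uchar typedef. */
typedef unsigned char uchar;

static inline bool
IsInsideGlyph(uchar ch)
{
        return (ch & 0xC0) == 0x80;
}

/* Hypothetical helper: byte length of the glyph that starts at text.
 * Expects a NUL-terminated string; the terminator (0x00) ends the loop. */
static size_t
glyph_length(const char *text)
{
        const char *ptr = text;

        /* Step past the lead byte, then past every 10?? ???? byte. */
        do {
                ptr++;
        } while (IsInsideGlyph(*ptr));

        return (size_t)(ptr - text);
}
```

Calling glyph_length("\xE2\x82\xAC") (the euro sign, bytes E2 82 AC) gives 3, because 0x82 and 0xAC both match 10?? ????. If the second byte really were 110? ????, the loop would stop after one byte, as suspected above.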

Best regards,
