[openbeos] Re: So much for premature optimization

Erik Jaesler wrote:

> In my defense, I can say only this: it occurred to me that utf8_char_len 
> might be less efficient than the simple loop already there because 
> although you don't have to look at every single byte, the test in 
> utf8_char_len is quite a bit more expensive.  Owing to that suspicion, I 
> left the original implementation in place, #if 0'd out, until I could 
> test it.  You have saved me the testing time, and I thank you. =)  My 
> mistake was in checking the code in before it was tested; hopefully 
> others will learn from my error. =)

Seriously, the new code looks nicer, so one might say that this wasn't
an optimization, just some cleanup, as we all do sometimes.

But I just realized the new code has a serious problem the old one didn't:
it misbehaves when invalid UTF-8 sequences are scanned.

An invalid UTF-8 sequence might, for example, be generated when a
string is truncated but still zero-terminated, as can happen when you use
strncpy, strlcpy, etc.

Consider the following (invalid UTF-8) sequence: "\xF0" (2 bytes: the
first is 0xF0, the second 0x00). The old code will terminate at the second
byte, as it is null. The new code will evaluate the first byte (which
indicates a 4-byte character), skip the next three, and continue until it
either crashes or finds a null byte.
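
To make the difference concrete, here is a rough sketch in C. It is not the
actual code from the tree: lead_byte_len is only a hypothetical stand-in for
utf8_char_len, and the three end_offset_* functions are names made up for this
illustration. The last one shows one possible guard, assuming we simply want
to stop at the first null byte even inside a multi-byte sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for utf8_char_len: the sequence length
 * implied by the lead byte alone. */
static size_t lead_byte_len(unsigned char c)
{
    if (c < 0x80) return 1;             /* ASCII */
    if ((c & 0xE0) == 0xC0) return 2;   /* 110xxxxx */
    if ((c & 0xF0) == 0xE0) return 3;   /* 1110xxxx */
    if ((c & 0xF8) == 0xF0) return 4;   /* 11110xxx */
    return 1;   /* stray continuation byte or invalid lead byte */
}

/* Old style: look at every byte, return the offset of the first null. */
static size_t end_offset_bytewise(const char *s)
{
    size_t i = 0;
    assert(s != NULL);
    while (s[i] != '\0')
        i++;
    return i;
}

/* New style: trust the lead byte and skip ahead. A truncated
 * sequence lets this jump right over the terminator. */
static size_t end_offset_skipping(const char *s)
{
    size_t i = 0;
    assert(s != NULL);
    while (s[i] != '\0')
        i += lead_byte_len((unsigned char)s[i]);
    return i;
}

/* Guarded variant (an assumption, not a proposed patch): skip ahead,
 * but never step past a null byte inside a sequence. */
static size_t end_offset_guarded(const char *s)
{
    size_t i = 0;
    assert(s != NULL);
    while (s[i] != '\0') {
        size_t n = lead_byte_len((unsigned char)s[i]);
        size_t k = 1;
        while (k < n && s[i + k] != '\0')
            k++;
        i += k;
    }
    return i;
}
```

To try it without actually running off the end of an array, put the truncated
sequence at the front of a larger buffer, e.g. the bytes F0 00 'A' 'B' 'C' 00.
The bytewise and guarded scans stop at the terminator at offset 1, while the
skipping scan walks past it and only stops at the later null byte.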

Marcus


