Dang! Not just slower, but buggy, too! I've said it before and I'll say it again: it's really nice to be working with such a sharp bunch of folks. =)

e

>Erik Jaesler wrote:
>
>> In my defense, I can say only this: it occurred to me that utf8_char_len
>> might be less efficient than the simple loop already there because,
>> although you don't have to look at every single byte, the test in
>> utf8_char_len is quite a bit more expensive. Owing to that suspicion, I
>> left the original implementation in place, #if 0'd out, until I could
>> test it. You have saved me the testing time, and I thank you. =) My
>> mistake was in checking the code in before it was tested; hopefully
>> others will learn from my error. =)
>
>Seriously, the new code looks nicer, so one might say that this wasn't
>an optimization, just some cleanup, as we all do sometimes.
>
>But I just realized the new code has a bad problem the old one didn't
>when invalid UTF-8 sequences are scanned.
>
>An invalid UTF-8 sequence might, for example, be generated when a
>string is truncated but zero-terminated, as can happen when you use
>strncpy, strlcpy, etc.
>
>Consider the following (invalid UTF-8) sequence: "\xF0" (2 bytes,
>first is 0xF0, second 0x00). The old code will terminate at the second
>byte, as it is null; the new code will evaluate the first byte (which
>indicates a 4-byte character), skip the next three, and continue until
>it either crashes or finds a null byte.
>
>Marcus
>
> Necessity is the plea for every infringement of human freedom. It is
> the argument of tyrants; it is the creed of slaves.
> -William Pitt, British prime minister (1759-1806)