This is the problem. (Ironically, I used to fight against this with PHP; now
it does it by itself.) I'm trying to peel off just 8 bits at a time from the
string, but the way I'm doing it, it counts the ellipsis as a single char...
*sigh*

Still working on it.

--
Travis D. Reed

On Sun, Feb 7, 2010 at 1:27 PM, PulkoMandy <pulkomandy@xxxxxxxxx> wrote:

> On Sun, 07 Feb 2010 19:58:56 +0100, Travis D. Reed <tdreed@xxxxxxxxx>
> wrote:
>
>> I've figured out that my problems with fingerprints have to do with UTF-8
>> conversions. This probably explains weirdness in compiling my generated
>> catalogs too. How should I represent é and ô and especially ellipsis,
>> which doesn't even have an ASCII equivalent?
>> --
>> Travis D. Reed
>
> There is no conversion to ASCII involved anywhere. The Locale Kit is
> handling the strings as a stream of bytes, so é and ô will count as 2 bytes,
> and ellipsis as 3 (0xE2 0x80 0xA6), in the UTF-8 encoding.
> http://www.fileformat.info/info/unicode/char/2026/index.htm gives
> information about the ellipsis.
>
> I have no idea how it is possible to split a Unicode character into
> multiple bytes in PHP ...
>
> --
> Adrien Destugues / PulkoMandy
> http://pulkomandy.ath.cx
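
The byte counts given in the quoted reply (2 bytes each for é and ô, 3 bytes for the ellipsis) can be checked with a short sketch. It is written in Python rather than PHP purely for illustration, and the helper name `utf8_bytes` is made up for this example; it is not part of the Locale Kit or of anyone's actual script. It only shows the difference between counting characters and walking the UTF-8 encoding 8 bits at a time.

```python
# Illustrative sketch only: utf8_bytes is a hypothetical helper, not code
# from the thread or from the Locale Kit.

def utf8_bytes(text):
    """Return the UTF-8 encoding of text as a list of 8-bit integers."""
    return list(text.encode("utf-8"))


# U+00E9 (e acute), U+00F4 (o circumflex), U+2026 (horizontal ellipsis)
for ch in ("\u00e9", "\u00f4", "\u2026"):
    encoded = utf8_bytes(ch)
    # len(ch) counts characters; len(encoded) counts bytes.
    print(len(ch), len(encoded), " ".join(f"0x{b:02X}" for b in encoded))

# Output:
# 1 2 0xC3 0xA9
# 1 2 0xC3 0xB4
# 1 3 0xE2 0x80 0xA6
```

Any routine that consumes the string one character at a time sees the ellipsis as a single unit, while a byte-oriented consumer sees three values; the two only agree for pure ASCII text.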