On Fri, Apr 11, 2014 at 11:16 AM, Alain Meunier <deco33@xxxxxxxxxx> wrote:
> But when accents come in the dance, there is a problem -> the famous
> question mark.

That's a Unicode issue, regardless of the encoding.

Note that on Windows wchar_t was originally defined to hold UCS-2 characters (fixed 16 bit), but UCS-2 was soon found to be incomplete and is now obsolete. Most of Win32 migrated to UTF-16, and later on added a few UTF-8 versions.

Of course, there's nothing that one UTF can do and the other can't. Not even "easier to do": they're completely equivalent, except that UTF-16 writes as many 0 bytes as there are ASCII characters in your text.

> Are you all using icu's libraries ?

No. I don't do any real text processing, so text mostly just flows binary-safely through plain Lua strings.

When I have to do some processing, I try to avoid every assumption I can. Most important is to split only when it's absolutely necessary, and then only on whitespace; that keeps you safe in most cases. I know there are languages that use almost no whitespace; I just hope I won't have to do any splitting there.

-- 
Javier
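
To make the "split only on whitespace" point concrete, here is a minimal sketch, assuming the text is UTF-8 held in ordinary Lua strings. The helper name split_ws is made up for illustration. It works because every byte of a multibyte UTF-8 sequence is 0x80 or higher, so an ASCII delimiter can never land inside an encoded character. The pattern lists the delimiter bytes explicitly instead of using %s, since %s follows the current C locale and some 8-bit locales may treat bytes above 0x7F as spaces.

```lua
-- Split a string on ASCII whitespace only. In UTF-8, every byte of a
-- multibyte sequence is >= 0x80, so these delimiters never fall inside
-- an encoded character.
-- split_ws is a hypothetical helper name, not a standard Lua function.
local function split_ws(s)
  local words = {}
  -- Explicit byte set rather than %s, whose meaning depends on the C locale.
  for word in s:gmatch("[^ \t\r\n]+") do
    words[#words + 1] = word
  end
  return words
end

-- "caf\195\169" is "café" encoded as UTF-8 (é is the two bytes C3 A9).
local words = split_ws("caf\195\169 au  lait\n")
for i, w in ipairs(words) do
  print(i, w, #w)  -- #w counts bytes, not characters
end
```

The accented word comes back as the same five bytes that went in. Nothing here counts characters or changes case, which is where a real Unicode library would become necessary.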