OK, thanks for your clarifications, Javier :) I will keep going down the ICU route because I need text processing :)

See you

> Date: Fri, 11 Apr 2014 11:32:04 -0500
> Subject: Re: ffi : use of wchar_t, size_t
> From: javier@xxxxxxxxxxx
> To: luajit@xxxxxxxxxxxxx
>
> On Fri, Apr 11, 2014 at 11:16 AM, Alain Meunier <deco33@xxxxxxxxxx> wrote:
> > But when accents come into play, there is a problem: the famous
> > question mark.
>
> That's a Unicode issue, regardless of the encoding.
>
> Note that wchar_t was originally defined to hold UCS-2 characters
> (fixed 16 bits), but UCS-2 was soon found to be incomplete and is now
> deprecated. Most of Win32 migrated to UTF-16, and later added a few
> UTF-8 versions. Of course, there's nothing one UTF can do that the
> other can't, and nothing is even "easier to do" in one: they're
> completely equivalent, except that UTF-16 writes one zero byte for
> every ASCII character in your text.
>
> > Are you all using ICU's libraries?
>
> No. I don't do any real text processing, so text mostly just flows
> through plain Lua strings, binary-safe. When I do have to process it,
> I try to avoid every assumption I can. Most important is to split
> only when it's absolutely necessary, and then only on whitespace;
> that keeps you safe in most cases. I know there are languages that
> use almost no whitespace; I just hope I won't have to do any
> splitting there.
>
> --
> Javier
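To make Javier's point about UTF-8 and UTF-16 being equivalent encodings concrete, here is a minimal C sketch that encodes the same Unicode code points both ways and counts the zero bytes each produces. The function names (`utf8_encode`, `utf16le_encode`, `encode_all`, `count_zero_bytes`) are hypothetical and not part of ICU or the LuaJIT FFI. It assumes little-endian UTF-16 (as used by Win32) and inputs that are valid Unicode scalar values, which it enforces with `assert`. For real text processing, a library such as ICU should do this conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: cp is a valid Unicode scalar value (no surrogates, <= U+10FFFF). */
static void check_scalar(uint32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
}

/* Encode one code point as UTF-8; returns the number of bytes written (1-4). */
static size_t utf8_encode(uint32_t cp, unsigned char *out) {
    check_scalar(cp);
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Encode one code point as UTF-16LE; returns 2 bytes, or 4 for a surrogate pair. */
static size_t utf16le_encode(uint32_t cp, unsigned char *out) {
    check_scalar(cp);
    if (cp < 0x10000) {
        out[0] = (unsigned char)(cp & 0xFF);
        out[1] = (unsigned char)(cp >> 8);
        return 2;
    }
    cp -= 0x10000;                                   /* 20 bits left to split */
    uint16_t hi = (uint16_t)(0xD800 | (cp >> 10));   /* high surrogate */
    uint16_t lo = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low surrogate */
    out[0] = (unsigned char)(hi & 0xFF);
    out[1] = (unsigned char)(hi >> 8);
    out[2] = (unsigned char)(lo & 0xFF);
    out[3] = (unsigned char)(lo >> 8);
    return 4;
}

typedef size_t (*encoder_fn)(uint32_t, unsigned char *);

/* Encode n code points with the given encoder; returns the total byte length. */
static size_t encode_all(const uint32_t *cps, size_t n, encoder_fn enc,
                         unsigned char *out) {
    size_t len = 0;
    for (size_t i = 0; i < n; i++)
        len += enc(cps[i], out + len);
    return len;
}

/* Count bytes equal to zero in an encoded buffer. */
static size_t count_zero_bytes(const unsigned char *buf, size_t len) {
    size_t zeros = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == 0)
            zeros++;
    return zeros;
}
```

Encoding "café" (U+0063, U+0061, U+0066, U+00E9) gives 5 UTF-8 bytes with no zero bytes, and 8 UTF-16LE bytes with 4 zero bytes: one for each ASCII letter, plus one for é, because U+00E9 also fits in the low byte. So the zero-byte overhead actually applies to every code point below U+0100, not just ASCII. The sketch also shows why splitting on whitespace is safe for UTF-8 in plain Lua strings: every byte of a multi-byte UTF-8 sequence is 0x80 or above, so an ASCII space (0x20) can never appear in the middle of a character.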