Re: ffi : use of wchar_t, size_t

From: Javier Guerra Giraldez <javier@xxxxxxxxxxx>
To: LuaJIT <luajit@xxxxxxxxxxxxx>
Date: Fri, 11 Apr 2014 11:32:04 -0500

On Fri, Apr 11, 2014 at 11:16 AM, Alain Meunier <deco33@xxxxxxxxxx> wrote:
> But when accents come in the dance,  there is a problem -> the famous
> question mark.

that's a Unicode issue, regardless of the encoding.

note that on Windows wchar_t was originally defined to hold UCS-2
characters (fixed 16 bit), but UCS-2 was soon found to be incomplete and
is now deprecated (on most Unix systems wchar_t is 32 bits, so its size
is not portable).  most of win32 migrated to UTF-16, and later on added
a few UTF-8 versions.  Of course, there's nothing that one UTF can do
and the other can't, and neither is even "easier to do": they're
completely equivalent, except that UTF-16 writes at least as many 0
bytes as there are ASCII characters in your text.
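
To make the zero-byte remark concrete, here is a small sketch. It is in Python only because it is quick to try, it is not LuaJIT code, and the sample string is just an assumption picked for illustration. It encodes the same text both ways, checks that both round-trip, and counts the zero bytes in the UTF-16 form.

```python
# Sample text (an arbitrary choice): "café €" has 4 ASCII characters,
# one accented letter (U+00E9) and one euro sign (U+20AC).
text = "caf\u00e9 \u20ac"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")  # little-endian, without a byte-order mark

# Both encodings decode back to exactly the same string.
assert utf8.decode("utf-8") == text
assert utf16.decode("utf-16-le") == text

ascii_chars = sum(1 for ch in text if ord(ch) < 0x80)

# Every code point below U+0100 has a zero high byte in UTF-16,
# so this count is at least the number of ASCII characters.
zero_bytes = utf16.count(b"\x00")

print("UTF-8 bytes: ", len(utf8))
print("UTF-16 bytes:", len(utf16))
print("ASCII chars: ", ascii_chars)
print("zero bytes:  ", zero_bytes)
```

Those zero bytes are also why UTF-16 data can't be handed to anything that expects a NUL-terminated char string, while UTF-8 can.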

> Are you all using icu's libraries ?

no.  I don't do any real text processing, so text mostly just flows
through untouched as plain Lua strings, which are binary-safe.  when I
have to do some processing, I try to avoid every assumption I can: the
most important thing is to split only when it's absolutely necessary,
and then only on whitespace; that keeps you safe in most cases.  I know
there are languages that use almost no whitespace; I just hope I won't
have to do any splitting there.
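
A rough sketch of why splitting only on whitespace is fairly safe, assuming the bytes are UTF-8 (again in Python just for convenience; the helper name and the sample text are made up for the example). In UTF-8 every byte of a multi-byte character has its high bit set, so an ASCII whitespace byte can never appear inside a character.

```python
def split_on_ascii_whitespace(raw):
    """Split a byte string on runs of ASCII whitespace, without decoding it."""
    # bytes.split() with no argument only treats ASCII whitespace as a separator.
    return raw.split()

# Hypothetical input: UTF-8 text with accents, a tab and a newline.
raw = "d\u00e9j\u00e0 vu\tencore\nune fois".encode("utf-8")

words = split_on_ascii_whitespace(raw)

# Each piece is still valid UTF-8, because no character was cut in half.
decoded = [w.decode("utf-8") for w in words]
print(decoded)
```

Note that the "à" here is encoded as the bytes C3 A0, and A0 alone would be a non-breaking space in Latin-1; a splitter that only knows about ASCII whitespace leaves it alone. The same trick does not carry over to UTF-16, where the byte 0x20 can be half of some unrelated character.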

-- 
Javier

