RE: ffi : use of wchar_t, size_t

  • From: Alain Meunier <deco33@xxxxxxxxxx>
  • To: "luajit@xxxxxxxxxxxxx" <luajit@xxxxxxxxxxxxx>
  • Date: Fri, 11 Apr 2014 19:30:55 +0200

Ok, thanks for your clarifications Javier :)

I will keep going the ICU route because I need text processing :)

See you

> Date: Fri, 11 Apr 2014 11:32:04 -0500
> Subject: Re: ffi : use of wchar_t, size_t
> From: javier@xxxxxxxxxxx
> To: luajit@xxxxxxxxxxxxx
> 
> On Fri, Apr 11, 2014 at 11:16 AM, Alain Meunier <deco33@xxxxxxxxxx> wrote:
> > But when accents come into the dance, there is a problem -> the famous
> > question mark.
> 
> that's a unicode issue, regardless of the encoding.
> 
> note that on win32 wchar_t was defined originally to hold UCS-2
> characters (fixed 16 bit), but UCS-2 was soon found to be incomplete
> and is now obsolete (on most other platforms wchar_t is 32 bit).  most
> of win32 migrated to UTF-16, and later on added a few UTF-8 versions.
> Of course, there's nothing that one UTF can do and the other not.  not
> even "easier to do", they're completely equivalent except that UTF-16
> writes as many 0 bytes as ASCII characters in your text.
> 
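A small Python sketch of the equivalence point above (Python is used here only for illustration; the helper name `encoding_summary` and the sample text are made up for this example). It checks that the same string round-trips through both UTF-8 and UTF-16, and counts the zero bytes that UTF-16 produces for ASCII characters:

```python
def encoding_summary(text):
    """Encode text both ways and report sizes and zero-byte counts."""
    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-le")  # little-endian, no byte order mark

    # Both encodings carry exactly the same information.
    assert utf8.decode("utf-8") == text
    assert utf16.decode("utf-16-le") == text

    return {
        "utf8_len": len(utf8),
        "utf16_len": len(utf16),
        "ascii_chars": sum(1 for ch in text if ord(ch) < 0x80),
        "utf8_zero_bytes": utf8.count(0),
        "utf16_zero_bytes": utf16.count(0),
    }


# Hypothetical sample: 8 ASCII characters plus the euro sign (U+20AC).
summary = encoding_summary("prix: 5 \u20ac")
print(summary)
```

In this sample every ASCII character costs one zero byte in UTF-16 and none in UTF-8. Characters in the range U+0080 to U+00FF (such as é) also have a zero high byte in UTF-16, so in general the ASCII count is a lower bound on the number of zero bytes.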
> 
> > Are you all using icu's libraries ?
> 
> no.  I don't do any real text processing, so text mostly just flows
> binary-safe'ly on simple Lua strings.  when i have to do some
> processing, i try to avoid all assumptions i can: most important is to
> only split when it's absolutely necessary and then only on whitespace,
> and you're safe in most cases.  I know there are languages that don't
> use almost any whitespace, i just hope i won't have to do any
> splitting there.
> 
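A rough Python sketch of why splitting only on whitespace tends to be safe, assuming the strings are UTF-8 (the thread does not say which encoding is in use, and the function name and sample text are invented for this example). In UTF-8 every byte of a multi-byte character is 0x80 or higher, so an ASCII whitespace byte can never appear inside an accented character:

```python
def split_on_ascii_whitespace(data):
    """Split raw bytes on runs of ASCII whitespace without decoding them."""
    return data.split()  # bytes.split() with no argument splits on ASCII whitespace


# Assumption: the text arrives as UTF-8 bytes, as it would in a plain Lua string.
raw = "l'\u00e9t\u00e9 \u00e0 Orl\u00e9ans".encode("utf-8")
pieces = split_on_ascii_whitespace(raw)

# Every piece is still valid UTF-8, so the accents survive the split.
words = [piece.decode("utf-8") for piece in pieces]
print(words)

# Contrast: the same byte-level split is not safe on UTF-16 data.
# U+0920 encodes as the bytes 0x20 0x09 in UTF-16LE, i.e. a space and a tab.
utf16_pieces = "\u0920".encode("utf-16-le").split()
print(utf16_pieces)
```

The contrast at the end is one reason a byte-oriented approach like this only works when the encoding is ASCII-compatible.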
> 
> -- 
> Javier
> 
                                          
