[gmpi] Re: string encoding in the API (UTFs)

  • From: thockin@xxxxxxxxxx
  • To: gmpi@xxxxxxxxxxxxx
  • Date: Wed, 14 Dec 2005 11:24:59 -0800

On Wed, Dec 14, 2005 at 06:53:03PM +0100, Sebastien Metrot wrote:
> thockin@xxxxxxxxxx wrote:
> >
> >It seems to me that UTF-16 has all the problems of UTF-8 and more, with
> >none of the advantages.  The *only* thing it has going for it is Windows
> >and Mac.  That shouldn't be ignored, but neither should the accompanying
> >drawbacks.
> >  
> 
> UTF-16 is not a standard Windows feature. Win32 uses UCS-2.


Great, so Windows doesn't even support Unicode fully.

> UTF-8 is the 
> easiest way to go if you are not doing a word processor. strlen DOES 
> work in the sense that it counts bytes, not glyphs, so most programs can 
> work out of the box with  UTF-8. You only need special handling if you 
> are actually trying to interpret the text data (separate words, isolate 
> glyphs, hyphenate, etc...). Most people don't have those needs and the 
> ones that have the needs will learn how to do it properly which is not 
> nearly as complicated as you make it sound :-).
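
To make the strlen point concrete, here is a minimal sketch in C. The
helper name utf8_codepoints is made up for illustration and is not part
of any GMPI API, and it assumes the input is already valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
 * Counts Unicode code points in a NUL-terminated UTF-8 string by
 * skipping continuation bytes (bit pattern 10xxxxxx).
 * Assumes the input is valid UTF-8; it does no validation. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;    /* lead byte or plain ASCII starts a new code point */
    }
    return n;
}
```

For the string "caf\xC3\xA9" ("café"), strlen() reports 5 bytes, which is
what you need for buffers and copies, while utf8_codepoints() reports 4.
Code points are still not glyphs (combining marks exist), so this is only
a first approximation of "what the user sees".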

I'm trying to form a fair opinion, but it really seems to me that UTF-8 is
the standards-supported option for making data cross compiler boundaries.

We're not doing massive text passing or manipulation.  We want source-code
compatibility as much as we possibly can.  UTF-16 or UCS-2 has *zero*
standards support for even simple things like printf.  wchar_t *could*
work and does have standards support, except that it can vary between
compilers.
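
A rough sketch of what that means in practice, using only standard C.
The function names format_label and wchar_bits are invented for this
example. The first shows that the ordinary byte-oriented snprintf passes
UTF-8 through "%s" untouched; the second just reports how wide wchar_t
is on whatever compiler builds it.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: format "<name> = <value>" into buf.
 * utf8_name is treated as opaque bytes, so any UTF-8 text works.
 * Returns the number of bytes written (excluding the NUL),
 * or -1 on error or truncation. */
int format_label(char *buf, size_t cap, const char *utf8_name, int value)
{
    int n = snprintf(buf, cap, "%s = %d", utf8_name, value);

    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}

/* Hypothetical helper: width of wchar_t in bits for this compiler.
 * The C standard does not fix this value. */
size_t wchar_bits(void)
{
    return sizeof(wchar_t) * CHAR_BIT;
}
```

wchar_bits() is commonly 16 with Windows compilers and 32 with most Unix
compilers, which is exactly the kind of variation that makes wchar_t
awkward at a binary boundary. The char-based version behaves the same
everywhere.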

These things add up to make it very hard for a plugin to be
source-compatible across platforms, including translation files, unless we
use something that *is* well-defined.  That's UTF-8.

I don't expect to be doing much string manipulation outside of printing
them to various displays (GUI and non-GUI alike).  I think it is fairly
easy to define that the encoding of a string when it crosses the
plugin<->host boundary (in either direction) is UTF-8.  The receiver can
always turn that to whatever native encoding it wants to use.
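
As an illustration of "the receiver can convert", here is a minimal
sketch of turning UTF-8 into UTF-16 code units on the receiving side.
The name utf8_to_utf16 and its signature are assumptions made for this
example, not a proposed GMPI call. A real host would more likely use its
platform's own conversion routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical converter, for illustration only.
 * Converts NUL-terminated UTF-8 into UTF-16 code units in out[],
 * which has room for cap units including the terminating 0.
 * Returns the number of units written (excluding the terminator),
 * or -1 on malformed input or insufficient space. */
long utf8_to_utf16(const char *in, uint16_t *out, size_t cap)
{
    /* Smallest code point allowed for each sequence length,
     * used to reject overlong encodings. */
    static const uint32_t min_cp[4] = { 0, 0x80, 0x800, 0x10000 };
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    while (*p != 0) {
        uint32_t cp;
        int extra;

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return -1;     /* stray continuation or invalid lead byte */
        p++;

        for (int i = 0; i < extra; i++, p++) {
            if ((*p & 0xC0) != 0x80)
                return -1;  /* also catches a premature NUL */
            cp = (cp << 6) | (uint32_t)(*p & 0x3F);
        }

        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;      /* overlong, out of range, or surrogate */

        if (cp < 0x10000) {
            if (n + 1 >= cap)
                return -1;
            out[n++] = (uint16_t)cp;
        } else {
            if (n + 2 >= cap)
                return -1;
            cp -= 0x10000;  /* split into a surrogate pair */
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }

    if (cap == 0)
        return -1;
    out[n] = 0;
    return (long)n;
}
```

The conversion is mechanical and lossless, so fixing UTF-8 at the
boundary costs a receiver that prefers another encoding one small
routine like this, applied once as the string comes in.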

You can still use Asian characters in control names.  You can fully
represent all of Unicode.  You are ASCII compatible, standards compliant,
and cross-compiler safe.
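
For anyone who wants to see why both claims hold, here is a small sketch
of the UTF-8 encoding rule itself. The function name utf8_encode is made
up for this example. Code points below 0x80 come out as the same single
ASCII byte, and everything up to U+10FFFF fits in at most four bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoder, for illustration only.
 * Writes the UTF-8 encoding of code point cp into out[] and returns
 * the number of bytes used (1 to 4), or 0 if cp is not a valid
 * Unicode scalar value (a surrogate, or above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                    /* ASCII: byte is unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                   /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                           /* beyond the Unicode range */
}
```

Note that no byte of a multi-byte sequence is ever below 0x80, so code
that scans for ASCII characters such as '/' or '\0' keeps working on
UTF-8 strings without modification.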

That said, I'll go read the article Jeff posted.

----------------------------------------------------------------------
Generalized Music Plugin Interface (GMPI) public discussion list
Participation in this list is contingent upon your abiding by the
following rules:  Please stay on topic.  You are responsible for your own
words.  Please respect your fellow subscribers.  Please do not
redistribute anyone else's words without their permission.

Archive: //www.freelists.org/archives/gmpi
Email gmpi-request@xxxxxxxxxxxxx w/ subject "unsubscribe" to unsubscribe
