[gmpi] Re: string encoding in the API (UTFs)

From: "Ron Kuper" <RonKuper@xxxxxxxxxxxx>
To: <gmpi@xxxxxxxxxxxxx>
Date: Wed, 14 Dec 2005 14:53:36 -0500
Given that maybe 0.05% of any plugin's code is going to have to deal with
character matching, string length counting, or substitution, I think any
choice is fine.

-----Original Message-----
From: gmpi-bounce@xxxxxxxxxxxxx [mailto:gmpi-bounce@xxxxxxxxxxxxx] On
Behalf Of thockin@xxxxxxxxxx
Sent: Wednesday, December 14, 2005 2:44 PM
To: gmpi@xxxxxxxxxxxxx
Subject: [gmpi] Re: string encoding in the API (UTFs)

On Thu, Dec 15, 2005 at 07:30:12AM +1300, Jeff McClintock wrote:
> > You're still ignoring all the other issues.
> 
> Well I decided to use wide-chars (UCS-2) based on this article...
> 
> "The Absolute Minimum Every Software Developer Absolutely, Positively 
> Must Know About Unicode and Character Sets (No Excuses!)"
> 
> http://www.joelonsoftware.com/articles/Unicode.html
> 
> It discusses most of the points you raise.

It doesn't discuss that wchar_t is not actually guaranteed to be any
specific width.

It does address byte-order marks, but are we really going to suggest
that is a useful thing to do on every string?

It doesn't address the fact that Windows' implementation (UCS-2) can't
represent all of Unicode.

It doesn't discuss the lack of any standards-based UTF-16 support.

He equates UCS-2 and UTF-16, which is FLAT WRONG.
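
To make the UCS-2 and wchar_t points above concrete, here is a minimal
Python sketch. It is not from the original thread: the helper names and
the choice of U+1D11E (MUSICAL SYMBOL G CLEF) as a test character are
assumptions made for illustration. It shows that a code point above
U+FFFF needs two 16-bit units in UTF-16, so a fixed 16-bit UCS-2 unit
cannot hold it, and it reports the width of wchar_t on the machine it
runs on.

```python
import ctypes

# U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic Multilingual Plane.
# (Illustrative choice; any code point above U+FFFF behaves the same way.)
G_CLEF = "\U0001D11E"


def utf16_code_units(text):
    """Return the 16-bit code units of `text` encoded as big-endian UTF-16."""
    raw = text.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


def fits_in_ucs2(text):
    """True if every code point fits in one 16-bit unit, as UCS-2 requires."""
    return all(ord(ch) <= 0xFFFF for ch in text)


# Size in bytes of the platform's C wchar_t; the C standard does not fix it.
WCHAR_T_BYTES = ctypes.sizeof(ctypes.c_wchar)

if __name__ == "__main__":
    print("wchar_t bytes on this platform:", WCHAR_T_BYTES)
    print("UTF-16 units:", [hex(u) for u in utf16_code_units(G_CLEF)])
    print("fits in UCS-2:", fits_in_ucs2(G_CLEF))
```

The G clef comes out as a surrogate pair (two code units), which is
exactly the case a UCS-2-only host would get wrong.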

> I guess you may not agree that UCS-2 (16 bit wide-char) is the way to 
> go. But I hope it at least explains my reasoning (better than I can).

The best reasoning for it is "that's what Windows does".  But if we want
to support Unicode, UCS-2 doesn't cut it.  If you want to convert from
proper Unicode (whether that's UTF-8, UTF-16 or UTF-32) into UCS-2 in
your host, you should absolutely feel free.  But you do so at the peril
of getting some characters wrong.

I'm still leaning towards UTF-8 as the single standards-based option
which completely covers Unicode.  We have to pass strings between
objects from different compilers as well as possibly across networks.
We want to share source code as well as translation databases between
platforms.  The only answer I see that meets all of those is UTF-8.
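
As a rough illustration of why UTF-8 travels well between compilers and
across networks, the Python sketch below compares the encodings of one
string. It is not from the original thread; the sample string and the
`roundtrips` helper are assumptions for illustration only. UTF-16 has
two byte orders and embeds zero bytes in ASCII text, while UTF-8 has a
single byte sequence, leaves ASCII unchanged, and contains no zero bytes
(so it survives being passed as a NUL-terminated char array).

```python
# Sample text mixing ASCII with a code point above U+FFFF (illustrative).
NAME = "G clef: \U0001D11E"

utf8 = NAME.encode("utf-8")
utf16_le = NAME.encode("utf-16-le")
utf16_be = NAME.encode("utf-16-be")


def roundtrips(text, encoding):
    """True if encoding then decoding `text` returns the original string."""
    return text.encode(encoding).decode(encoding) == text


if __name__ == "__main__":
    print("UTF-8    :", utf8.hex(" "))
    print("UTF-16-LE:", utf16_le.hex(" "))
    print("UTF-16-BE:", utf16_be.hex(" "))
    print("UTF-16 byte orders differ:", utf16_le != utf16_be)
    print("UTF-8 contains a zero byte:", 0 in utf8)
    print("UTF-8 round-trips:", roundtrips(NAME, "utf-8"))
```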

I'm willing to be convinced otherwise, but the more reading I do, the
more I think UTF-8 is right.

Tim

----------------------------------------------------------------------
Generalized Music Plugin Interface (GMPI) public discussion list
Participation in this list is contingent upon your abiding by the
following rules:  Please stay on topic.  You are responsible for your own
words.  Please respect your fellow subscribers.  Please do not
redistribute anyone else's words without their permission.

Archive: //www.freelists.org/archives/gmpi
Email gmpi-request@xxxxxxxxxxxxx w/ subject "unsubscribe" to unsubscribe

