Given that maybe 0.05% of any plugin's code is going to have to deal with character matching, string length counting, or substitution, I think any choice is fine.

-----Original Message-----
From: gmpi-bounce@xxxxxxxxxxxxx [mailto:gmpi-bounce@xxxxxxxxxxxxx] On Behalf Of thockin@xxxxxxxxxx
Sent: Wednesday, December 14, 2005 2:44 PM
To: gmpi@xxxxxxxxxxxxx
Subject: [gmpi] Re: string encoding in teh API (UTFs)

On Thu, Dec 15, 2005 at 07:30:12AM +1300, Jeff McClintock wrote:
> > You're still ignoring all the other issues.
>
> Well I decided to use wide-chars (UCS-2) based on this article...
>
> "The Absolute Minimum Every Software Developer Absolutely, Positively
> Must Know About Unicode and Character Sets (No Excuses!)"
>
> http://www.joelonsoftware.com/articles/Unicode.html
>
> It discusses most of the points you raise.

It doesn't discuss that wchar_t is not actually guaranteed to be any specific width. It does address byte-order marks, but are we really going to suggest that is a useful thing to do on every string? It doesn't address the fact that Windows' implementation (UCS-2) can't represent all of Unicode. It doesn't discuss the lack of any standards-based UTF-16 support. He equates UCS-2 and UTF-16, which is FLAT WRONG.

> I guess you may not agree that UCS-2 (16 bit wide-char) is the way to
> go. But I hope it at least explains my reasoning (better than I can).

The best reasoning for it is "that's what Windows does". But if we want to support Unicode, UCS-2 doesn't cut it. If you want to convert from proper Unicode (whether that's UTF-8, UTF-16 or UTF-32) into UCS-2 in your host, you should absolutely feel free. But you do so at the peril of getting some characters wrong.

I'm still leaning towards UTF-8 as the single standards-based option which completely covers Unicode.

We have to pass strings between objects from different compilers as well as possibly across networks. We want to share source code as well as translation databases between platforms.
The only answer I see that meets all of those is UTF-8. I'm willing to be convinced otherwise, but the more reading I do, the more I think UTF-8 is right.

Tim

----------------------------------------------------------------------
Generalized Music Plugin Interface (GMPI) public discussion list
Participation in this list is contingent upon your abiding by the following rules:
Please stay on topic. You are responsible for your own words. Please respect your fellow subscribers. Please do not redistribute anyone else's words without their permission.
Archive: //www.freelists.org/archives/gmpi
Email gmpi-request@xxxxxxxxxxxxx w/ subject "unsubscribe" to unsubscribe
----------------------------------------------------------------------
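
As a rough illustration of the UCS-2 and wchar_t points raised in the message above (this is not code from the thread), the Python sketch below encodes three sample characters and reports how many UTF-8 bytes and UTF-16 code units each one needs. The helper names utf16_units and fits_in_ucs2 are made up for this example, and the sample characters are arbitrary. U+1D11E (MUSICAL SYMBOL G CLEF) was picked because it lies outside the Basic Multilingual Plane, so a single 16-bit UCS-2 unit cannot hold it.

```python
import ctypes
import struct


def utf16_units(text: str) -> list:
    """Return the UTF-16 code units of `text` as a list of integers."""
    raw = text.encode("utf-16-le")  # little-endian, no byte-order mark
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


def fits_in_ucs2(text: str) -> bool:
    """True if every code point is at or below U+FFFF (one 16-bit unit)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


# Hypothetical sample strings, one character each.
samples = {
    "Latin capital A": "A",
    "e with acute": "\u00e9",
    "musical G clef": "\U0001D11E",
}

for label, text in samples.items():
    utf8 = text.encode("utf-8")
    units = utf16_units(text)
    print(
        f"{label}: U+{ord(text):04X} "
        f"utf8_bytes={len(utf8)} "
        f"utf16_units={[hex(u) for u in units]} "
        f"fits_ucs2={fits_in_ucs2(text)}"
    )

# Width of the C wchar_t type as seen by this Python build's platform.
print("sizeof(wchar_t) =", ctypes.sizeof(ctypes.c_wchar))
```

The G clef takes four bytes in UTF-8 and a surrogate pair (two code units) in UTF-16, so a host that treats each 16-bit unit as one character would miscount or mangle it. The last line typically prints 2 on Windows and 4 on most Linux and macOS builds, which is the "not guaranteed to be any specific width" problem in practice.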