[haiku-development] Changes from the japanese community

As you (at least, the admins) know, I contacted momoziro (the guy
behind the JPBE Haiku live CD) about the changes he made in order to
better support the Japanese language.
Here are the changes (I am posting here because I think this mailing
list is the best suited to discuss this kind of thing):

1. He pointed me to
http://www2d.biglobe.ne.jp/~msyk/software/libiconv-patch.html for the
patch to libiconv (beware, I still don't know why this is needed)

2. He made this change to character_sets.cpp in libtextencoding

-static const BCharacterSet shiftJIS(12,17,"Japanese Shift JIS","Shift_JIS","Shift_JIS",shiftJISaliases);
+static const BCharacterSet shiftJIS(12,17,"Japanese Shift JIS","CP932","Shift_JIS",shiftJISaliases);

(I added the patch also as attachment).

To understand the reason behind this change, Marc Flerackers pointed
me to this article on wikipedia:
http://en.wikipedia.org/wiki/Shift-jis

extract:
"Many different versions of Shift JIS exist. There are two areas for
expansion: Firstly, JIS X 0208 does not fill the whole 94x94 space
encoded for it in Shift JIS, therefore there is room for more
characters here—these are really extensions to JIS X 0208 rather than
to Shift JIS itself. The most popular extension here is to the
Windows-31J (otherwise known as Code page 932) encoding popularized by
Microsoft."

Basically this change is needed to be able to read Japanese files
written on an MS system.
Marc also said CP932 is backwards compatible, as it's just an extension.
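
A rough way to see the difference between the two names is the sketch
below. It uses Python's built-in "shift_jis" and "cp932" codecs as
stand-ins (an assumption on my side: this is not libiconv or
libtextencoding, and their tables may differ in details). The sample
strings and the try_decode helper are made up for the illustration.

```python
# Illustration only: Python's codecs stand in for the two encoding names
# used in the patch. Neither libiconv nor libtextencoding is involved.

plain = "日本語"   # characters that are part of JIS X 0208
extended = "①"     # a circled digit, one of the CP932 extension characters

# Encode both samples the way an MS system would (CP932).
plain_bytes = plain.encode("cp932")
extended_bytes = extended.encode("cp932")


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid there."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


for label, data in (("plain", plain_bytes), ("extended", extended_bytes)):
    for encoding in ("shift_jis", "cp932"):
        print(label, encoding, try_decode(data, encoding))
```

The plain sample decodes under both names, while the extended one only
decodes as CP932, which is the kind of file the change is meant to fix.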

3. He did this to unzip (momoziro's own words):

"The revision part of the unzip only erased "OEM_INTERN((string))" in
"Ext_ASCII_TO_Native" macro of trunk/src/bin/unzip/unzpriv.h.
The file name broke when the file name (Shift_JIS encoded multi-byte
Japanese language characters) was mis-recognized to iso8859-1.
I did the change that invalidate the character-code conversion.
I think that this is not a good method."

So I don't think we should apply this last change. But maybe there is
a better way to do this.
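
To show what "mis-recognized to iso8859-1" does to a name, here is a
small sketch in Python. It is not unzip's code and says nothing about
what OEM_INTERN does internally; the file name is invented, and I am
assuming the archive stores the name as raw Shift_JIS/CP932 bytes.

```python
# Illustration only: a hypothetical file name, assumed to be stored in the
# archive as raw CP932 (Shift_JIS) bytes.
name = "日本語.txt"
raw = name.encode("cp932")

# Treating the bytes as ISO-8859-1 turns every byte into one character,
# so each two-byte Japanese character becomes two unrelated symbols.
misread = raw.decode("latin-1")
print(repr(misread))

# As long as nothing else is done to the text, the bytes are still intact
# and decoding them with the right charset gives the name back.
recovered = misread.encode("latin-1").decode("cp932")
print(recovered)
```

So the information is only lost once a conversion is applied on top of
the wrong guess. Knowing (or being told) the real charset of the name
would be cleaner than removing the conversion altogether.
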
Index: character_sets.cpp
===================================================================
--- character_sets.cpp  (revision 21227)
+++ character_sets.cpp  (working copy)
@@ -129,7 +129,7 @@
        "shift_jisx0213",
        NULL
 };
-static const BCharacterSet shiftJIS(12,17,"Japanese Shift JIS","Shift_JIS","Shift_JIS",shiftJISaliases);
+static const BCharacterSet shiftJIS(12,17,"Japanese Shift JIS","CP932","Shift_JIS",shiftJISaliases);
 
 static const char * EUCPackedJapaneseAliases[] = {
        // IANA aliases
