[haiku-i18n] Re: Language code for zh_hans Fwd: Catkeys update from HTA wanted?

  • From: Rimas Kudelis <rq@xxxxxx>
  • To: haiku-i18n@xxxxxxxxxxxxx
  • Date: Sun, 18 Dec 2011 11:20:12 +0200

2011.12.17 23:20, Adrien Destugues wrote:
On 17/12/2011 09:39, Rimas Kudelis wrote:
2011.12.17 10:11, Niels Sascha Reedijk wrote:
Hi,

On Fri, Dec 16, 2011 at 11:11 PM, Adrien Destugues
<pulkomandy@xxxxxxxxxxxxxxxxx>  wrote:
What would be more preferred. Use the dash? Or an underscore? Perhaps
we can use the dash in this case because there is no relation between
the zh-Hans and zh.

  And capitalization, should we use the ISO version? so zh-Hans? Or
shall we capitalize it completely, like pt_BR?


Don't mix them. pt_BR is Portuguese, the variant spoken in Brazil.
zh-Hans and zh-Hant are, for localization purposes, entirely different
languages (they read the same, but use different scripts). Another example of that would be the two written forms of Norwegian, Nynorsk and Bokmål (I hope
I get the spelling right...).

The result is that there is no 'zh' language at all. It's either zh-Hans or zh-Hant. Go with ISO, and that should be fine, as it's also what ICU uses?
ICU seems to use the normal dash even for language-country locales,
like pt-BR [1]. So in this sense it might have to do with the demands
that POSIX makes, but this really goes into foreign territory. KDE
uses zh_CN for simplified Chinese. I understand the difference and why
zh-Hans is better, but I am wondering whether we should keep
compatibility in mind.

Unless, of course, LC_ALL and the Haiku locale are unrelated.

Regards,

N>


[1] http://www.iana.org/assignments/language-subtag-registry

Currently, the user chooses his language in one tab in Locale prefs, and country in the other. I think LANG and all the LC_* variables should be composed (and I believe they are) by joining these two preferences, that is:
* if I choose English language and China region, the locale would be en-CN (or en_CN)
* if I choose Chinese Simplified language and China region, the locale would be zh-Hans-CN (or with underscores)
* if I choose Chinese Simplified language and do not specify the region, the locale would be zh-Hans
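
As a rough sketch of that composition (not the Locale Kit's actual code; the function name `compose_locale` and its parameters are made up for illustration, and the separator is left configurable because the dash-versus-underscore question is still open):

```python
from typing import Optional


def compose_locale(language: str, region: Optional[str] = None, sep: str = "-") -> str:
    """Join a language tag and an optional region into one locale name.

    Hypothetical helper: `language` is the choice from the language tab
    (e.g. "zh-Hans"), `region` the choice from the other tab (e.g. "CN"),
    or None when no region is selected.
    """
    if not region:
        return language
    return language + sep + region


print(compose_locale("en", "CN"))           # en-CN
print(compose_locale("en", "CN", sep="_"))  # en_CN
print(compose_locale("zh-Hans", "CN"))      # zh-Hans-CN
print(compose_locale("zh-Hans"))            # zh-Hans
```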

So, CN (or TW) is quite likely to appear in those variables anyway, and since Haiku is an OS on its own, I'm not sure it makes sense to require that backwards compatibility. Linux may also move on from its current scheme one day, you never know...

WRT dash vs. underscore... I'd think if BCP47 specifies and ICU uses dashes, and we don't have backwards compatibility to stick to, then why not use dashes? I suspect that for command-line applications ported from say Linux, simply renaming their .po files to our scheme would do (haven't tested though), and even if not, it should probably be considered a bug in GNU Gettext, not with us then...

Rimas

LC_ALL is unrelated. The code we're talking about here is the one for catalogs, which does not use it.

Yeah, but I assume it's the Locale kit that sets those variables that can later be used by e.g. KDE apps. That's all I was saying.

You could add a country-specific code to zh-Hans, giving something like zh-Hans_CN for China. That is helpful if there is some other country using a variant of the language, just as pt_BR is different from pt_PT and fr_CA is not exactly like fr_FR.

The implementation allows "pt_BR" to fall back to "pt" when there is no pt_BR string available.

Note that this language+country is chosen in the "language" tab, and is unrelated to the country code in the formatting tab. It is not possible to make up arbitrary codes such as en_FR (as English is not a language usually spoken in France) for the language selection.

Ah, I didn't know that. Thanks.

The thing to remember is:
* _ is the marker for fallback. zh-Hans_CN can thus fall back to zh-Hans, but not to zh, zh_CN or anything else.
* - is a normal character, used as a separator in some language codes.
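
The two rules above can be illustrated with a small sketch. It is not the Locale Kit implementation; `fallback_chain` is a made-up name, and the only behaviour it models is "split on `_`, never on `-`":

```python
def fallback_chain(code):
    """Return the catalog codes to try, most specific first.

    Only "_" is treated as a fallback marker; "-" is an ordinary
    character inside a code and is never split on.
    """
    parts = code.split("_")
    return ["_".join(parts[:n]) for n in range(len(parts), 0, -1)]


print(fallback_chain("zh-Hans_CN"))  # ['zh-Hans_CN', 'zh-Hans']
print(fallback_chain("pt_BR"))       # ['pt_BR', 'pt']
print(fallback_chain("zh-Hans"))     # ['zh-Hans']
```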

Hm, you said if the Locale Kit doesn't do something that it should, then it should be fixed. I would guess that treating dash as a fallback marker could be one of those things, if what we want is to be closer to BCP47 and ICU. Unless there's a really big reason not to.

And regarding Chinese in particular, I don't think there's anything to be afraid of with the fallback mechanism – since no catalogs for zh without modifiers will exist, the fallback mechanism should simply fall back to the next language in the list, e.g. English (and it's a bug if it does not). So I don't think there's much argumentation for treating script modifiers differently from language modifiers.
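
To make that argument concrete, here is a sketch of a lookup in which both "-" and "_" count as fallback markers and the search moves on to the user's next preferred language when nothing matches. It only illustrates the proposal, not existing Locale Kit behaviour; the function names and the set of available catalogs are invented for the example:

```python
def candidates(code):
    """Most specific code first, dropping one trailing subtag at a time.

    Assumption for this sketch: "-" and "_" are both fallback markers.
    """
    result = [code]
    while True:
        cut = max(code.rfind("-"), code.rfind("_"))
        if cut < 0:
            return result
        code = code[:cut]
        result.append(code)


def pick_catalog(preferred, available):
    """Return the first available catalog for the preferred language list."""
    for code in preferred:
        for candidate in candidates(code):
            if candidate in available:
                return candidate
    return None


# Made-up catalog set: note there is no plain "zh" catalog.
available = {"zh-Hans", "zh-Hant", "en", "pt"}

print(pick_catalog(["zh-Hans-CN", "en"], available))  # zh-Hans
print(pick_catalog(["zh-CN", "en"], available))       # en
print(pick_catalog(["pt-BR", "en"], available))       # pt
```

Since no catalog named plain zh exists, a request that falls all the way back to zh matches nothing and the lookup continues with the next language in the list.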

By the way, I just looked a bit closer at locale preferences, and I'm a bit surprised that for each language that can be typed in multiple scripts, multiple entries exist (one for the language itself and one for that language written in each script). For example, for Chinese, there are Chinese, Chinese Simplified and Chinese Traditional entries. I don't think this makes sense, does it? I would suggest the following scheme:

* for each language with a Suppress-Script attribute in BCP47 (such as Punjabi), don't provide an option for "That language (That script)", leaving only "That language" instead, and listing countries under it
* for each language without a Suppress-Script attribute (such as Chinese), don't provide an option for "That language" without a script modifier at all, leaving only the options with the script modifier set
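
A minimal sketch of that scheme, assuming the preflet knows the Suppress-Script value and the script variants for each language. The function name, its parameters and the sample data are hypothetical, and the handling of a language with neither a Suppress-Script nor any script variants is an extra assumption that the proposal above does not cover:

```python
def language_entries(language, scripts, suppress_script):
    """List the entries to offer for one language.

    `scripts` holds the script subtags for which entries exist
    (hypothetical input); `suppress_script` is the language's
    Suppress-Script value, or None if it has none.
    """
    if suppress_script is not None:
        # The language has a default script: offer only the bare language.
        return [language]
    if scripts:
        # No default script: offer only script-qualified entries.
        return [language + "-" + script for script in scripts]
    # Assumption: with nothing to qualify, keep the bare language.
    return [language]


print(language_entries("pa", ["Guru", "Arab"], "Guru"))  # ['pa']
print(language_entries("zh", ["Hans", "Hant"], None))    # ['zh-Hans', 'zh-Hant']
```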

This would quickly get rid of a few useless options in our Locale preflet, which can only be good, IMO.

Rimas
