On 2010-11-02 at 12:04:57 [+0100], Adrien Destugues
<pulkomandy@xxxxxxxxxxxxxxxxx> wrote:
> >> Known problems :
> > [...]
> >> * The way I altered LocaleRoster is not as good as it could be. Locales
> >> with variants such as zh-hans are not going to work (the computed
> >> country code is now 'NS'). Not sure there is a better way
> >
> > Is NS even a valid country code? What exactly is the problem? One would
> > think it should always be possible to infer some reasonably fitting
> > country from language, no?
>
> I just took the last two characters of the country code, and this
> doesn't work when there is a variant (such as -hans, which means
> "simplified han").
> I'm not sure what can be done to decide zh-hans should actually be 'cn'
> (for china).

How about looking at the full second component? If it is a country code,
fine, otherwise use the first component.

> There are similar problems with english (the language is 'en' but the
> country should be 'US'). Some other countries happen to have the same
> code for language and country, and in these cases, the current code
> works. (fr_FR, de_DE, ...).

Isn't that simply a problem of the translations being labeled incorrectly,
or rather of the countryless identifiers not having a well-defined meaning?
E.g. since it is extremely unlikely that an English translation perfectly
fits all local variants, it probably shouldn't be labeled "en", but "en_GB",
"en_US", or whatever applies. Or the other way around, we could just define
that "en" refers to some specific variant, which would give it meaningful
semantics (instead of the not really helpful "some kind of English").

> I expect we'll get more 'specialized' translations (such as pt_BR) in
> the future, so the problem may solve itself in most cases (or we could
> just force every translation to be linked to a country).

Yep, the country (or more generally: variant) association makes sense.

> zh-hans is a particular case, this name is used because several
> countries use the same language and don't want one of them to be used in
> the language name.

I believe it's actually the other way around: Both simplified and
traditional script are used in China (mainland vs. Hong Kong and Taiwan), so
adding the country wouldn't help.

> I don't think there is a language > country mapping in ICU. We can get a
> list of all locales and find one that match the language, but then
> 'french' may get the belgian flag because BE happens to be first in
> alphabetical order. Not sure that would make sense.

As suggested above, we could use a hard-coded language -> country mapping
for the first component, but use the second component, if it identifies a
country.

CU, Ingo
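
PS: A rough sketch of that lookup, in Python only for brevity. The two
tables and the function name are made up for illustration; they are not
taken from ICU or from the existing LocaleRoster code, and a real version
would need a complete country list and a properly chosen default per
language.

```python
# Hypothetical fallback table: language code -> default country code.
# The entries are illustrative only, not taken from ICU or Haiku.
DEFAULT_COUNTRY = {
    "en": "US",
    "fr": "FR",
    "de": "DE",
    "zh": "CN",
    "pt": "PT",
}

# Tiny stand-in for a real list of two-letter country codes.
KNOWN_COUNTRIES = {"US", "GB", "FR", "DE", "BE", "CN", "TW", "HK", "BR", "PT"}


def country_for_locale(locale_id):
    """Return a two-letter country code for locale_id, or None if unknown."""
    # Accept both "pt_BR" and "zh-hans" style separators.
    parts = locale_id.replace("-", "_").split("_")
    language = parts[0].lower()

    # Use the full second component, but only if it really is a country.
    if len(parts) > 1 and parts[1].upper() in KNOWN_COUNTRIES:
        return parts[1].upper()

    # No second component, or it is a script/variant such as "hans":
    # fall back to the hard-coded default for the language.
    return DEFAULT_COUNTRY.get(language)
```

With these tables, "zh-hans" gives CN, "pt_BR" gives BR, "fr_BE" gives BE,
and a plain "en" gives US, so 'french' no longer depends on which locale
happens to come first alphabetically.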