Re: [nvda-translations] [Nvda-dev-asia] Character descriptions: characters composed of multiple components (specifically, Hangul char descriptions)

  • From: James Teh <jamie@xxxxxxxxxxxx>
  • To: NVDA development for Asian character input <nvda-dev-asia@xxxxxxxxxxxxxxxxxx>, nvda-translations@xxxxxxxxxxxxx
  • Date: Fri, 09 Nov 2012 14:32:09 +1000

Hi Joseph,

I think there are several other languages which have compound characters as well, such as Tamil. Normally, a single character is only represented by one Unicode character. However, in Tamil, a compound character, even though it only looks like one character visually, is actually represented by multiple Unicode characters. I'm not sure if this is how Unicode handles all such languages.

As a side note, this actually causes a problem with speak typed characters for Tamil users:
http://www.nvda-project.org/ticket/1428

Strangely, your first Korean example (ㄱㅏ, ga) is two Unicode characters, but the second (관, gwan) is only one. So, for the first, doing as you request isn't too difficult. For the second, we have to decompose the character. It looks like we can do this in Python fairly easily (unicodedata.normalize with form NFD). So, if I'm correct, 관 (U+AD00) becomes ᄀ + ᅪ + ᆫ (U+1100 U+116A U+11AB), which renders the same but is three Unicode characters.
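To make the NFD idea concrete, here is a minimal Python 3 sketch. The helper names decompose and code_points are made up for illustration and are not part of NVDA; only unicodedata.normalize is a real standard library call.

```python
import unicodedata


def decompose(text):
    """Return the canonical decomposition (NFD) of text."""
    return unicodedata.normalize("NFD", text)


def code_points(text):
    """List the code points in text as U+XXXX strings (illustrative helper)."""
    return ["U+%04X" % ord(ch) for ch in text]


syllable = "\uAD00"  # 관 (gwan), one precomposed Hangul syllable
print(code_points(decompose(syllable)))
# ['U+1100', 'U+116A', 'U+11AB']
```

Note that the first example (ㄱㅏ) is written with Hangul compatibility jamo (U+3131 U+314F), which NFD leaves unchanged, so it would still need to be handled as two separate characters.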

The question is whether this is desirable for other languages. Also, this will affect European languages as well; e.g. Ç (C cedilla, U+00C7) decomposes to C followed by a combining cedilla (U+0043 U+0327). At a guess, I'd think it's not desirable for some languages. The problem is that it'd be difficult for NVDA to know which characters to do it for and which not. I guess we could make it a config option, but that kinda sucks for new Korean users.
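One possible way around the "which characters" problem, purely as a sketch and not something NVDA does today, would be to decompose only code points in the Hangul Syllables block (U+AC00 to U+D7A3) and leave everything else alone. The function names below are hypothetical.

```python
import unicodedata

# Range of the precomposed Hangul Syllables block in Unicode.
HANGUL_SYLLABLE_FIRST = 0xAC00
HANGUL_SYLLABLE_LAST = 0xD7A3


def is_hangul_syllable(ch):
    """True if ch is a precomposed Hangul syllable."""
    return HANGUL_SYLLABLE_FIRST <= ord(ch) <= HANGUL_SYLLABLE_LAST


def decompose_hangul_only(text):
    """Apply NFD to Hangul syllables only; other characters pass through."""
    return "".join(
        unicodedata.normalize("NFD", ch) if is_hangul_syllable(ch) else ch
        for ch in text
    )
```

With this, decompose_hangul_only("\u00C7") returns Ç unchanged, while "\uAD00" (관) still splits into three jamo. Whether a hard-coded range is acceptable, or whether this should be driven by the language's own data, is an open design question.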

The other question is what to do when a user presses speak current word (numpad5) thrice, which spells the word with character descriptions. Do we split the compound characters there as well?

I would file a ticket for this, at least for Korean. We can then determine from this email thread whether other languages will benefit.

Jamie

On 8/11/2012 4:54 PM, Joseph Lee wrote:
Hi folks,

I’m copying both the translations and dev_asia group to get your
feedback on the following:

Are there languages besides Korean that require multiple char
components when constructing a single char? At least in Korean, there
are character components that go into creating a single character (not
a word). In Korean, a single character consists of an initial consonant,
one or two vowels and zero or more final consonants. For example, the
character “ga” (written as ㄱㅏ in Korean) has an initial consonant (G,
pronounced “gi-yuk”) and a vowel (ah). Or, the character “gwan” (written
as 관 in Korean, meaning a crown) has the initial consonant of “G”, the
vowel “wa” and the final consonant of “n” (pronounced “ni-eun”).

As of 2012.3, when invoking char description script (numpad2 twice
quickly when the review cursor is focused on the character), the char
itself is announced again (when the char in question is a Hangul
character). The ideal behavior (requested by Korean users) is to
announce the components of such a character when this script is
executed. For example, supposing that the char is “ga”:

User puts review cursor on the char “ga”. Then he or she does the following:

Current behavior under 2012.3:

·Presses numpad2: NVDA says “ga”.

·Presses numpad2 for the second time: NVDA says “ga”.

Ideal behavior (being investigated with Korean users for 2013.1):

·Presses numpad2: NVDA says “ga”.

·Presses numpad2 for the second time: NVDA says “gi-yuk, ah”.

A naïve solution would be to map all possible 10773
consonant/vowel/consonant set combinations in
characterDescriptions.dic, which has a risk of slower performance. A
fellow Korean translator says he found a Python script which could
calculate components of a Korean char. I feel that if this is unique to
Korean, then it’s something that we Korean users can work on
ourselves; however, if there are other languages that use this kind of
component system for constructing a char, that could give us some test
scenarios for improving the char description module in the future to take
this case into account.
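The script mentioned above is not shown in this thread, so the following is only a sketch of how such a calculation can work, using the Hangul syllable arithmetic from the Unicode Standard. The function name hangul_components is hypothetical. The standard's 19 x 21 x 28 layout gives 11172 precomposed syllables, because the 28 final slots include "no final consonant"; the 10773 figure above equals 19 x 21 x 27, i.e. only the syllables that do have a final.

```python
# Constants from the Unicode Standard's Hangul syllable decomposition.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT  # 11172 precomposed syllables


def hangul_components(ch):
    """Split one precomposed Hangul syllable into its conjoining jamo.

    Returns [ch] unchanged when ch is not a precomposed syllable.
    """
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return [ch]
    l_index, rest = divmod(s_index, V_COUNT * T_COUNT)
    v_index, t_index = divmod(rest, T_COUNT)
    parts = [chr(L_BASE + l_index), chr(V_BASE + v_index)]
    if t_index:  # index 0 means the syllable has no final consonant
        parts.append(chr(T_BASE + t_index))
    return parts
```

Each returned jamo could then be looked up in a small table of component descriptions (a few dozen entries) instead of listing every syllable in characterDescriptions.dic.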

If you want, I’ll create a ticket for this case later this month. Thanks.

//JL



_______________________________________________
Nvda-dev-asia mailing list
Nvda-dev-asia@xxxxxxxxxxxxxxxxxx
http://lists.nvaccess.org/listinfo/nvda-dev-asia


--
James Teh
Director, NV Access Limited
Email: jamie@xxxxxxxxxxxx
Web site: http://www.nvaccess.org/
Phone: +61 7 5667 8372
