Re: [nvda-translations] [Nvda-dev-asia] Character descriptions: characters composed of multiple components (specifically, Hangul char descriptions)

From: James Teh <jamie@xxxxxxxxxxxx>
To: NVDA development for Asian character input <nvda-dev-asia@xxxxxxxxxxxxxxxxxx>, nvda-translations@xxxxxxxxxxxxx
Date: Fri, 09 Nov 2012 14:32:09 +1000

Hi Joseph,

I think there are several other languages which have compound characters as well, such as Tamil. Normally, a single character is only represented by one Unicode character. However, in Tamil, a compound character, even though it only looks like one character visually, is actually represented by multiple Unicode characters. I'm not sure if this is how Unicode handles all such languages.

As a side note, this actually causes a problem concerning speak typed characters for Tamil users:

http://www.nvda-project.org/ticket/1428

Strangely, your first Korean example (ㄱㅏ, ga) is two Unicode characters, but the second (관, gwan) is only one. So, for the first, doing as you request isn't too difficult. For the second, we have to decompose the character. It looks like we can do this in Python fairly easily (unicodedata.normalize with form NFD). So, if I'm correct, 관 becomes 관 (three characters: U+1100, U+116A, U+11AB), which looks the same on screen but is stored as separate components.
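
To make the decomposition above concrete, here is a minimal Python sketch using only the standard unicodedata module. The variable names are just for illustration; this is not NVDA code.

```python
import unicodedata

# 관 (gwan) as a single precomposed Hangul syllable, U+AD00.
composed = "\uad00"
decomposed = unicodedata.normalize("NFD", composed)

# One code point before normalization, three conjoining jamo after.
print(len(composed), len(decomposed))
for ch in decomposed:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# The first example (ㄱㅏ) is written with two compatibility jamo,
# U+3131 and U+314F. NFD leaves those unchanged, so that string
# is already split into its parts.
typed_ga = "\u3131\u314f"
print(unicodedata.normalize("NFD", typed_ga) == typed_ga)
```

The names printed for the three parts are the formal Unicode names of the jamo, which are not necessarily what Korean users would want spoken.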

The question is whether this is desirable for other languages. Also, this will affect European languages as well; e.g. Ç (C cedilla) decomposes to Ç (C followed by combining cedilla). At a guess, I'd think it's not desirable for some languages. The problem is that it'd be difficult for NVDA to know which characters to do it for and which not. I guess we could make it a config option, but that kinda sucks for new Korean users.
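
As a rough illustration of the trade-off above, the sketch below shows that NFD does split Ç, and then shows one possible way to limit decomposition to precomposed Hangul syllables by checking the code point range U+AC00 to U+D7A3. The helper name decompose_for_description is hypothetical, and restricting by range is only an assumption about how this could be done, not something decided in this thread.

```python
import unicodedata

# Assigned range of precomposed Hangul syllables in Unicode.
HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3

def decompose_for_description(ch):
    """Hypothetical helper: split only precomposed Hangul syllables.

    Any other character is returned as a single-item list, so
    European letters such as Ç are left alone.
    """
    if HANGUL_FIRST <= ord(ch) <= HANGUL_LAST:
        return list(unicodedata.normalize("NFD", ch))
    return [ch]

# Plain NFD affects European letters too: Ç becomes C + U+0327.
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", "\u00c7")])

# The range check keeps Ç whole but still splits 관.
print(len(decompose_for_description("\u00c7")))
print(len(decompose_for_description("\uad00")))
```

A check like this would avoid a config option for Korean, but it says nothing about Tamil or other scripts, which would need their own rules.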

The other question is what to do when a user presses speak current word (numpad5) thrice, which spells the word with character descriptions. Do we split the compound characters there as well?

I would file a ticket for this, at least for Korean. We can then determine from this email thread whether other languages will benefit.


Jamie

On 8/11/2012 4:54 PM, Joseph Lee wrote:

Hi folks,

I’m copying both the translations and dev_asia group to get your
feedback on the following:

Are there languages besides Korean that require multiple char
components when constructing a single char? At least in Korean, there
are character components that go into creating a single character (not
a word). In Korean, a single character consists of an initial consonant,
one or two vowels and zero or more final consonants. For example, the
character “ga” (written as ㄱㅏ in Korean) has an initial consonant (G,
pronounced “gi-yug”) and a vowel (ah). Or, the character “gwan” (written
as 관 in Korean, meaning a crown) has the initial consonant of “G”, the
vowel “wa” and the final consonant of “n” (pronounced “ni-eun”).

As of 2012.3, when invoking char description script (numpad2 twice
quickly when the review cursor is focused on the character), the char
itself is announced again (when the char in question is a Hangul
character). The ideal behavior (requested by Korean users) is to
announce the components of such a character when this script is
executed. For example, supposing that the char is “ga”:

User puts review cursor on the char “ga”. Then he or she does the following:

Current behavior under 2012.3:

- Presses Numpad2: NVDA says “ga”.

- Presses Numpad2 for the second time: NVDA says “ga”.

Ideal behavior (being researched with Korean users for 2013.1):

- Presses Numpad2: NVDA says “ga”.

- Presses Numpad2 for the second time: NVDA says “gi-yuk, ah”.

A naïve solution would be to map all 11,172 possible
consonant/vowel/consonant combinations in
characterDescriptions.dic, which has a risk of slower performance. A
fellow Korean translator says he found a Python script which could
calculate components of a Korean char. I feel that if this is unique to
Korean, then it’s something that we Korean users can work on
ourselves; however, if there are other languages that use this kind of
component system for constructing a char, that could give us some test
scenarios for improving the char description module in the future to take
this case into account.
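
The script mentioned above is not included in this thread, so the following is only a sketch of how such a calculation can work, based on the arithmetic layout of the Unicode Hangul Syllables block (19 initial consonants x 21 vowels x 28 final slots, counting "no final", gives 11,172 syllables starting at U+AC00). The function name hangul_components is hypothetical.

```python
# Constants from the Unicode layout of precomposed Hangul syllables.
S_BASE = 0xAC00                           # first syllable, 가
L_BASE, V_BASE, T_BASE = 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28    # T_COUNT includes "no final"
S_COUNT = L_COUNT * V_COUNT * T_COUNT     # 11172 syllables in total

def hangul_components(ch):
    """Return the conjoining jamo that make up one Hangul syllable."""
    index = ord(ch) - S_BASE
    if not 0 <= index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    initial = index // (V_COUNT * T_COUNT)
    vowel = (index % (V_COUNT * T_COUNT)) // T_COUNT
    final = index % T_COUNT
    parts = [chr(L_BASE + initial), chr(V_BASE + vowel)]
    if final:  # a final index of 0 means there is no final consonant
        parts.append(chr(T_BASE + final))
    return parts

# 가 (ga) has an initial and a vowel; 관 (gwan) also has a final.
print([f"U+{ord(c):04X}" for c in hangul_components("\uac00")])
print([f"U+{ord(c):04X}" for c in hangul_components("\uad00")])
```

With an approach like this, characterDescriptions.dic would only need entries for the individual components rather than one entry per syllable.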

If you want, I’ll create a ticket for this case later this month. Thanks.

//JL





--
James Teh
Director, NV Access Limited
Email: jamie@xxxxxxxxxxxx
Web site: http://www.nvaccess.org/
Phone: +61 7 5667 8372

Follow-Ups:
- Re: [nvda-translations] [Nvda-dev-asia] Character descriptions: characters composed of multiple components (specifically, Hangul char descriptions)
  - From: Mesar Hameed

References:
- [nvda-translations] Character descriptions: characters composed of multiple components (specifically, Hangul char descriptions)
  - From: Joseph Lee
