Re: [nvda-translations] [Nvda-dev-asia] Character descriptions:characters composed of multiple components (specifically, Hangul chardescriptions)

  • From: Takuya Nishimoto <nishimotz@xxxxxxxxx>
  • To: NVDA development for Asian character input <nvda-dev-asia@xxxxxxxxxxxxxxxxxx>
  • Date: Sat, 10 Nov 2012 08:30:11 +0800

Dear all,

The Japanese language does not use compound characters; however, there is
some demand for enhanced character descriptions.
This is one of the reasons that we are still developing a Japanese version of NVDA.

Latest beta of NVDAJP version 2012.3jp
http://en.sourceforge.jp/projects/nvdajp/releases/57233
Release notes
http://www.nvda.jp/nvda2012.3jp/en/readmejp.html

Firstly, word separation is not trivial in Japanese.
Its implementation is not consistent: some applications rely on the
Windows API, and some applications detect word breaks by themselves.
This is because Numpad-5 (word review) operations are not frequently
used for Japanese.
Japanese users are not interested in the difference between character-level
and word-level descriptions, so we basically use only one
description for each ideographic character.

The main issue is that descriptions of Japanese characters
should differ depending on the purpose.

For character review, the 'spelling' functionality of the TTS is currently
used; however, this function is not supported by popular Japanese TTS
engines, and the pronunciation of a single Japanese character by the TTS
is sometimes incorrect.
So we need a dictionary for the 'spelling reading' of characters, in
addition to character descriptions.
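A minimal sketch of keeping a 'spelling reading' dictionary separate from the character descriptions, so that character review does not depend on the TTS spelling mode. The dictionary names, function names and entries below are hypothetical illustrations, not NVDAJP's actual data or API:

```python
# Hypothetical entries, for illustration only; real tables would be loaded
# from dictionary files and cover thousands of characters.
SPELLING_READINGS = {"日": "にち", "本": "ほん"}
CHARACTER_DESCRIPTIONS = {"日": "にちようびのにち", "本": "ほんやのほん"}


def reading_for_review(ch: str) -> str:
    """Short reading spoken on character review.

    Falls back to the character itself when no reading is known.
    """
    return SPELLING_READINGS.get(ch, ch)


def description_for(ch: str) -> str:
    """Longer description; falls back to the review reading."""
    return CHARACTER_DESCRIPTIONS.get(ch, reading_for_review(ch))
```

Keeping the two tables apart lets each be tuned for its own purpose (a short, correct pronunciation versus an unambiguous description).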

Japanese characters include ideographic (Chinese, or Kanji)
characters and phonetic (syllabic, or Kana) characters.
Descriptions of phonetic characters are sometimes too verbose, for
example, in announcements of input method candidates.
A description is not necessary to identify a phonetic character
if the TTS can pronounce it correctly.
The category of each character should be distinguished, and descriptions
should be selected accordingly.
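A minimal sketch of selecting a description by category. It assumes a hypothetical description table and uses a plain code-point range test for kana (U+3040 to U+30FF for hiragana and katakana, U+FF66 to U+FF9F for half-width katakana); the ranges are approximate and also include a few marks such as the prolonged sound mark:

```python
# Hypothetical description table, for illustration only.
DESCRIPTIONS = {
    "日": "にちようびのにち",
    "あ": "あさひのあ",
}


def is_kana(ch: str) -> bool:
    """True for hiragana, katakana and half-width katakana, by code point."""
    cp = ord(ch)
    return (0x3040 <= cp <= 0x30FF) or (0xFF66 <= cp <= 0xFF9F)


def choose_description(ch: str, tts_reads_kana_correctly: bool = True) -> str:
    """Return the text to speak for ch.

    Phonetic characters are passed through unchanged when the TTS is trusted
    to pronounce them; ideographs (and kana otherwise) use the table.
    """
    if is_kana(ch) and tts_reads_kana_correctly:
        return ch
    return DESCRIPTIONS.get(ch, ch)
```

The `tts_reads_kana_correctly` flag is an assumed per-synthesizer setting, reflecting the point that kana descriptions are only needed when the TTS mispronounces them.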

When describing multiple characters, such as characters within words,
we sometimes use a further reduction of the explanations.
For example, if two or more phonetic characters of the same category
(such as katakana) are included in an input composition candidate,
'katakana a, katakana i, katakana u' should be shortened to 'katakana
a, i, u' to avoid verbosity.
Similarly, whether a character is full-width or half-width should be announced.
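The prefix reduction described above can be sketched as follows. The function name and the (category, reading) input format are hypothetical. Folding the width into the category label (for example "half-width katakana") is one assumed way to keep full-width and half-width characters distinguishable, because any change of category re-announces the label:

```python
def announce_sequence(items):
    """Join (category, reading) pairs, dropping a repeated category label.

    [("katakana", "a"), ("katakana", "i"), ("katakana", "u")]
    becomes "katakana a, i, u".
    """
    parts = []
    previous = None
    for category, reading in items:
        if category == previous:
            parts.append(reading)  # same category as before: reading only
        else:
            parts.append(f"{category} {reading}")  # category changed: say it
        previous = category
    return ", ".join(parts)


print(announce_sequence([("katakana", "a"), ("katakana", "i"), ("katakana", "u")]))
# -> katakana a, i, u
print(announce_sequence([("katakana", "a"), ("half-width katakana", "i")]))
# -> katakana a, half-width katakana i
```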

To identify which characters are phonetic and which category a character
belongs to, we need either an algorithm based on the character code or a
dictionary with attributes.
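One way to avoid maintaining code-point tables by hand is to rely on attributes already recorded in the Unicode Character Database. The sketch below is an illustration under that assumption, not NVDAJP's implementation: it uses Python's standard `unicodedata` module (character names and the East Asian Width property) to derive a script category and a width, which could feed both the phonetic check and the full-width/half-width announcement. The category labels are made up for this example:

```python
import unicodedata


def classify(ch: str) -> tuple:
    """Return (script, width) for a single character from Unicode data.

    script: "kanji", "hiragana", "katakana", "latin" or "other".
    width:  "full-width", "half-width" or "unspecified".
    """
    name = unicodedata.name(ch, "")
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        script = "kanji"
    elif name.startswith("HIRAGANA"):
        script = "hiragana"
    elif name.startswith(("KATAKANA", "HALFWIDTH KATAKANA")):
        script = "katakana"
    elif "LATIN" in name:
        script = "latin"
    else:
        script = "other"

    # East Asian Width: F/W are full-width, H/Na are half-width;
    # ambiguous (A) and neutral (N) characters are left unspecified.
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("F", "W"):
        width = "full-width"
    elif eaw in ("H", "Na"):
        width = "half-width"
    else:
        width = "unspecified"
    return script, width


def is_phonetic(ch: str) -> bool:
    """Kana (hiragana or katakana, any width) count as phonetic."""
    return classify(ch)[0] in ("hiragana", "katakana")
```

A real dictionary with attributes could still override these defaults for characters whose Unicode properties do not match how users expect them to be announced.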

--
Takuya Nishimoto


On Fri, Nov 9, 2012 at 5:16 PM, Joseph Lee <joseph.lee22590@xxxxxxxxx> wrote:
> Hi Jamie and Mesar,
> Sure - I'll file a ticket on handling compound characters, which
> would be useful for proofreading words and characters in languages where
> compound characters are used.  Thanks.
> Joseph
>
>
> ----- Original Message -----
> From: Mesar Hameed <mesar.hameed@xxxxxxxxx>
> To: nvda-translations@xxxxxxxxxxxxx
> Date sent: Fri, 9 Nov 2012 10:09:38 +0100
> Subject: Re: [nvda-translations] [Nvda-dev-asia] Character
> descriptions:characters composed of multiple components (specifically,
> Hangul chardescriptions)
>
> Hi,
>
> On Fri 09/11/12,14:32, James Teh wrote:
>> Hi Joseph,
>>
>> I think there are several other languages which have compound
>> characters as well, such as Tamil.  Normally, a single character is
>> only represented by one Unicode character.  However, in Tamil, a
>> compound character, even though it only looks like one character
>> visually, is actually represented by multiple Unicode characters.
>> I'm not sure if this is how Unicode handles all such languages.
>
> Confirmed with Indian languages, Arabic and possibly Hebrew.
>
>> The other question is what to do when a user presses speak current
>> word (numpad5) thrice, which spells the word with character
>> descriptions.  Do we split the compound characters there as well?
>
> I think splitting in that case would be good too.
>
>> I would file a ticket for this, at least for Korean.  We can then
>> determine from this email thread whether other languages will
>> benefit.
>>
>> Jamie
>
> -- Mesar
>
>
> _______________________________________________
> Nvda-dev-asia mailing list
> Nvda-dev-asia@xxxxxxxxxxxxxxxxxx
> http://lists.nvaccess.org/listinfo/nvda-dev-asia
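The quoted thread raises splitting compound characters into their components when spelling a word, for Korean at least and possibly Tamil. A minimal sketch of one way to obtain the components, assuming Python's standard `unicodedata` module; the function names are hypothetical, and Unicode character names stand in for real, localized character descriptions:

```python
import unicodedata

HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3  # precomposed Hangul syllables


def components(text: str) -> list:
    """Split text into code points that could be described one by one.

    Precomposed Hangul syllables are decomposed with NFD into conjoining
    jamo (initial consonant, vowel, optional final consonant). Everything
    else is kept as is: Tamil compound characters are already stored as
    several code points, and Japanese kana such as "が" stay whole.
    """
    parts = []
    for ch in text:
        if HANGUL_FIRST <= ord(ch) <= HANGUL_LAST:
            parts.extend(unicodedata.normalize("NFD", ch))
        else:
            parts.append(ch)
    return parts


def spell_with_names(text: str) -> list:
    """Placeholder descriptions: the Unicode name of each component."""
    return [unicodedata.name(c, f"U+{ord(c):04X}") for c in components(text)]


print(spell_with_names("한"))
# -> ['HANGUL CHOSEONG HIEUH', 'HANGUL JUNGSEONG A', 'HANGUL JONGSEONG NIEUN']
print(spell_with_names("கு"))
# -> ['TAMIL LETTER KA', 'TAMIL VOWEL SIGN U']
```

Decomposing only the Hangul range is a deliberate choice in this sketch: applying NFD to all text would also split Japanese voiced kana such as が into か plus a combining voiced sound mark, which is unlikely to be wanted for Japanese.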
