[liblouis-liblouisxml] UEB: Latin accent placement, current implementation and class/swap or context/multipass-based implementation ideas

From: "Joseph Lee" <joseph.lee22590@xxxxxxxxx>
To: <liblouis-liblouisxml@xxxxxxxxxxxxx>
Date: Sat, 2 Aug 2014 06:31:17 -0700
Hi all, mostly UEB users and table maintainers:

While Ken is working on translation issues with UEB, I'm working on adding
new symbols to our UEB table package. As part of this work, I'm adding
accents for Latin characters based on official specifications (rule section
4.2). While defining the characters and braille representation is easy, I'd
like to get your comments and answers on the following points:

1.       Character placement: I put accents in the character definitions
file. However, I propose that they be merged into the grade 1 file like
Greek letters (Christian and Bert, I'll send a pull request from my GitHub
repo so you can review the current implementation before merging into
master).

2.       Dot patterns: As with Greek letters, UEB specifies two braille
components to represent accented Latin characters: the modifier type,
followed by the letter being modified. For example, A acute is represented
as dots 45 then 34, followed by an A (dot 1). The first two cells identify
the modifier and may or may not be preceded by dot 6 if it's an uppercase
letter that's being modified. For now, the "uplow" opcode works (just like
Greek letters), with the actual characters themselves being included in the
definition file.

3.       Concerns with defining dot patterns for the whole Unicode range of
Latin characters: the current strategy (uplow) is ideal for a small fraction
of Latin characters, mostly ones which we encounter every day such as a
acute and c cedilla. UEB makes provision for any Latin supplement Unicode
character to be mapped to a braille pattern using the format that was
described above. As there are 20 or so letters which could be modified, and
multiple modifier forms (at least 6, with the common ones being grave
accent, acute, cedilla, tilde, umlaut and stroke), managing this with a
dictionary approach (as used in UEB for the most part) quickly becomes
impractical. For now, I've mapped the 64 characters in the Unicode range
0x00c0 to 0x00ff (at least the common ones). In order to fully support more
Unicode characters, a more systematic approach which allows easier expansion
is needed.

4.       Proposal to use class or swap opcodes to define Latin characters:
one way to overcome the above problem is to use class and swap opcodes in
conjunction with the context/multipass facility. This could be done in four
phases:

*         Packaging characters into classes: Since Latin letters can be
grouped in different ways (by modifier type, by letter, etc.), it might be
easier to package them as classes. This allows a table maintainer to extend
support for more characters by extending the character classes.

*         Swap definition: We can try using "swapcc/swapcd" opcode to tell
LibLouis that these accents should be displayed just like any other Latin
letter (A acute would be displayed as just the letter A).

*         Context and multipass conditions: since context and pass2/3/4
opcodes allow a table maintainer to refer to classes, we can invoke them and
tell LibLouis to insert the needed modifier dot pattern before each
character in the class. For example:

Context %LowerAcute 45-34-LettersFromLowercaseAcuteClass

*         Test and extend: If the above strategy works, we can test it and
extend it to other Latin characters by including them in appropriate
classes. In the future, all the table maintainer needs to do is extend the
character class or create a new class/swap/context/multipass set, thereby
easing the maintenance of the Latin accent character group in the UEB (as
UEB is an evolving standard, who knows whether ICEB (International Council
on English Braille) will add more characters).
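
To make the class/swap/context idea above concrete, here is a rough model of
the three phases in Python. It is not LibLouis table syntax, and every name
in it (CLASSES, SWAPS, PREFIX, translate_char, to_braille) is hypothetical.
Only the acute modifier (dots 45 then 34) comes from the example in point 2;
the letter dots are the standard braille vowels. Uppercase handling is left
out.

```python
# Hypothetical model of the class/swap/context phases; not LibLouis syntax.

# Standard braille dots for a few plain letters (vowels only, to stay short).
LETTER_DOTS = {"a": "1", "e": "15", "i": "24", "o": "135", "u": "136"}

# Phase 1: a class collects the characters that share one modifier.
CLASSES = {"loweracute": "áéíóú"}

# Phase 2: a swap maps each class member to its plain letter by position.
SWAPS = {"loweracute": "aeiou"}

# Phase 3: a context-style rule names the dots inserted before each member.
PREFIX = {"loweracute": "45-34"}


def translate_char(ch):
    """Return hyphen-separated dot cells for one character, or None."""
    for name, members in CLASSES.items():
        pos = members.find(ch)
        if pos != -1:
            base = SWAPS[name][pos]
            return PREFIX[name] + "-" + LETTER_DOTS[base]
    return LETTER_DOTS.get(ch)


def to_braille(cells):
    """Render cells such as '45-34-1' as Unicode braille patterns."""
    out = []
    for cell in cells.split("-"):
        bits = 0
        for dot in cell:
            bits |= 1 << (int(dot) - 1)  # dot n sets bit n-1
        out.append(chr(0x2800 + bits))
    return "".join(out)


if __name__ == "__main__":
    for ch in "áéa":
        cells = translate_char(ch)
        print(ch, cells, to_braille(cells))
```

Supporting another modifier in this model means adding one entry to each of
the three dictionaries, which is the kind of extension the class-based
proposal is after.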

Of these four points, I think the highest priority should be given to
character placement (which table should contain these characters). As of
now, using uplow to define common Latin characters would be sufficient, but
let's prepare for a day when we need to add support for additional
characters with future extension in mind.
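
One possible way to prepare for that, sketched here under assumptions: the
class membership could be derived from Unicode canonical decomposition
rather than typed by hand. The script below is not part of any existing
table tooling and its function names are made up for illustration. It groups
the characters in 0x00c0 to 0x00ff by the combining mark they decompose to.
Characters without a canonical decomposition, such as o with stroke, are
skipped and would still need explicit entries.

```python
import unicodedata
from collections import defaultdict


def split_accent(ch):
    """Return (base, combining_mark), or None when ch does not decompose
    canonically into exactly one base character plus one combining mark."""
    decomposed = unicodedata.normalize("NFD", ch)
    if len(decomposed) == 2 and unicodedata.combining(decomposed[1]):
        return decomposed[0], decomposed[1]
    return None


def group_by_mark(first=0x00C0, last=0x00FF):
    """Group the characters in [first, last] by combining mark name."""
    groups = defaultdict(list)
    for cp in range(first, last + 1):
        ch = chr(cp)
        parts = split_accent(ch)
        if parts is not None:
            groups[unicodedata.name(parts[1])].append(ch)
    return dict(groups)


if __name__ == "__main__":
    # One line per modifier: the mark name, then the candidate class members.
    for mark, members in sorted(group_by_mark().items()):
        print(mark, "".join(members))
```

Each printed group is a candidate character class, so widening the range
passed to group_by_mark would show what a larger class set might contain.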

Comments, suggestions or questions are appreciated. Thanks.

Cheers,

Joseph