[liblouis-liblouisxml] Re: Capital/Emphasis update

From: "Susan Jolly" <easjolly@xxxxxxxxxxxxx>
To: <liblouis-liblouisxml@xxxxxxxxxxxxx>
Date: Mon, 23 Feb 2015 18:55:39 -0700

I've been aware of the complexities of the UEB capitalization approach, as well as the related complexities of identifying any passage that requires some sort of braille markup, for quite some time. I'm not at all surprised that addressing all this requires a lot of work, and I've been impressed by the level of the recent discussion.

In my experience as a former code developer, I typically went back and forth between top-level or conceptual issues and bottom-level or implementation issues numerous times before I fully understood what needed to be done and how to accomplish it. Since there's been quite a bit of bottom-level discussion on this subject, I thought it might be helpful to review some of the conceptual issues. I don't mean to imply that you haven't already thought of these issues, but I wanted to explain why I think it is important to keep the big picture, including the rules for the numeric indicator, in mind when working with UEB.

A standard transcribing problem is the use of special typeforms such as italics. Here one can, to some extent, rely on the XML markup in the print source file. Nonetheless, there are still braille issues that need consideration.

One issue is how a passage is defined. For example, in EBAE an italicized sequence consists of more than one passage if its sub-sequences are italicized for different reasons. Thus each title in a comma-separated list of italicized book titles would have its own italics indicator(s). The intent of this rule is to make things easier for the reader.

A related issue is whether a needed termination indicator is placed before or after the last item in the passage. As a sighted person I can look ahead to see where either a rendered or marked-up italicized passage ends, but I can understand why a braille reader might find it more informative to have the last word identified before rather than after it.
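The two EBAE points above (one passage per title, and where the termination indicator goes) can be sketched as follows. This is only an illustration under stated assumptions: `mark_passage` and `mark_title_list` are hypothetical helpers, the strings `<I>` and `<I-END>` are placeholders rather than real braille indicators, and each title is assumed to arrive as a separate item.

```python
def mark_passage(words, start="<I>", end="<I-END>", end_before_last=False):
    """Mark one italicized passage (a list of words) with placeholder indicators.

    The marker strings are hypothetical placeholders, not real braille cells.
    end_before_last=True puts the termination marker in front of the last
    word; otherwise it follows the last word.
    """
    if not words:
        return []
    if len(words) == 1:
        # Sketch assumption: a one-word passage needs only the opening marker.
        return [start + words[0]]
    marked = [start + words[0]] + list(words[1:])
    if end_before_last:
        marked[-1] = end + marked[-1]
    else:
        marked[-1] = marked[-1] + end
    return marked


def mark_title_list(titles, **options):
    """Treat each title in a list as its own passage, joined by commas."""
    return ", ".join(" ".join(mark_passage(t.split(), **options)) for t in titles)


print(mark_title_list(["War and Peace", "Emma"], end_before_last=True))
# <I>War and <I-END>Peace, <I>Emma
print(mark_title_list(["War and Peace", "Emma"]))
# <I>War and Peace<I-END>, <I>Emma
```

Keeping the placement of the terminator as a parameter reflects the point that it is a braille-code decision rather than something the print markup can tell you.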

In addition to typeform indicators, there are markup issues unique to braille.

For example, capital letters require markup in braille but not in print. One could add XML tags to mark the presence of capital letters in the source. (I don't know much about XSL, but it might be a useful way to add such tags.) Of course, some thought needs to be given to how to mark a capitalized passage as opposed to a capitalized word, and to what extent the details of the markup should depend on the specifics of the targeted braille system.
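To make the word-versus-passage question concrete, here is a rough sketch in Python rather than XSL. The element names `<capword>` and `<cappassage>` are hypothetical, as is the `tag_capitals` helper. It treats a run of at least `min_passage` consecutive all-uppercase words as a passage and shorter runs as individual capitalized words; the threshold is a parameter precisely because it depends on the target braille system, and the value 3 is only an example. Single capital letters inside otherwise lowercase words are not handled.

```python
from xml.sax.saxutils import escape


def tag_capitals(text, min_passage=3):
    """Wrap all-uppercase words in hypothetical <capword>/<cappassage> tags.

    Whitespace is normalized to single spaces; min_passage is an assumed,
    braille-system-dependent threshold.
    """
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        if words[i].isupper():
            # Find the end of this run of all-uppercase words.
            j = i
            while j < len(words) and words[j].isupper():
                j += 1
            if j - i >= min_passage:
                run = " ".join(escape(w) for w in words[i:j])
                out.append(f"<cappassage>{run}</cappassage>")
            else:
                out.extend(f"<capword>{escape(w)}</capword>" for w in words[i:j])
            i = j
        else:
            out.append(escape(words[i]))
            i += 1
    return " ".join(out)


print(tag_capitals("the FBI and THE UNITED NATIONS met"))
# the <capword>FBI</capword> and <cappassage>THE UNITED NATIONS</cappassage> met
```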

Also, as has been pointed out on the list, sometimes characters other than capital letters may be considered as capitals since they don't "break" the scope of the passage capitalization indicators. How should this be addressed?

Unicode assigns what it calls "character properties" to each character. Examples include the General Category values Lu for uppercase letters and Nd for decimal digits. I believe that modern Unicode-aware programming languages can provide these character properties.
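As one example of a language exposing these properties, Python's standard `unicodedata` module returns a character's General Category (where Lu and Nd come from) and its name. The `describe` helper is just a small wrapper written for this illustration.

```python
import unicodedata


def describe(ch):
    """Return (name, General Category) for a single character."""
    return unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch)


for ch in ["A", "a", "7", ",", ":", "-", "\u2013", "\u00c9"]:
    name, category = describe(ch)
    print(f"U+{ord(ch):04X} {category} {name}")
# U+0041 Lu LATIN CAPITAL LETTER A
# U+0061 Ll LATIN SMALL LETTER A
# U+0037 Nd DIGIT SEVEN
# U+002C Po COMMA
# U+003A Po COLON
# U+002D Pd HYPHEN-MINUS
# U+2013 Pd EN DASH
# U+00C9 Lu LATIN CAPITAL LETTER E WITH ACUTE
```

Note that the comma and the colon share the category Po even though UEB treats them differently with respect to the numeric indicator, which is one reason braille-specific properties would have to be layered on top of the Unicode ones.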

Using the above idea from Unicode, a braille system such as UEB could be said to define additional context-dependent character properties for certain characters. For example, a hard hyphen in a true compound word is not capital-like in an isolated uppercase word, but it is capital-like in a capitalized passage. Here I'm using the term "capital-like" to indicate a character that does not affect the scope of the referenced indicator.
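A context-dependent "capital-like" property could be sketched as a lookup keyed on both the character and the context. Everything here is hypothetical: the `CAPITAL_LIKE` table, the context names `"word"` and `"passage"`, and the default that any unlisted non-uppercase character breaks the scope. Only the hard-hyphen entry reflects the example in the paragraph above; a real table would have to be built from the official rules.

```python
import unicodedata

# Hypothetical table of braille-specific, context-dependent properties:
# for each character, whether it is "capital-like" (leaves the scope of
# the capitals indicator intact) inside an isolated capitalized word and
# inside a capitalized passage. Only the hard-hyphen entry comes from the
# discussion; a real table would have to be built from the official rules.
CAPITAL_LIKE = {
    "-": {"word": False, "passage": True},
}


def is_capital_like(ch, context):
    """context must be 'word' or 'passage'."""
    if context not in ("word", "passage"):
        raise ValueError(f"unknown context: {context!r}")
    if unicodedata.category(ch) == "Lu":
        return True  # uppercase letters never break the scope
    entry = CAPITAL_LIKE.get(ch)
    if entry is not None:
        return entry[context]
    return False  # sketch assumption: anything else breaks the scope


def capital_scope_length(text, context):
    """Number of leading characters of text that stay inside the scope."""
    n = 0
    for ch in text:
        if not is_capital_like(ch, context):
            break
        n += 1
    return n


print(capital_scope_length("FOUR-WHEEL", "word"))     # 4
print(capital_scope_length("FOUR-WHEEL", "passage"))  # 10
```

The same character thus gets a different answer depending on context, which is exactly what a plain per-character Unicode property cannot express.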

The other unique indicator in some braille systems is the numeric indicator, used to change the meaning of certain braille cells from representing small letters to representing decimal digits. An unusual aspect of UEB is that its numeric indicator has two functions: it not only indicates that certain letters are intended as digits, it also always sets Grade 1 mode.

As with the capitalization indicators, we find that other characters need to be taken into account in determining the scope of the numeric indicator. In fact, in UEB, the same characters don't necessarily terminate both of its functions. For example, a comma is number-like in that it doesn't terminate the scope of the indicator's number function, whereas a colon is not number-like in that it does terminate the scope of the number function. On the other hand, only a hyphen or dash can terminate its Grade 1 mode function. Remember that unless Grade 1 mode has been terminated, contractions cannot be used following the digits in an alphanumeric item.
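Because the two functions end at different places, a translator has to track two scopes separately. The sketch below does that for a single symbols-sequence in print that follows a numeric indicator. The tables and the `numeric_indicator_scope` name are assumptions for illustration: the number-like set contains only the comma and the Grade 1 terminators only hyphen, en dash and em dash, taken from the examples above rather than from the complete UEB rules.

```python
DIGITS = set("0123456789")
# Partial, assumed tables based only on the examples in the text;
# a real implementation must take them from the official UEB rules.
NUMBER_LIKE = {","}                              # keep the number function going
GRADE1_TERMINATORS = {"-", "\u2013", "\u2014"}   # hyphen, en dash, em dash


def numeric_indicator_scope(seq):
    """For a symbols-sequence that starts right after a numeric indicator,
    return (number_end, grade1_end): the index of the first character outside
    the number function and outside Grade 1 mode (len(seq) if none)."""
    number_end = next(
        (i for i, ch in enumerate(seq) if ch not in DIGITS and ch not in NUMBER_LIKE),
        len(seq),
    )
    grade1_end = next(
        (i for i, ch in enumerate(seq) if ch in GRADE1_TERMINATORS),
        len(seq),
    )
    return number_end, grade1_end


for item in ["1,000", "10:30", "4th", "4-ton"]:
    print(item, numeric_indicator_scope(item))
# 1,000 (5, 5)
# 10:30 (2, 5)
# 4th (1, 3)
# 4-ton (1, 1)
```

In "4th" the number function stops at "t" but Grade 1 mode runs to the end, so "th" must not be contracted; in "4-ton" the hyphen ends both, so the letters after it may be contracted. In "10:30" the colon ends the number function, so the digits after it would need a fresh numeric indicator, which this sketch does not model.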

Another problem, in addition to the separate rules affecting the termination of the two functions, is that some of the rules are context-dependent. For example, the dot five numeric space doesn't terminate the scope of the indicator as long as it is immediately followed by a digit.
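A context-dependent rule like this needs look-ahead. Here is a minimal sketch, assuming (since the numeric space is used between the digits of a long number) that a dot five numeric space stays inside the number function's scope only when a digit immediately follows it. It also assumes, purely hypothetically, that the print source marks numeric spaces with U+2009 THIN SPACE; `number_function_end` and the partial `NUMBER_LIKE` table are illustrative names, not part of any existing API.

```python
DIGITS = set("0123456789")
NUMBER_LIKE = {","}       # partial, assumed table (see the rules for the full set)
NUMERIC_SPACE = "\u2009"  # hypothetical: the print source marks numeric spaces with THIN SPACE


def number_function_end(seq):
    """Index of the first character that ends the number function, with a
    one-character look-ahead for the context-dependent numeric space."""
    i = 0
    while i < len(seq):
        ch = seq[i]
        if ch in DIGITS or ch in NUMBER_LIKE:
            i += 1
        elif ch == NUMERIC_SPACE and i + 1 < len(seq) and seq[i + 1] in DIGITS:
            # Stays in scope only because a digit follows immediately.
            i += 1
        else:
            break
    return i


print(number_function_end("10\u2009000"))  # 6: the whole number stays in scope
print(number_function_end("10\u2009km"))   # 2: the space ends the scope
```

The practical consequence is that a character-at-a-time translator cannot decide about the numeric space when it first sees it; it has to buffer at least one more character.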

Implementing the new features of UEB looks difficult to me, and I wish you all the best.


SusanJ

P.S. I don't consider myself a UEB expert, and although I've tried to be careful, I can't guarantee that what I've written here is entirely correct. You should certainly check the official rules prior to implementing any aspects of UEB.

For a description of the software, to download it and links to
project pages go to http://www.abilitiessoft.com

