[liblouis-liblouisxml] Re: Capital/Emphasis update

  • From: "Susan Jolly" <easjolly@xxxxxxxxxxxxx>
  • To: <liblouis-liblouisxml@xxxxxxxxxxxxx>
  • Date: Mon, 23 Feb 2015 18:55:39 -0700

I've been aware for quite some time of the complexities of the UEB capitalization approach, as well as the related complexities of identifying any passage that requires some sort of braille markup. I'm not at all surprised that addressing all this requires a lot of work, and I've been impressed by the level of the recent discussion.


In my experience as a former code developer, I typically went back and forth between top-level (conceptual) issues and bottom-level (implementation) issues numerous times before I fully understood what needed to be done and how to accomplish it. Since there's been quite a bit of bottom-level discussion on this subject, I thought it might be helpful to review some of the conceptual issues. I don't mean to imply that you haven't already thought of these issues, but I wanted to explain why I think it is important to keep the big picture, including the rules for the numeric indicator, in mind when working with UEB.

A standard transcribing problem is the use of special typeforms such as italics. Here one can to some extent rely on the XML markup in the print source file. Nonetheless, there are still braille issues that need consideration.

One issue is how a passage is defined. For example, in EBAE an italics sequence consists of more than one passage if its sub-sequences are italicized for different reasons. Thus each title in a comma-separated list of italicized book titles would have its own italics indicator(s). The intent of this rule is to make it easier for the reader.

A related issue is whether a needed termination indicator is placed before or after the last item in the passage. As a sighted person I can look ahead to see where either a rendered or marked-up italicized passage ends but I can understand why a braille reader might find it more informative to have the last word identified before rather than after it.

In addition to typeform indicators there are markup issues unique to braille.

For example, capital letters require markup in braille but not in print. One could add XML tags to mark the presence of capital letters in the source. (I don't know much about XSL but it might be a useful way to add such tags.) Of course, some thought needs to be given to how to mark a capitalized passage as opposed to a capitalized word, and to what extent the details of the markup should depend on the specifics of the targeted braille system.
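A minimal sketch of the tagging idea, in Python rather than XSL. The tag names `capword` and `cappassage`, the helper names, and the three-word threshold for a passage are all assumptions made for illustration; they are not taken from liblouis, liblouisxml, or any braille code, and the sketch does no XML escaping.

```python
def is_upper_word(word: str) -> bool:
    """True if the word has at least two letters and every cased letter is uppercase."""
    letters = [c for c in word if c.isalpha()]
    return len(letters) >= 2 and word.isupper()


def tag_capitals(text: str, passage_min_words: int = 3) -> str:
    """Wrap uppercase words or passages in hypothetical XML tags.

    A run of at least `passage_min_words` consecutive uppercase words is
    tagged once as a passage; shorter runs are tagged word by word.
    The threshold is an assumption and should be set per braille system.
    """
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        # Find the end of the run of uppercase words starting at i.
        j = i
        while j < len(words) and is_upper_word(words[j]):
            j += 1
        run = words[i:j]
        if len(run) >= passage_min_words:
            out.append("<cappassage>" + " ".join(run) + "</cappassage>")
        elif run:
            out.extend(f"<capword>{w}</capword>" for w in run)
        else:
            # Not an uppercase word: copy it through unchanged.
            out.append(words[i])
            j = i + 1
        i = j
    return " ".join(out)


print(tag_capitals("See the NO SMOKING IN HERE sign at NASA"))
```

Keeping the threshold as a parameter is one way to leave the word-versus-passage decision to the target braille system instead of fixing it in the markup step.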

Also, as has been pointed out on the list, sometimes characters other than capital letters may be considered as capitals since they don't "break" the scope of the passage capitalization indicators. How should this be addressed?

Unicode associates what it calls "character properties" to each character. Examples of properties include Lu for uppercase letters and Nd for decimal digits. I believe that modern Unicode-aware programming languages can provide these character properties.
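As one illustration, Python's standard `unicodedata` module reports the Unicode general category of a character. The wrapper name `general_category` is just a label chosen for this sketch.

```python
import unicodedata


def general_category(ch: str) -> str:
    """Return the two-letter Unicode general category of a single character."""
    return unicodedata.category(ch)


# Lu = uppercase letter, Ll = lowercase letter, Nd = decimal digit,
# Pd = dash punctuation, Po = other punctuation.
for ch in ["A", "a", "7", "-", ","]:
    print(repr(ch), general_category(ch))
```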

Using the above idea from Unicode, a braille system such as UEB could be said to define additional context-dependent character properties for certain characters. For example, a hard hyphen in a true compound word is not capital-like in an isolated uppercase word but it is capital-like in a capitalized passage. Here I'm using the term "capital-like" to indicate a character that does not affect the scope of the referenced indicator.
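A minimal sketch of such a context-dependent property, covering only the hyphen example above. The function names are hypothetical, the sketch works within a single word, and a real rule set would cover more characters and should be checked against the official UEB rules.

```python
import unicodedata


def is_capital_like(ch: str, in_capitalized_passage: bool) -> bool:
    """True if `ch` does not end the scope of the capitals indicator.

    Only two cases are modelled (an assumption for illustration):
    uppercase letters are always capital-like, and a hyphen is
    capital-like only inside a capitalized passage.
    """
    if unicodedata.category(ch) == "Lu":
        return True
    if ch == "-":
        return in_capitalized_passage
    return False


def capital_scope_length(word: str, in_capitalized_passage: bool) -> int:
    """Count the leading characters of `word` covered by the indicator."""
    count = 0
    for ch in word:
        if not is_capital_like(ch, in_capitalized_passage):
            break
        count += 1
    return count


print(capital_scope_length("FIVE-STAR", in_capitalized_passage=False))  # 4
print(capital_scope_length("FIVE-STAR", in_capitalized_passage=True))   # 9
```

The same character gives two different answers because the context flag, not the character alone, decides the property.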

The other unique indicator in some braille systems is the numeric indicator used to change the meaning of certain braille cells from representing small letters to representing decimal digits. An unusual aspect of UEB is that its numeric indicator has two functions: it not only indicates that certain letters are intended as digits, it also always sets Grade 1 mode.

As with the capitalization indicators, we find that other characters need to be taken into account in determining the scope of the numeric indicator. In fact, in UEB, the same characters don't necessarily terminate both of its functions. For example a comma is number-like in that it doesn't terminate the scope of the indicator's number function whereas a colon is not number-like in that it does terminate the scope of the number function. On the other hand, only a hyphen or dash can terminate its Grade 1 mode function. Remember that unless Grade 1 mode has been terminated, contractions cannot be used following the digits in an alphanumeric item.
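The two scopes can be tracked separately. The sketch below is an illustration under stated assumptions: it looks at one space-delimited item that starts with a digit, treats only digits and the comma as number-like, and treats only the hyphen and the en and em dashes as ending Grade 1 mode. The names and character sets are hypothetical and deliberately incomplete.

```python
NUMBER_LIKE_EXTRAS = {","}                       # assumed: only the comma example above
GRADE1_TERMINATORS = {"-", "\u2013", "\u2014"}   # hyphen, en dash, em dash


def numeric_indicator_scopes(item: str) -> tuple:
    """Return (numeric_end, grade1_end) as indices into `item`.

    numeric_end is where the number function of the indicator stops;
    grade1_end is where the Grade 1 mode it set stops.
    """
    numeric_end = None
    grade1_end = len(item)
    for i, ch in enumerate(item):
        number_like = ch.isdecimal() or ch in NUMBER_LIKE_EXTRAS
        if numeric_end is None and not number_like:
            numeric_end = i          # first character that is not number-like
        if ch in GRADE1_TERMINATORS:
            grade1_end = i           # Grade 1 mode ends here too
            break
    if numeric_end is None:
        numeric_end = len(item)      # the whole item was number-like
    return numeric_end, grade1_end


for item in ["3,000", "10:30", "2nd", "4-star"]:
    print(item, numeric_indicator_scopes(item))
```

For "2nd" the number function ends after the digit while Grade 1 mode runs to the end of the item, which is the case where contractions stay unavailable after the digits.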

Another problem, in addition to the separate rules governing termination of the two functions, is that some of the rules are context-dependent. For example, the dot 5 numeric space continues the scope of the indicator only when it is immediately followed by a digit; otherwise it terminates it.

Implementing the new features of UEB looks difficult to me and I wish you all the best.

SusanJ

P.S. I don't consider myself a UEB expert and although I've tried to be careful, I can't guarantee that what I've written here is entirely correct. You should certainly check the official rules prior to implementing any aspects of UEB.
For a description of the software, to download it and links to
project pages go to http://www.abilitiessoft.com
