[liblouis-liblouisxml] Re: Capital/Emphasis update

  • From: "Joseph Lee" <joseph.lee22590@xxxxxxxxx>
  • To: <liblouis-liblouisxml@xxxxxxxxxxxxx>
  • Date: Mon, 23 Feb 2015 18:30:47 -0800

Hi,
Not related exactly to this thread, but might be helpful in the long term:
I sent an email to Jennifer Dunnam, chair of the Braille Authority of North
America (BANA), who gave a presentation on UEB on Tek Talk today. I asked her
to come on board so we can hear from actual UEB implementers, in hopes of a
collaboration between UEB standards implementers and the software development
community (including us) to improve our support for UEB. As this thread
points out, our work on UEB is far from over, and I believe now is the
time to hear from transcribers and experts to see what we can do about
implementing UEB in software.
Cheers,
Joseph

-----Original Message-----
From: liblouis-liblouisxml-bounce@xxxxxxxxxxxxx
[mailto:liblouis-liblouisxml-bounce@xxxxxxxxxxxxx] On Behalf Of Susan Jolly
Sent: Monday, February 23, 2015 5:56 PM
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Subject: [liblouis-liblouisxml] Re: Capital/Emphasis update

I've been aware of the complexities of the UEB capitalization approach as
well as the related complexities of identifying any passage which requires
some sort of braille markup for quite some time.  I'm not at all surprised
that addressing all this requires a lot of work and I've been impressed by
the level of the recent discussion.

In my experience as a former code developer I typically went back and forth
between top-level or conceptual issues and bottom-level or implementation
issues numerous times before I fully understood what needed to be done and
how to accomplish it.  Since there's been quite a bit of bottom-level
discussion on the subject at hand, I thought it might be helpful to review
some of the conceptual issues.  I don't mean to imply that you haven't
already thought of these issues but wanted to explain why I think it is
important to keep the big picture, including the rules for the numeric
indicator, in mind when working with UEB.

A standard transcribing problem is the use of special typeforms such as
italics. Here one can to some extent rely on the XML markup in the print
source file. Nonetheless there are still braille issues that need
consideration.

One issue is how a passage is defined.  For example in EBAE an italics
sequence consists of more than one passage if its sub-sequences are
italicized for different reasons.  Thus each title in a comma-separated list
of italicized book titles would have its own italics indicator(s). The
intent of this rule is to make it easier for the reader.

A related issue is whether a needed termination indicator is placed before
or after the last item in the passage. As a sighted person I can look ahead
to see where either a rendered or marked-up italicized passage ends but I
can understand why a braille reader might find it more informative to have
the last word identified before rather than after it.

In addition to typeform indicators there are markup issues unique to
braille.

For example, capital letters require markup in braille but not in print. One
could add XML tags to mark the presence of capital letters in the source. 
(I don't know much about XSL but it might be a useful way to add such tags.)
Of course, some thought needs to be given as to how to mark a capitalized
passage as opposed to a capitalized word and to what extent the details of
the markup should depend on the specifics of the targeted braille system.
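
As a rough illustration of the word-versus-passage question, here is a minimal
Python sketch that wraps runs of fully capitalized words in tags. The tag names
(cap-word, cap-passage), the function names, and the default threshold of three
consecutive words for a passage are assumptions made for this example only;
the right threshold and markup depend on the target braille system and should
be checked against its official rules.

```python
def is_upper_word(word: str) -> bool:
    """True if the word has at least two letters and all of them are uppercase."""
    letters = [c for c in word if c.isalpha()]
    return len(letters) >= 2 and all(c.isupper() for c in letters)


def mark_capitals(text: str, passage_min: int = 3) -> str:
    """Wrap capitalized words or passages in hypothetical XML-like tags.

    passage_min is an assumed, configurable threshold: a run of at least
    this many consecutive uppercase words is tagged as one passage.
    """
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        if is_upper_word(words[i]):
            # Collect the whole run of consecutive uppercase words.
            j = i
            while j < len(words) and is_upper_word(words[j]):
                j += 1
            run = words[i:j]
            if len(run) >= passage_min:
                out.append("<cap-passage>" + " ".join(run) + "</cap-passage>")
            else:
                out.extend("<cap-word>" + w + "</cap-word>" for w in run)
            i = j
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)


print(mark_capitals("the NASA report"))
print(mark_capitals("DO NOT ENTER here"))
```

Keeping the threshold as a parameter is one way to let the same pass serve
braille systems that define a capitalized passage differently.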

Also, as has been pointed out on the list, sometimes characters other than
capital letters may be considered as capitals since they don't "break" the
scope of the passage capitalization indicators. How should this be
addressed?

Unicode associates what it calls "character properties" to each character. 
Examples of properties include Lu for uppercase letters and Nd for decimal
digits.  I believe that modern Unicode-aware programming languages can
provide these character properties.
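
For instance, Python exposes the Unicode general category through the standard
unicodedata module. The short sketch below looks up the category of a few
sample characters; the helper name general_category is just a label chosen for
this example.

```python
import unicodedata


def general_category(ch: str) -> str:
    """Return the two-letter Unicode general category of one character."""
    return unicodedata.category(ch)


# Print the category for an uppercase letter, a lowercase letter,
# a decimal digit, a hyphen-minus and a comma.
for ch in ["A", "a", "7", "-", ","]:
    print(repr(ch), general_category(ch))
```

Here "A" reports Lu and "7" reports Nd, matching the examples above, while the
hyphen-minus and the comma fall into punctuation categories.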

Using the above idea from Unicode, a braille system such as UEB could be
said to define additional context-dependent character properties for certain
characters. For example, a hard hyphen in a true compound word is not
capital-like in an isolated uppercase word but it is capital-like in a
capitalized passage. Here I'm using the term "capital-like" to indicate a
character that does not affect the scope of the referenced indicator.
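
One way to picture such a context-dependent property is a lookup that takes the
context as a second argument. The Python sketch below is only an illustration:
it encodes just the hyphen example from the paragraph above, the context names
"word" and "passage" and the table PASSAGE_ONLY_CAPITAL_LIKE are hypothetical,
and spaces and other punctuation inside a passage are deliberately not modeled.

```python
import unicodedata

# Hypothetical table: characters treated as capital-like only inside a
# capitalized passage. Only the hyphen example from the text is included.
PASSAGE_ONLY_CAPITAL_LIKE = {"-"}


def is_capital_like(ch: str, context: str) -> bool:
    """True if ch does not break the capitalization scope in this context.

    context is "word" (isolated uppercase word) or "passage".
    """
    if unicodedata.category(ch) == "Lu":
        return True
    if context == "passage":
        return ch in PASSAGE_ONLY_CAPITAL_LIKE
    return False


def scope_length(text: str, context: str) -> int:
    """Count the leading characters of text covered by the indicator's scope."""
    n = 0
    for ch in text:
        if not is_capital_like(ch, context):
            break
        n += 1
    return n


print(scope_length("MERRY-GO-ROUND", "word"))
print(scope_length("MERRY-GO-ROUND", "passage"))
```

With these assumptions the scope stops at the first hyphen in the "word"
context (5 characters) but covers the whole 14-character compound in the
"passage" context.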

The other unique indicator in some braille systems is the numeric indicator
used to change the meaning of certain braille cells from representing small
letters to representing decimal digits.  An unusual aspect of UEB is that
its numeric indicator has two functions: it not only indicates that certain
letters are intended as digits, it also always sets Grade 1 mode.

As with the capitalization indicators, we find that other characters need to
be taken into account in determining the scope of the numeric indicator. In
fact, in UEB, the same characters don't necessarily terminate both of its
functions. For example a comma is number-like in that it doesn't terminate
the scope of the indicator's number function whereas a colon is not
number-like in that it does terminate the scope of the number function. On
the other hand, only a hyphen or dash can terminate its Grade 1 mode
function. Remember that unless Grade 1 mode has been terminated,
contractions cannot be used following the digits in an alphanumeric item.
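
The two separately terminated functions can be sketched as a single scan that
tracks two end positions. This is a simplified Python illustration, not a UEB
implementation: it assumes the input is one space-free item that starts right
after the numeric indicator, and it encodes only the examples given above
(comma is number-like, colon is not, hyphen or dash ends Grade 1 mode). The
names NUMBER_LIKE, GRADE1_TERMINATORS and scan_numeric_scopes are hypothetical.

```python
# Assumption: digits plus the comma example from the text are number-like.
NUMBER_LIKE = set("0123456789,")
# Assumption: hyphen-minus and em dash stand in for "hyphen or dash".
GRADE1_TERMINATORS = {"-", "\u2014"}


def scan_numeric_scopes(item: str) -> tuple[int, int]:
    """Return (numeric_end, grade1_end) as indexes into item.

    numeric_end is where the number function stops; grade1_end is where
    the Grade 1 mode function stops. Each defaults to len(item) when
    nothing in the item terminates that function.
    """
    numeric_end = None
    grade1_end = None
    for i, ch in enumerate(item):
        # The number function ends at the first character that is not number-like.
        if numeric_end is None and ch not in NUMBER_LIKE:
            numeric_end = i
        # Grade 1 mode ends only at a hyphen or dash.
        if ch in GRADE1_TERMINATORS:
            grade1_end = i
            break
    if numeric_end is None:
        numeric_end = len(item)
    if grade1_end is None:
        grade1_end = len(item)
    return numeric_end, grade1_end


print(scan_numeric_scopes("1,000:ab"))
print(scan_numeric_scopes("3rd"))
print(scan_numeric_scopes("4-in"))
```

In "3rd" the number function ends after the digit but Grade 1 mode runs to the
end of the item, which is why contractions would not be allowed in the letters
that follow. In "4-in" the hyphen ends both functions at the same position.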

Another problem, in addition to separate rules affecting the termination of
the two functions, is that some of the rules are context-dependent. For
example, the dot 5 numeric space keeps the scope of the indicator open only
when it is immediately followed by a digit; otherwise the scope ends.

Implementing the new features of UEB looks difficult to me and I wish you
all the best.

SusanJ

P.S. I don't consider myself a UEB expert and although I've tried to be
careful, I can't guarantee that what I've written here is entirely correct. 
You should certainly check the official rules prior to implementing any
aspects of UEB. 

For a description of the software, to download it and links to project pages
go to http://www.abilitiessoft.com
