[liblouis-liblouisxml] Re: Capital/Emphasis update

From: Bert Frees <bertfrees@xxxxxxxxx>
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Date: Tue, 24 Feb 2015 11:57:34 +0100
Thanks Susan for the useful background!

Some of the issues you mention can already be addressed by liblouis, some not
yet.

- "How a passage is defined can depend on context": This is not addressed yet,
  and will be very difficult to implement.

- "Termination indicators are placed before the last item in the passage in some
  braille codes, and after the last item in other braille codes": This is
  addressed by liblouis (lastword*before vs. lastword*after).

- "Context-dependent character properties (e.g. a hard hyphen in a true compound
  word is not capital-like in an isolated uppercase word but it is capital-like
  in a capitalized passage)": Not addressed in liblouis.

- "A comma is number-like in that it doesn't terminate the scope of the
  indicator's number function whereas a colon is not number-like in that it does
  terminate the scope of the number function": I think we have decpoint for
  this.

- "Only a hyphen or dash can terminate the number indicator's Grade 1 mode
  function": Not addressed.

- "The dot five numerical space doesn't terminate the scope of the number
  indicator unless it is immediately followed by a digit": Not addressed.

Let this be a plea for further simplifying braille codes. The context-dependent
rules are a real problem. You seem to understand the intent of these rules, but
I, as an outsider, still fail to see it. Especially the cases where the context
is to be determined by a human being go over my head. I don't know of any such
rules in print, and what is so special about braille that it needs them?


Regards,
Bert


Susan Jolly writes:

> I've been aware of the complexities of the UEB capitalization approach as 
> well as the related complexities of identifying any passage which requires 
> some sort of braille markup for quite some time.  I'm not at all surprised
> that addressing all this requires a lot of work and I've been impressed by 
> the level of the recent discussion.
>
> In my experience as a former code developer I typically went back and forth 
> between top-level or conceptual issues and bottom-level or implementation 
> issues numerous times before I fully understood what needed to be done and 
> how to accomplish it.  Since there's been quite a bit of bottom-level
> discussion on this subject, I thought it might be helpful to review
> some of the conceptual issues.  I don't mean to imply that you haven't 
> already thought of these issues but wanted to explain why I think it is 
> important to keep the big picture, including the rules for the numeric 
> indicator, in mind when working with UEB.
>
> A standard transcribing problem is the use of special typeforms such as 
> italics. Here one can to some extent rely on the XML markup in the print 
> source file. Nonetheless there are still braille issues that need
> consideration.
>
> One issue is how a passage is defined.  For example in EBAE an italics 
> sequence consists of more than one passage if its sub-sequences are
> italicized for different reasons.  Thus each title in a comma-separated list 
> of italicized book titles would have its own italics indicator(s). The 
> intent of this rule is to make it easier for the reader.
>
> A related issue is whether a needed termination indicator is placed before 
> or after the last item in the passage. As a sighted person I can look ahead 
> to see where either a rendered or marked-up italicized passage ends but I 
> can understand why a braille reader might find it more informative to have 
> the last word identified before rather than after it.
>
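The before/after placement described above can be sketched in a few lines of Python. This is only an illustration, not how liblouis does it: the function name mark_passage and the placeholder markers "[B]" and "[E]" are made up for the example, and a passage is assumed to have at least two words.

```python
from typing import List


def mark_passage(words: List[str], begin: str, end: str,
                 end_before_last: bool) -> str:
    """Wrap a multi-word passage in placeholder begin/end indicators.

    The end indicator goes either in front of the last word or behind it,
    depending on the (hypothetical) convention of the target braille code.
    """
    if len(words) < 2:
        raise ValueError("this sketch only handles passages of two or more words")
    marked = list(words)
    # The begin indicator always precedes the first word.
    marked[0] = begin + marked[0]
    if end_before_last:
        # Announce the last word before the reader reaches it.
        marked[-1] = end + marked[-1]
    else:
        # Close the passage after its last word.
        marked[-1] = marked[-1] + end
    return " ".join(marked)


title = ["War", "and", "Peace"]
print(mark_passage(title, "[B]", "[E]", end_before_last=True))
print(mark_passage(title, "[B]", "[E]", end_before_last=False))
```

With a single flag the same passage data serves both kinds of braille code, which is roughly the choice that a before/after pair of table entries expresses.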
> In addition to typeform indicators there are markup issues unique to 
> braille.
>
> For example, capital letters require markup in braille but not in print. One 
> could add XML tags to mark the presence of capital letters in the source. 
> (I don't know much about XSL but it might be a useful way to add such tags.) 
> Of course, some thought needs to be given as to how to mark a capitalized 
> passage as opposed to a capitalized word and to what extent the details of 
> the markup should depend on the specifics of the targeted braille system.
>
> Also, as has been pointed out on the list, sometimes characters other than 
> capital letters may be considered as capitals since they don't "break" the 
> scope of the passage capitalization indicators. How should this be 
> addressed?
>
> Unicode associates what it calls "character properties" with each character.
> Examples of properties include Lu for uppercase letters and Nd for decimal 
> digits.  I believe that modern Unicode-aware programming languages can 
> provide these character properties.
>
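As one concrete case, Python's standard unicodedata module exposes the general category mentioned above. A small sketch (the helper name general_category is just for illustration):

```python
import unicodedata


def general_category(ch: str) -> str:
    """Return the two-letter Unicode general category of one character."""
    return unicodedata.category(ch)


# Lu = uppercase letter, Ll = lowercase letter, Nd = decimal digit,
# Pd = dash punctuation, Po = other punctuation.
for ch in ["A", "a", "7", "-", ","]:
    print(repr(ch), general_category(ch))
```

Note that these categories are fixed per character. They say nothing about context, which is exactly where the braille rules go further.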
> Using the above idea from Unicode, a braille system such as UEB could be 
> said to define additional context-dependent character properties for certain 
> characters. For example, a hard hyphen in a true compound word is not 
> capital-like in an isolated uppercase word but it is capital-like in a 
> capitalized passage. Here I'm using the term "capital-like" to indicate a 
> character that does not affect the scope of the referenced indicator.
>
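A toy version of such a context-dependent property, in Python. This only encodes the single hyphen example from the paragraph above and should not be read as the UEB rule set. The names is_capital_like and capital_scope are invented for the sketch, every hyphen is assumed to be a hard hyphen, and only a single space-free token is handled.

```python
import unicodedata


def is_capital_like(ch: str, in_passage: bool) -> bool:
    """Say whether ch keeps the capital indicator's scope open.

    Uppercase letters always do. A hyphen does so only inside a
    capitalized passage (assumption taken from the example above).
    """
    if unicodedata.category(ch) == "Lu":
        return True
    if ch == "-":
        return in_passage
    return False


def capital_scope(token: str, in_passage: bool) -> str:
    """Return the leading part of token that stays inside the scope."""
    end = 0
    while end < len(token) and is_capital_like(token[end], in_passage):
        end += 1
    return token[:end]


print(capital_scope("SELF-HELP", in_passage=False))  # scope stops at the hyphen
print(capital_scope("SELF-HELP", in_passage=True))   # hyphen does not break the scope
```

The point of the sketch is that the same character needs an extra argument (the context) before the property can be looked up, unlike a plain Unicode category.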
> The other unique indicator in some braille systems is the numeric indicator 
> used to change the meaning of certain braille cells from representing small 
> letters to representing decimal digits.  An unusual aspect of UEB is that 
> its numeric indicator has two functions: it not only indicates that certain 
> letters are intended as digits, it also always sets Grade 1 mode.
>
> As with the capitalization indicators, we find that other characters need to 
> be taken into account in determining the scope of the numeric indicator. In 
> fact, in UEB, the same characters don't necessarily terminate both of its 
> functions. For example a comma is number-like in that it doesn't terminate 
> the scope of the indicator's number function whereas a colon is not 
> number-like in that it does terminate the scope of the number function. On 
> the other hand, only a hyphen or dash can terminate its Grade 1 mode 
> function. Remember that unless Grade 1 mode has been terminated, 
> contractions cannot be used following the digits in an alphanumeric item.
>
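To make the two separate scopes concrete, here is a rough Python sketch based only on the description in the paragraph above. It is not checked against the official UEB rules. The names NUMBER_LIKE, GRADE1_TERMINATORS and indicator_scopes are hypothetical, the item is assumed to start with a digit and to contain no spaces, and treating "dash" as en dash or em dash is also an assumption.

```python
from typing import Tuple

# Characters that do not end the number function (digits and comma only here).
NUMBER_LIKE = set("0123456789,")
# Characters that end Grade 1 mode: hyphen, en dash, em dash (assumed).
GRADE1_TERMINATORS = set("-\u2013\u2014")


def indicator_scopes(item: str) -> Tuple[int, int]:
    """Return (number_end, grade1_end) as indexes into item.

    number_end is where the number function stops; grade1_end is where
    Grade 1 mode stops. The two can differ, as described above.
    """
    number_end = 0
    while number_end < len(item) and item[number_end] in NUMBER_LIKE:
        number_end += 1
    grade1_end = 0
    while grade1_end < len(item) and item[grade1_end] not in GRADE1_TERMINATORS:
        grade1_end += 1
    return number_end, grade1_end


for item in ["1,000:abc", "4-door", "3rd"]:
    print(item, indicator_scopes(item))
```

In "3rd" the number function ends after the digit, but Grade 1 mode runs to the end of the item, so no contractions would be allowed in the letters that follow.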
> Another problem in addition to separate rules affecting the termination of 
> the two functions is that some of the rules are context-dependent. For 
> example, the dot five numerical space doesn't terminate the scope of the 
> indicator unless it is immediately followed by a digit.
>
> Implementing the new features of UEB looks difficult to me and I wish you 
> all the best.
>
> SusanJ
>
> P.S. I don't consider myself a UEB expert and although I've tried to be 
> careful, I can't guarantee that what I've written here is entirely correct. 
> You should certainly check the official rules prior to implementing any 
> aspects of UEB. 
>
> For a description of the software, to download it and links to
> project pages go to http://www.abilitiessoft.com
