[liblouis-liblouisxml] Unified English Braille table set: current state of UEB tables in Liblouis

  • From: "Joseph Lee" <joseph.lee22590@xxxxxxxxx>
  • To: <liblouis-liblouisxml@xxxxxxxxxxxxx>
  • Date: Thu, 26 Jun 2014 07:05:22 -0700

Hi folks, mostly table and code maintainers and UEB readers:

I'd like to present the current state of the Liblouis table set for Unified
English Braille (UEB), along with some concerns and suggestions for solving
these problems.

Currently, we have three sets of UEB tables: the old (current) UEBC table
set written by Tom Johnston, who passed away; the newer UEB table set
included in the master branch; and the rulebook-based rewrite of the
contracted UEB table, developed in the Bitbucket repo for Liblouis. Of
these, it was proposed that we switch to the newer table in the master
branch, with some sections coming from the rulebook-based table and with
table tests added.

As of 2014, the master UEB set (based on the United States English braille
code) implements the majority of the literary UEB standard, with some of the
Unicode symbols included. The Bitbucket table set also implements the
majority of the rules, with the table content reorganized according to rules
from the rulebook. Each table set includes rules which are missing from the
other: certain contraction rules are missing from the Bitbucket set, and
other rules are missing from the master table set (it'll take a while to
list which rules are missing from which table).

However, what I'm more concerned about is the fact that the character
definitions are out of date, which may explain back translation issues
reported by Ken a few days ago. For example, in the master table set,
certain symbols are defined as mathematical characters (such as less-than,
plus, greater-than sign, etc.), which may pose an issue for back translating
strings with those characters included. Without remedying this issue, we may
see more back translation errors, which would defeat the intention of our
UEB implementation.
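
One way such definitions could cause trouble is when two characters end up
sharing the same dot pattern, so the back translator cannot tell them apart.
As a rough illustration only, here is a minimal Python sketch that scans a
list of character definitions for such collisions. The definitions and dot
patterns are made-up placeholders, not the contents of en-ueb-chardefs, and
find_conflicts is a hypothetical helper, not part of Liblouis.

```python
from collections import defaultdict

# Placeholder definitions as (opcode, character, dots) triples.
# These are NOT the real UEB or computer braille values.
SAMPLE_CHARDEFS = [
    ("math", "<", "126"),
    ("punctuation", "(", "126"),
    ("math", "+", "346"),
    ("sign", "&", "12346"),
]


def find_conflicts(chardefs):
    """Return {dots: [characters]} for dot patterns used by 2+ characters."""
    by_dots = defaultdict(set)
    for _opcode, char, dots in chardefs:
        by_dots[dots].add(char)
    return {
        dots: sorted(chars)
        for dots, chars in by_dots.items()
        if len(chars) > 1
    }


for dots, chars in find_conflicts(SAMPLE_CHARDEFS).items():
    print(f"dots {dots} are shared by: {' '.join(chars)}")
```

A real check would have to parse the table files themselves and would
probably also need to look at multi-cell rules, which this sketch ignores.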

Another concern is the persistent notion that UEB requires computer braille.
This isn't the case (UEB does not require a dedicated computer braille code).
This is another reason to take a look at the en-ueb-chardefs file and remove
any references to computer braille symbols, which will take at least a month
to do (especially with testing involved, as Ken and I do with our respective
projects: Braille Plus 18 for Ken, NVDA for me).

Based on these findings, and given that UEB is mostly a literary standard,
I'm beginning to worry that our implementation of UEB might not be stable
or, at worst, might be incomplete (I might add that UEB can never be
completely implemented by any computerized braille translation program,
because there are some rules which require human intervention). Add to this
the adoption in one of the largest markets, the United States, in 2016, and
you'll see the magnitude of this problem.

But I believe we should not think about problems alone: there are possible
solutions we could try, via both table and code modifications, that may
allow a substantial implementation of UEB. Here are some major issues to be
solved and features to be implemented:

.        Unified Liblouis braille table set for Unified English Braille
code: this is by far the most critical hindrance to continuing the UEB
implementation. Based on work we've done, for ease of future extension and
for ease of debugging, I propose adopting the Bitbucket table set after
examining which rules can be ported from the current master set (the master
set contains some rules which are missing in the Bitbucket set, namely
handling lower-dot contractions, such as "in", that form part of a word).

.        Capital passage indicator: perhaps the current method for
determining emphasis passages could be reused here.

.        Grade 1 braille embedded in grade 2: there are UEB-specific signs
which allow embedding grade 1 braille within grade 2. This is used more by
transcribers than automated tools, but just in case a document asks for such
a scenario (via XML or other markup), we should be prepared to handle such
cases.

.        Exceptions to contraction rules: there are words which cannot be
contracted for various reasons ("dayan" is a good example). Of the table
sets, the master set handles this best. To solve this in general, a
dictionary of such words should be defined.

.        Rewriting major portions of chardefs: this is needed in order to
prevent further back translation problems and to make sure the UEB table set
uses correct dots for punctuation, thereby freeing the tables from reliance
on computer-braille-derived symbols once and for all.

.        Organization of the tables according to major rules or sections:
this might be handy if we're debugging the table via checktable or for ease
of future extensions (in case UEB changes).

.        Testing by users and organizations: what would allow UEB to be
implemented well in this project is collaboration with users and
organizations willing to test our UEB implementation and give us feedback.
There are at least four routes for testing: a firmware for Braille Plus 18,
and third-party snapshots for Orca, NVDA and Braille Blaster. However,
testing should not be limited to users and organizations: we need feedback
from transcribers and from the people who are actually drafting the UEB
standard (that is, the International Council on English Braille, or ICEB,
and its member organizations).

.        Test data: a few days ago, Mesar mentioned that some tables need
more test data. UEB is no exception, and if people are willing to learn how
the test file works, we'd be able to come up with common and not so common
test cases to stress the UEB implementation to its limits (or beyond its
limits) so we can prove that our UEB table sets are stable.
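
To give an idea of what such test cases might look like, here is a minimal
Python sketch of a checker that runs forward and back translation over a
list of cases, including a contraction exception like "dayan". Everything in
it is an assumption for illustration: Case, run_cases and the toy translator
are hypothetical names, and "<DAY>" is a made-up stand-in for a contraction,
not real braille output. It does not use the actual Liblouis test file
format.

```python
from typing import NamedTuple


class Case(NamedTuple):
    text: str      # print text to translate
    braille: str   # expected (placeholder) braille output
    note: str = ""


# Toy stand-ins for a real translator: one contraction and one exception.
EXCEPTIONS = {"dayan"}


def toy_translate(word):
    if word in EXCEPTIONS:
        return word                      # leave exception words uncontracted
    return word.replace("day", "<DAY>")  # "<DAY>" is a made-up marker


def toy_back_translate(braille):
    return braille.replace("<DAY>", "day")


def run_cases(cases, translate, back_translate=None):
    """Return a list of failure messages; an empty list means all passed."""
    failures = []
    for case in cases:
        got = translate(case.text)
        if got != case.braille:
            failures.append(
                f"forward {case.text!r}: expected {case.braille!r}, got {got!r}"
            )
        if back_translate is not None:
            back = back_translate(case.braille)
            if back != case.text:
                failures.append(
                    f"back {case.braille!r}: expected {case.text!r}, got {back!r}"
                )
    return failures


CASES = [
    Case("daylight", "<DAY>light", "common case: contraction applies"),
    Case("dayan", "dayan", "exception: contraction must not apply"),
]

for message in run_cases(CASES, toy_translate, toy_back_translate):
    print(message)
```

With a real table, the two toy functions would be swapped for calls into the
translator, and the cases would carry the actual expected braille.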

Thanks.

Cheers,

Joseph
