[liblouis-liblouisxml] UEB patches: Summary of issues with examples

  • From: Davy Kager <DavyKager@xxxxxxxxxx>
  • To: "liblouis-liblouisxml@xxxxxxxxxxxxx" <liblouis-liblouisxml@xxxxxxxxxxxxx>
  • Date: Thu, 9 Jul 2015 12:02:02 +0000

Hi,

I added a few small test tables to illustrate some issues I'm having with the
APH UEB patches. For reference I will list them here. All the referenced tables
are in this branch:
https://github.com/snaekobbi/liblouis/tree/dkager_dutch_with_patches
I tried to keep them as generic as possible. To my knowledge this list is
accurate, but any corrections are greatly appreciated.

In the following, 'UEB patches' refers to branch feature/ueb_update:
https://github.com/liblouis/liblouis/tree/feature/ueb_update

1. Context rule in nl-g0.utb is ignored

Type: Regression
Table: tests/tables/context-ignored.utb
Problem: The context rule in this table appears to be ignored.
Use case: In Dutch, the word 'één' is capitalized in print as 'Eén'. In
braille, however, it should read 'Één'. A context rule handles this, and it
works in liblouis v2.6.3.
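
To make the expected result concrete, here is a minimal Python sketch of the
effect the context rule is meant to have. It does not use liblouis at all; the
function name restore_initial_accent is made up for illustration, and it
assumes the input uses precomposed (NFC) accented characters.

```python
import re


# Hypothetical helper, not part of liblouis: it only mimics the effect the
# context rule is supposed to have, working on plain text instead of braille.
def restore_initial_accent(text: str) -> str:
    # Match the standalone word 'Eén' and give the capital its accent back.
    # Assumes precomposed characters (NFC), not 'e' + combining accent.
    return re.sub(r"\bEén\b", "Één", text)
```

With the rule ignored, the output corresponds to the unchanged 'Eén' rather
than 'Één'.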

More:
I'm wondering if the broader problem might be that uppercase letters don't seem
to be marked as such, judging by lou_trace:
    HELLO
    ~hello
    1. lowercase h 125
    2. lowercase e 15
    3. lowercase l 123
    4. lowercase l 123
    5. lowercase o 135
Of course, this could also be because lou_trace doesn't know about the new
patches. I haven't looked into that.

2. endcaps is omitted under certain conditions

Type: Bug
Table: tests/tables/old-caps-opcodes-1.utb
Problem: In certain words containing exactly two caps followed by a lowercase
letter, the endcaps sign is not inserted. This problem already exists in v2.6.3
and is not caused by the UEB patches.
Use case: In Dutch, endcaps is used for mixed-case words such as 'HELlo'.
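
For anyone who wants to collect more test words of this shape, here is a small
Python sketch that flags candidates. It is independent of liblouis, the helper
names are made up, and it only checks the shape described above (exactly two
leading caps, then a lowercase letter). Not every word it flags necessarily
triggers the bug.

```python
def leading_caps_run(word: str) -> int:
    """Return the number of uppercase letters at the start of the word."""
    count = 0
    for ch in word:
        if not ch.isupper():
            break
        count += 1
    return count


# Hypothetical helper: flags words with exactly two leading caps that are
# directly followed by a lowercase letter, e.g. 'HEllo' but not 'HELlo'.
def is_two_caps_candidate(word: str) -> bool:
    return leading_caps_run(word) == 2 and len(word) > 2 and word[2].islower()
```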

3. begcaps and endcaps don't work

Type: Regression
Table: tests/tables/old-caps-opcodes-2.utb
Problem: With the UEB patches, begcaps and endcaps appear to have no effect.
See also: (2)

4. Numbers, punctuation and other non-letters are seen as capitals

Type: WIP
Table: tests/tables/non-letters-as-caps.utb
Problem: In Dutch, phrases of 4 or more words in uppercase letters get special
treatment. It is reasonable to say that 'words' consisting of only numbers or
other non-letter symbols do not end such phrases. However, the current UEB
patches are a little too liberal in determining what should be marked as
'uppercase'.
Use case: The advantage of treating non-letters as 'uppercase' is that a phrase
like 'HELLO WORLD 123 CHECK' does not end up containing 3 capital signs, one
for each capitalized 'real' word.

More:
The special treatment as described above inserts fewer capsigns, especially for
longer phrases. However, for phrases like 'CHECK 123 123 123' it looks odd to
get a capital sign in front of the last '123'. For Dutch, liblouis should
detect this and only mark the word 'CHECK' as capitalized. That's why I marked
this issue as work-in-progress.
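
The behaviour I have in mind for Dutch could be sketched in Python roughly as
follows. This is only an illustration of the idea, not how liblouis or the UEB
patches work internally. The function names are made up, and I am assuming
that non-letter 'words' count towards the minimum of 4 as long as they sit
between two real uppercase words.

```python
def is_upper_word(word: str) -> bool:
    """True if the word has at least one letter and all its letters are caps."""
    letters = [ch for ch in word if ch.isalpha()]
    return bool(letters) and all(ch.isupper() for ch in letters)


def is_neutral_word(word: str) -> bool:
    """True if the word contains no letters at all, e.g. '123' or '--'."""
    return not any(ch.isalpha() for ch in word)


# Hypothetical helper: returns (first, last) word indices of each phrase that
# would get the special capitalized-phrase treatment.
def caps_phrases(text: str, min_words: int = 4) -> list:
    words = text.split()
    spans = []
    i = 0
    while i < len(words):
        if not is_upper_word(words[i]):
            i += 1
            continue
        start = last_upper = i
        j = i + 1
        # Neutral words extend the run but never end up as its last word.
        while j < len(words) and (
            is_upper_word(words[j]) or is_neutral_word(words[j])
        ):
            if is_upper_word(words[j]):
                last_upper = j
            j += 1
        if last_upper - start + 1 >= min_words:
            spans.append((start, last_upper))
        i = j
    return spans
```

Here 'HELLO WORLD 123 CHECK' is one phrase covering all four words, while
'CHECK 123 123 123' yields no phrase, so only 'CHECK' would be capitalized.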

These are all the issues for now. I will update this list as I find more
problems. Or maybe Christian will find them first with the new harness runner.

Davy
