[liblouis-liblouisxml] Re: Offset problems.

  • From: Bert Frees <bertfrees@xxxxxxxxx>
  • To: "liblouis-liblouisxml@xxxxxxxxxxxxx" <liblouis-liblouisxml@xxxxxxxxxxxxx>
  • Date: Tue, 5 Jan 2016 13:51:46 +0100

Hi Dave.

Thanks for these tests.

As you say, the "minor" problem may be a matter of opinion. Arend relies on
this difference in behavior between inputPositions and outputPositions in
order to highlight capital signs: see
//www.freelists.org/post/liblouis-liblouisxml/Update-output-positions-during-multi-pass-forward-translation-only,10.
Another mode flag is a possibility, although I think I'd rather see an
agreement on the expected behavior.

The "major" problem is caused by the rule "always st. 34-256". Liblouis
sees "st." as one (atomic) contraction. In other words, the code behaves as
expected, but one might consider this a bug in the table.
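
To illustrate why an atomic rule produces these offsets, here is a toy model
in Python. It is not liblouis code, and map_segments is a hypothetical helper:
it assumes every applied rule maps all of its input characters to its first
output cell, and all of its output cells to its first input character. Under
that assumption it reproduces the offsets reported for the tail "test."
(relative to the "t" at input 10 and output 8).

```python
def map_segments(segments):
    """Build offset arrays from (input_length, output_length) pairs.

    Each pair stands for one applied rule (toy model, not liblouis itself).
    Returns (output_pos, input_pos), both relative to the start of the text.
    """
    output_pos = []  # one entry per input character
    input_pos = []   # one entry per output cell
    in_start = 0
    out_start = 0
    for in_len, out_len in segments:
        # Every input character of the rule points at the rule's first cell.
        output_pos.extend([out_start] * in_len)
        # Every output cell of the rule points at the rule's first character.
        input_pos.extend([in_start] * out_len)
        in_start += in_len
        out_start += out_len
    return output_pos, input_pos


# "t", "e", then "st." as one atomic rule (3 characters -> 2 cells).
atomic = map_segments([(1, 1), (1, 1), (3, 2)])
# Hypothetical alternative: "st" and "." handled by separate rules.
split = map_segments([(1, 1), (1, 1), (2, 1), (1, 1)])

print(atomic)  # ([0, 1, 2, 2, 2], [0, 1, 2, 2])
print(split)   # ([0, 1, 2, 2, 3], [0, 1, 2, 4])
```

With the atomic rule, the period (relative input 4) lands on the st cell and
the period cell (relative output 3) lands on the s, as in the log. With
separate rules, each would point at the other. The model does not cover the
capital sign case.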


2016-01-04 22:04 GMT+01:00 Dave Mielke <dave@xxxxxxxxx>:

Here are the results of a simple test using "en-us-g2.ctb". As you'll see,
there are problems with both forward and backward translation insofar as the
offsets are concerned. I guess we can reduce the code size even a bit more, as
about a hundred lines implement a couple of tests (this being one of them). Now
we're down to about 500 lines of Java and 114 lines of C.

I'm translating from text to braille with mode set to (dotsIO | ucBrl), and
back to text with mode set to (0). You'll need to use a font containing the
Unicode braille patterns, therefore, to properly read this log.
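
For readers matching the cells in this log against dot numbers in a table
rule such as "34-256", here is a small Python sketch. It relies only on the
layout of the Unicode braille patterns block (base U+2800, dot n sets bit
n-1). The function name dots_to_unicode is made up for this example and is
not part of liblouis.

```python
def dots_to_unicode(spec):
    """Convert a dot spec like "34-256" to Unicode braille cells.

    Cells are separated by "-"; each digit 1-8 names one raised dot.
    """
    cells = []
    for cell in spec.split("-"):
        bits = 0
        for digit in cell:
            if digit not in "12345678":
                raise ValueError("unsupported dot: %r" % digit)
            # Dot n corresponds to bit n-1 of the offset from U+2800.
            bits |= 1 << (int(digit) - 1)
        cells.append(chr(0x2800 + bits))
    return "".join(cells)


print(dots_to_unicode("34-256"))  # U+280C U+2832, the "st" and period cells
print(dots_to_unicode("6"))       # U+2820, the capital sign
```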

First, the results of both translations. They're correct.

Original Text: This is a test.
Braille Translation: ⠠⠹⠀⠊⠎⠀⠁⠀⠞⠑⠌⠲
Text Back Translation: This is a test.

Here's the input to output mapping of the forward translation. It contains one
major problem and one minor one. The major one is that . (the period) points to
⠌ (the st contraction). The minor one is that T (the capital T) points to ⠹ (in
other words, to the first braille letter) instead of to ⠠ (the capital sign).
This minor one may be a matter of opinion, so, perhaps, an additional mode flag
could be provided.

in->out: 0->1 T->⠹
in->out: 1->1 h->⠹
in->out: 2->1 i->⠹
in->out: 3->1 s->⠹
in->out: 4->2 ->⠀
in->out: 5->3 i->⠊
in->out: 6->4 s->⠎
in->out: 7->5 ->⠀
in->out: 8->6 a->⠁
in->out: 9->7 ->⠀
in->out: 10->8 t->⠞
in->out: 11->9 e->⠑
in->out: 12->10 s->⠌
in->out: 13->10 t->⠌
in->out: 14->10 .->⠌
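
Both problems in the table above show up as output cells that no input
character points to. A short Python sketch that finds them, with the offsets
copied from the log (unreferenced_cells is a hypothetical helper, not a
liblouis function):

```python
def unreferenced_cells(output_pos, output_length):
    """Return the output indexes that no input character maps to."""
    referenced = set(output_pos)
    return [i for i in range(output_length) if i not in referenced]


# in->out column of the forward translation, one entry per input character.
forward_output_pos = [1, 1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10]
# The braille translation in this log is 12 cells long.
missing = unreferenced_cells(forward_output_pos, 12)

print(missing)  # [0, 11]
```

Cell 0 is the capital sign (the minor problem) and cell 11 is the period (the
major one).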

Here's the output to input mapping of the forward translation. In it, you'll
see that ⠲ (the period) points to s (the final s) instead of to the period it
actually represents.

out->in: 0->0 ⠠->T
out->in: 1->0 ⠹->T
out->in: 2->4 ⠀->
out->in: 3->5 ⠊->i
out->in: 4->6 ⠎->s
out->in: 5->7 ⠀->
out->in: 6->8 ⠁->a
out->in: 7->9 ⠀->
out->in: 8->10 ⠞->t
out->in: 9->11 ⠑->e
out->in: 10->12 ⠌->s
out->in: 11->12 ⠲->s

Here's the input to output mapping of the backward translation. In it, you'll
see that ⠲ (the period) points to t (the final t) rather than to the period
that it actually represents.

in->out: 0->0 ⠠->T
in->out: 1->0 ⠹->T
in->out: 2->4 ⠀->
in->out: 3->5 ⠊->i
in->out: 4->6 ⠎->s
in->out: 5->7 ⠀->
in->out: 6->8 ⠁->a
in->out: 7->9 ⠀->
in->out: 8->10 ⠞->t
in->out: 9->11 ⠑->e
in->out: 10->12 ⠌->s
in->out: 11->13 ⠲->t
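
The out->in column of the forward translation and the in->out column of the
backward translation both map braille cells to text offsets, so they can be
compared directly. A minimal Python sketch using the values from this log
(differing_offsets is a name made up for this example):

```python
def differing_offsets(a, b):
    """Return (index, a_value, b_value) for every position where a and b differ."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Braille cell -> text offset, as reported by the forward translation.
forward_input_pos = [0, 0, 4, 5, 6, 7, 8, 9, 10, 11, 12, 12]
# Braille cell -> text offset, as reported by the backward translation.
backward_output_pos = [0, 0, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]

diffs = differing_offsets(forward_input_pos, backward_output_pos)
print(diffs)  # [(11, 12, 13)]
```

The two directions disagree only on the period cell (index 11), and neither
value is 14, the offset of the period in the text.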

Here's the output to input mapping of the backward translation. In it, you'll
see that t (the final t of test) points to ⠲ (the period). It also contains the
minor problem that T (the capital T) points to ⠹ (the first braille letter)
rather than to ⠠ (the capital sign).

out->in: 0->1 T->⠹
out->in: 1->1 h->⠹
out->in: 2->1 i->⠹
out->in: 3->1 s->⠹
out->in: 4->2 ->⠀
out->in: 5->3 i->⠊
out->in: 6->4 s->⠎
out->in: 7->5 ->⠀
out->in: 8->6 a->⠁
out->in: 9->7 ->⠀
out->in: 10->8 t->⠞
out->in: 11->9 e->⠑
out->in: 12->10 s->⠌
out->in: 13->11 t->⠲
out->in: 14->11 .->⠲

--
Dave Mielke | 2213 Fox Crescent | The Bible is the very Word of God.
Phone: 1-613-726-0014 | Ottawa, Ontario | http://Mielke.cc/bible/
EMail: Dave@xxxxxxxxx | Canada K2A 1H7 | http://FamilyRadio.org/
For a description of the software, to download it and links to
project pages go to http://liblouis.org
