[liblouis-liblouisxml] Re: Offset problems.

  • From: Bert Frees <bertfrees@xxxxxxxxx>
  • To: "liblouis-liblouisxml@xxxxxxxxxxxxx" <liblouis-liblouisxml@xxxxxxxxxxxxx>
  • Date: Wed, 6 Jan 2016 12:22:00 +0100

What you want to rely on can be achieved by using inputPositions
instead of outputPositions. At least that is what I do, and it
seems to work. By the way, you should also know that
outputPositions is currently broken; see
e.g. https://github.com/liblouis/liblouis/pull/133.

I don't think that having a slightly different behavior for
inputPositions and outputPositions is necessarily wrong. In fact,
if we're expecting that for each input offset i:

inputPositions[outputPositions[i]] == i

and for each output offset j:

outputPositions[inputPositions[j]] == j

then one of the arrays is completely redundant.
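
To make the redundancy argument concrete, here is a minimal sketch in
plain Python. It does not call liblouis; the arrays are made-up data and
the helper names (round_trips, invert) are hypothetical. It shows that
when both identities hold, one array can be rebuilt from the other, and
that a contraction breaks one of the identities.

```python
def round_trips(forward, backward):
    """True if backward[forward[i]] == i for every index i of forward."""
    return all(backward[forward[i]] == i for i in range(len(forward)))


def invert(forward):
    """Rebuild the opposite array, assuming forward is a one-to-one mapping."""
    backward = [None] * len(forward)
    for i, j in enumerate(forward):
        backward[j] = i
    return backward


# Made-up uncontracted case: every character maps to exactly one cell,
# so both identities hold and the second array is fully derivable.
plain_out = [0, 1, 2, 3]          # input offset -> output offset
plain_in = invert(plain_out)      # output offset -> input offset

# Made-up contraction: five input characters become three braille cells.
contracted_out = [0, 0, 1, 2, 2]  # input offset -> output offset
contracted_in = [0, 2, 3]         # output offset -> input offset

print(round_trips(plain_out, plain_in))
print(round_trips(contracted_out, contracted_in))
```

In the contracted case the first identity fails for input offset 1, so
the two arrays cannot be strict inverses of each other as soon as the
input and output lengths differ.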

The different behavior seems to be useful for Arend, and I can see
how. I'm just not sure whether the way it is accomplished is the
right way. Maybe a special mode flag, like Dave suggests, is
better. The downside, however, is that Arend would then need two
translation calls to get all the information he needs.

There are some other possible improvements that can be made to
the way we present mapping information. More specifically, the
current way is limited because it doesn't allow for detection
of "removals" (text that has no corresponding braille)
and "insertions" (braille that has no corresponding
text). (Insertions are less likely to happen than removals.)

This is not a major problem but has some implications related to
computing hyphen positions in the output based on hyphen
positions in the input.
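
A small sketch of the hyphenation issue, in plain Python with made-up
data. It assumes, purely for illustration, that a removed character is
given the same output offset as the character that follows it; the
function name map_hyphens is hypothetical and nothing here calls
liblouis.

```python
def map_hyphens(input_hyphens, output_positions):
    """Map break points (input offsets) to output offsets, dropping duplicates."""
    return sorted({output_positions[i] for i in input_hyphens})


# Five input characters; the character at input offset 2 has no braille.
# Assumption: it shares output offset 2 with the character after it.
output_positions = [0, 1, 2, 2, 3]  # input offset -> output offset

# Break points before input offsets 2 and 3 collapse into one output offset,
# and nothing in the array says which of the two characters was removed.
print(map_hyphens([2, 3], output_positions))
```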

To overcome the problem, the mapping arrays would need to be able
to carry information such as "no corresponding position". This
would also justify having two arrays, because one of them would no
longer be redundant.
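
One way such a "no corresponding position" marker could look, as a
sketch only. The sentinel NO_POSITION, the array names and the helper
functions are all hypothetical, not part of the liblouis API, and the
data is made up.

```python
NO_POSITION = None  # hypothetical marker for "no corresponding position"


def removals(input_to_output):
    """Input offsets whose text has no corresponding braille."""
    return [i for i, pos in enumerate(input_to_output) if pos is NO_POSITION]


def insertions(output_to_input):
    """Output offsets whose braille has no corresponding text."""
    return [j for j, pos in enumerate(output_to_input) if pos is NO_POSITION]


def map_hyphens(input_hyphens, input_to_output):
    """Map break points to output offsets, skipping removed characters."""
    return [input_to_output[i] for i in input_hyphens
            if input_to_output[i] is not NO_POSITION]


# Made-up example: input offset 1 is removed, output offset 3 is inserted.
input_to_output = [0, NO_POSITION, 1, 2]
output_to_input = [0, 2, 3, NO_POSITION]

print(removals(input_to_output))
print(insertions(output_to_input))
print(map_hyphens([1, 2], input_to_output))
```

With a marker like this, a break point that falls on removed text can
be dropped or handled explicitly instead of silently landing on a
neighbouring cell.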



2016-01-05 19:04 GMT+01:00 Dave Mielke <dave@xxxxxxxxx>:

[quoted lines by Bert Frees on 2016/01/05 at 13:51 +0100]

As you say the "minor" problem may be a matter of opinion. Arend relies on
this difference in behavior between inputPositions and outputPositions in
order to highlight capital signs: see

//www.freelists.org/post/liblouis-liblouisxml/Update-output-positions-during-multi-pass-forward-translation-only,10
Another mode flag is a possibility, although I think I'd rather see an
agreement on the expected behavior.

I understand. What I'm hoping I'll be able to rely on is that offset[n] ->
offset[n+1] covers the exact mapping of that segment. This isn't true if the
mapping points beyond any prefix symbols.

The "major" problem is because of the rule "always st. 34-256". Liblouis
sees "st." as one (atomic) contraction. In other words, code behaves as
expected, but one might consider this a bug in the table.

Thanks, I understand. This is unfortunate. I'm needing to be able to create an
accurately speakable span for each bit of braille. As I'm sure you may know,
the slightest error in what should be spoken can sound absolutely horrible.

--
Dave Mielke | 2213 Fox Crescent | The Bible is the very Word of God.
Phone: 1-613-726-0014 | Ottawa, Ontario | http://Mielke.cc/bible/
EMail: Dave@xxxxxxxxx | Canada K2A 1H7 | http://FamilyRadio.org/
For a description of the software, to download it and links to
project pages go to http://liblouis.org
