Hi all,
As Christian already mentioned, we have been working on a new way to do
Braille hyphenation here at SBS. We now hyphenate the text in a step
prior to the translation, and afterwards we determine the hyphen
positions in the output using the inputPos array.
While testing this new functionality we hit a nasty problem. It turned
out sometimes the inputPos array was wrong. We investigated the
problem,
found a few bugs in liblouis, and (at least we think) we managed to fix
them.
For the technical people, I'll briefly outline the problem:
Currently, there are two array variables (srcMapping and
inputPositions)
that basically do the same thing, namely keeping track of the "input
positions", i.e. for each position in the output, they give you the
corresponding position in the input. The problem is that different
parts
of the code are working on a different array, which is a bit confusing
and which causes the arrays to get out of sync.
This is what happens in the code:
1. srcMapping[] is initialized to [0,1,2,3,4,...]
2. in pass0 (correct opcode), srcMapping[] is updated
3. in pass1, srcMapping[] is updated whenever a context rule applies,
and inputPositions[] & outputPositions[] are computed based on
srcMapping[] whenever a regular translation rule applies.
4. in pass2, srcMapping[] is updated
5. in pass3, srcMapping[] is updated
6. in pass4, srcMapping[] is updated
In addition, there are a few other bugs:
- The updating of srcMapping should be based on the previous srcMapping
(the state at the end of the previous pass), but this is not done
consistently throughout the code.
- The updating of srcMapping is not done correctly when dest > src
(dest
is the current outputPosition and src is the current inputPosition).
The
reason is that the new srcMapping and the previous srcMapping use the
same memory block, which means that some data might be overwritten
before being copied. (cf. a similar problem can occur in memcpy(),
that's why it's recommended to use memmove().)
This is the solution we came up with:
We created a new variable named prevSrcMapping. srcMapping is now
always
updated based on prevSrcMapping, and before each new pass we copy
srcMapping to prevSrcMapping. Also in pass1, we update srcMapping and
not inputPositions. Only at the very end of the translation, srcMapping
is copied to inputPositions.
We created a pretty decent testcase so we're quite confident that
inputPos is correct now. The less good news is that we're not so sure
about outputPos and cursorPos. We didn't try to fix them yet (we don't
even know how to test cursorPos), but it's possible they are somehow
affected by the other changes we did. At best, it's as bad as it was
before. Also, we didn't do anything about backtranslation yet.
For personal use, we created a github page[1] to keep track of the
changes. You can review them from there. We can also send a series of
patches to the mailing list if somebody has trouble reading it.
Regards,
Bert
Footnotes
-------------
[1] https://github.com/sbsdev/liblouis/commits/master