[liblouis-liblouisxml] Re: inputPos fixes

Awesome, thanks for looking at it. If John's okay with it, we'll merge the changes with the main branch next week.

Regards,
/Bert


On 05/10/2012 03:09 AM, Timothy Lee wrote:
Dear Bert,

As per your request, I've gone through the changes to srcMapping in your git repo. They appear to be fine.

Regards,
Timothy Lee


On 05/02/2012 06:29 PM, John J. Boyer wrote:
BrailleBlaster also uses the inputPos array, so I'm glad of the fixes.
I'm guessing hat the cursor prosition won't be affected much, if at all,
but that needs testing. I think there are already some tests for this.

John

On Wed, May 02, 2012 at 11:35:56AM +0200, Bert Frees wrote:
Hi all,

As Christian already mentioned, we have been working on a new way to do
Braille hyphenation here at SBS. We now hyphenate the text in a step
prior to the translation, and afterwards we determine the hyphen
positions in the output using the inputPos array.

While testing this new functionality we hit a nasty problem. It turned
out sometimes the inputPos array was wrong. We investigated the problem,
found a few bugs in liblouis, and (at least we think) we managed to fix
them.

For the technical people, I'll briefly outline the problem:

Currently, there are two array variables (srcMapping and inputPositions)
that basically do the same thing, namely keeping track of the "input
positions", i.e. for each position in the output, they give you the
corresponding position in the input. The problem is that different parts
of the code are working on a different array, which is a bit confusing
and which causes the arrays to get out of sync.

This is what happens in the code:

1. srcMapping[] is initialized to [0,1,2,3,4,...]
2. in pass0 (correct opcode), srcMapping[] is updated
3. in pass1, srcMapping[] is updated whenever a context rule applies,
and inputPositions[] & outputPositions[] are computed based on
srcMapping[] whenever a regular translation rule applies.
4. in pass2, srcMapping[] is updated
5. in pass3, srcMapping[] is updated
6. in pass4, srcMapping[] is updated

In addition, there are a few other bugs:

- The updating of srcMapping should be based on the previous srcMapping
(the state at the end of the previous pass), but this is not done
consistently throughout the code.

- The updating of srcMapping is not done correctly when dest > src (dest is the current outputPosition and src is the current inputPosition). The
reason is that the new srcMapping and the previous srcMapping use the
same memory block, which means that some data might be overwritten
before being copied. (cf. a similar problem can occur in memcpy(),
that's why it's recommended to use memmove().)

This is the solution we came up with:

We created a new variable named prevSrcMapping. srcMapping is now always
updated based on prevSrcMapping, and before each new pass we copy
srcMapping to prevSrcMapping. Also in pass1, we update srcMapping and
not inputPositions. Only at the very end of the translation, srcMapping
is copied to inputPositions.

We created a pretty decent testcase so we're quite confident that
inputPos is correct now. The less good news is that we're not so sure
about outputPos and cursorPos. We didn't try to fix them yet (we don't
even know how to test cursorPos), but it's possible they are somehow
affected by the other changes we did. At best, it's as bad as it was
before. Also, we didn't do anything about backtranslation yet.

For personal use, we created a github page[1] to keep track of the
changes. You can review them from there. We can also send a series of
patches to the mailing list if somebody has trouble reading it.

Regards,
Bert

Footnotes
-------------
[1] https://github.com/sbsdev/liblouis/commits/master



For a description of the software, to download it and links to
project pages go to http://www.abilitiessoft.com

For a description of the software, to download it and links to
project pages go to http://www.abilitiessoft.com

Other related posts: