[liblouis-liblouisxml] Re: Update output positions during multi pass (forward translation only)

  • From: Bert Frees <bertfrees@xxxxxxxxx>
  • To: "liblouis-liblouisxml@xxxxxxxxxxxxx" <liblouis-liblouisxml@xxxxxxxxxxxxx>
  • Date: Fri, 30 Oct 2015 20:32:41 +0100

The reason that I also wanted output positions is that I want to link
text characters to the corresponding braille characters and not to the
preceding presigns. In addition I can mark the presigns. For this I use
both the input and output positions

We should probably document this behavior somewhere, because it is
important to understand how the mapping works. Are you saying that the
following holds when you translate "100"?

input: "100"
outputPos: [1,2,3]
output: "#ajj"
inputPos: [0,0,1,2]

[...] in for_updatePositions. Here srcMapping is not updated if
inputPositions is NULL, contrary to your statement that srcMapping is
always used. This affects also outputPositions indirectly.

Right. I was already afraid some bug like that was still there. That should
indeed be fixed.



2015-10-30 16:16 GMT+01:00 Arend Arends <mada73bg@xxxxxxxxxx>:

Hi Bert,

The test will come later.

I checked input positions in my program and indeed the input positions are
corrected during multi passes. The reason that I also wanted output
positions is that I want to link text characters to the corresponding
braille characters and not to the preceding presigns. In addition I can
mark the presigns. For this I use both the input and output positions, but
in the process I forgot that more or less.

Now in the LibLouis code it seems that similar to having a srcMapping
array and a prevSrcMapping array one should also have a destMapping and
prevDestMapping array to avoid the same type of problems that you had
before.
Further the pointer inputPositions should be removed, because it has
little use outside trace_translate, where de original function parameter
outputPos is available.
The only place where inputPositions is used outside trace_translate is in
for_updatePositions. Here srcMapping is not updated if inputPositions is
NULL, contrary to your statement that srcMapping is always used. This
affects also outputPositions indirectly.

Arend

-----Oorspronkelijk bericht----- From: Bert Frees
Sent: Friday, October 30, 2015 3:31 PM
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Subject: [liblouis-liblouisxml] Re: Update output positions during multi
pass (forward translation only)

OK thanks, appreciated! The test is most important for me, implementation
details are just details.

The test could take input from a file but then we're pretty close to what
the
JSON and YAML test frameworks can already do. (@Christian: or am I wrong?
Is it
only cursorPos that is supported maybe?)

Again, just ask if I can do anything to help with the test.

Thanks for your time and energy.


Arend Arends writes:

Hi Bert,

I will check the input positions again. If that works, it makes a lot of
difference. I know input and output are probably correct after applying
"correct" and "context" rules.
I gradually learn more about how LibLouis works, it just takes a lot of
time
and energy.

In the meantime I found input.c in the tests map of LibLouis. I will see
whether I can expand that test, but again, it will take some time. Maybe
the
particular test program should be replaced by a version that takes input
from a file.

Arend

-----Oorspronkelijk bericht----- From: Bert Frees
Sent: Friday, October 30, 2015 2:27 PM
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Subject: [liblouis-liblouisxml] Re: Update output positions during multi
pass (forward translation only)

Hi Arend,
Answers inline:

Arend Arends writes:

Hi Bert,

Thank you for your review. It is certainly good to keep looking at a
better
way to make the changes. But I was a bit frustrated that long time ago?
the
pass options were added without taking this problem into account. Or
perhaps
the input and output positions were added later. I don’t know.


Yes, possibly. I don't go back that far myself so I don't know when it was
added.

I tried to make the changes where I could understand the effects of the
code. I also tried to make changes in passDoAction, but this did not
succeed
very well, so I finally made the changes in one place (passTranslate).

I noticed that in for_updatePositions the input and output positions are
updated together, but it is not per se that the output positions are
derived
from the input positions. They both use the arrays srcMapping and
prevSrcMapping. In the code and according to the documentation it is
possible
to have either inputPositions or outputPositions being a NULL pointer.


The srcMapping array is also used for some other things internally, such
as
translating pre-hyphenated text (lou_translatePrehyphenated) and updating
the
typeform array after pass0 (makeCorrections).

Therefore srcMapping and prevSrcMapping are always created/updated,
regardless
of whether inputPositions and outputPositions are NULL. At the end of the
lou_translate call, srcMapping is copied to inputPositions if it is not
NULL.

Making the changes in passDoAction would probably require to add some code
to
the lines where the array srcMapping is altered. There are several lines
where
this occurs and because passDoAction is also called for the normal pass
where
input and output positions are alreay handled, one would also have to
restrict
the changes to the higher passes. I tried to do that at first, but I was
not
very successful there to make the right additions.


I don't think that is true. The input positions are also updated in
passDoAction
without restrictions and this works fine. In fact I'm pretty sure it that
it
is
required and that otherwise the positions wouldn't be handled at all for
correct
(pass0) and context (pass1) rules. The best way to find out is to do the
test. I
think my inpos.c test shows that the input positions are correct also when
correct and context rules are present.

Apart from having a bit of trouble to oversee the code right away, I also
did
not have a very good debugging method available. Normally in C++ I use
the
Visual Studio browser to inspect all references of a certain function or
variable. With C code I could not do that. Also normally I step through
the
code during debugging. I did not yet succeed in doing that for LibLouis,
although I think it is possible if I spend more time on this. Currently
I
used the logging facilities to trace the steps in the code. That is a
tedious
process.

Maybe I will have a better look in the future when I have some time.

With respect to your last request of providing some tests that the code
was
broken and is fixed now:

I checked your commit in 2012. I did not see it before because Github
showed
only changes of 2013 and later. The older changes are in a separate part
that
can be reached by pressing the button “older” at the bottom. I missed
this
before.

I also saw your artical on freelists, which is quite useful to understand
something about the history. It is funny to see that you focused on the
input
positions and were unsure about output positions and cursor position at
the
time. In the past I did some tests to see whether using the input
positions
instead of the output positions would be a better way. After all, output
positions can to a certain extent be derived from the input positions
after
the translation is done. I did not see much difference, so I doubt at the
moment whether the input positions are updated properly during multi
pass. This would in fact mean that I have to add changes to update the
srcMapping array correctly. That does not make it easier.


I'm pretty sure input positions are working correctly. I've been using
that
feature successfully for a while.

In your links I could not find the file input.c, so I still don’t know how
to
make such a test. I do not have experience with using or even compiling
the
normal tools, so I do my testing only by calling the LibLouis functions
from
my own (Windows) program.


In understand, but it's important to have tests as close as possible to
the
source. You can of course do more extensive testing in your own software
if
you
find that more convenient (I do that too), but we should at least have a
few
minimal tests in liblouis. If only for the purpose of documentation, and
to
convince me that the changes do something.

I can help you create the tests. You could for example provide some test
data
and I can turn that into a C test. Or you could write the C test and I can
make
sure it compiles. The YAML test framework also supports checking input and
output positions if I'm not mistaken, so that's another option.

The inpus.c file was not any of the links. It's in the source code
repository in
the "tests" folder.

Arend

From: Bert Frees
Sent: Friday, October 30, 2015 11:55 AM
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Subject: [liblouis-liblouisxml] Re: Update output positions during multi
pass (forward translation only)

Hi Arend, thanks very much for your contribution.


I've been studying the code a bit, and it looks like your changes do what
they
need to do. However I'm a little concerned about how and where the
updating is
done.


The logic for handling input and output positions is a bit scattered all
over
the place. For the first pass, most of the updating happens in
for_updatePositions and some of it also happens in other places. But
input
and
output positions are always updated at the same time, and almost always
the
output positions are derived from the input positions.

For the other passes, most of the updating happens in passDoAction.
However an
important difference with the first pass is that the output positions are
completely ignored, as you had already noticed. This is probably the main
reason why outputPositions have been unreliable (there may be other bugs
left
though).


Your change has probably fixed it, but I would rather see the updating of
input and output positions happen at the same time, if possible. That is:
in
passDoAction (which is called from translatePass). This should make the
code
more readable. And you may also find some other bugs you didn't think of
yet. One thing I know is that this would also fix the correct opcode,
because
passDoAction is called from makeCorrections as well.

Also the way you update the output positions looks quite different then
how
it's done in other places, where the output positions are derived from
the
input positions. Maybe for code uniformity it would be better to use the
same
technique everywhere? Just an idea.


One last request is to have some tests that show the code was broken
before
and is fixed now. See for example the test called "inpos.c" which I added
when
fixing another input/output positions related bug in 2012. See
//www.freelists.org/post/liblouis-liblouisxml/inputPos-fixes and

https://github.com/liblouis/liblouis/commit/b4d54e2f1adee12806211d2584475554c807e786
.




Kind regards,

/Bert






2015-10-29 13:48 GMT+01:00 Arend Arends <mada73bg@xxxxxxxxxx>:

I have just made a pull request for liblouis/transcommon.ci with the
description:

“trace_translate, the general translation function in
lou_translateString.c
calls translatePass for pass2, 3 and 4. translatePass does currently
not
update the output positions. This modification tries to handle this.
This
may affect cursor position too.”.

Although the header of the file suggests that it is an include file for
both
forward and backward translation (lou_translateString.c and
lou_backTranslateString.c), it is only used in lou_translateString.c.
It
seems that most of the relevant contents has been copied to
lou_backTranslateString.c directly without the use of a separate
include
file. This has been the case since GitHub was first used (2013?). Maybe
we
should also integrate transcommon.ci in lou_translateString.c
directly?

The reason that I made this change was that the LibLouis function
lou_translate provides parameters for inputPos, outputPos and
cursorPos,
but
that these arrays and single value for cursorPos are only updated in
the
first (normal) pass of the translation, which calls the internal
function
translateString. In case of multi passes the function translatePass is
called (pass2, 3 and 4), but it makes no changes in the outputPos array
(pointed to by a local static pointer outputPositions). Input
positions
and
the cursor position are to some extent derived from the output
positions
after the translation is made.

These parameters can be used by a screenreader. In TactileView, the
application that I am working on, the output positions are used to
synchronize characters in braille and text in a graphic document
(especially
when printing to swell paper or ViewPlus printers). It is also used to
mark
braille indicators in a special way, for which it was necessary to
solve
the
current problem with the output positions. Especially the rules for
capitalization in Dutch braille tables gave a mismatch, because in
these
tables extensive use is made of context and multi pass rules.

Unfortanately I could solve this only for forward translation, and it
may
not even be perfect. I did not investigate what was needed for backward
translation, particularly because I do not use it and also do not have
experience with backward translation.

Arend Arends


For a description of the software, to download it and links to
project pages go to http://liblouis.org


For a description of the software, to download it and links to
project pages go to http://liblouis.org


For a description of the software, to download it and links to
project pages go to http://liblouis.org

For a description of the software, to download it and links to
project pages go to http://liblouis.org

Other related posts: