[liblouis-liblouisxml] Re: [patch] Correct input to output position mapping for removed repeated text
- From: James Teh <jamie@xxxxxxxxxxx>
- To: liblouis-liblouisxml@xxxxxxxxxxxxx
- Date: Wed, 17 Sep 2008 20:29:07 +1000
Hi John,
Arrrg! I discovered that the patch I sent causes a regression. With
compbrlAtCursor enabled, doing something like:
"a a"
with cursorPos set to 2 causes the second "a" to disappear. I suspect
this is because my call to for_updatePositions could also call
doCompTrans, which probably isn't a good idea there. I have written code
to manually update outputPositions and cursorPos instead.
However, this raises an interesting question about one of Eitan's test
cases. Using en-us-g2, if we take the string:
"greetings "
(Note the double space at the end.)
Position 9 is the first space, position 10 is the second space. If one
translates with compbrlAtCursor and the cursorPos set to 10 (the second
space), should the resulting output produce two spaces (because the
second space was expanded) or just one? Eitan's test expects that there
should be two. However, this is tricky because the elimination of
repeats doesn't take computer braille start into account. It probably
really should - for spaces, this isn't a big problem, but for other
characters, it may well be.
I'm attaching the updated patch. Sorry about that. Note that this patch
does the former (does not include the repeated space). I'm not quite
sure how to handle computer braille start with this without breaking
future translation.
Jamie
John J. Boyer wrote:
Jamie,
Thanks for the patch. I will apply it to my development copy.
Unfortunately, it is not the same as the copy in cood.google. I've been
experimenting and testing. I just fixed the problem with dots 7-8 in
tables like en-us-comp8.ctb. now, if your table does not contain the
capsign opcode but does define dot 7 for capital letters, it will be
preserved. You will have to obtain the new liblouis and liblouisxml from
www.jjb-software.com . Sorry to be out of phrase with the Google site.
I'm putting the finishing touches on the next release. You will receive
a message on the list when it is available.
John
On Wed, Sep 17, 2008 at 01:04:16PM +1000, James Teh wrote:
Hi all,
Attached is a patch to do the following:
* liblouis/lou_translateString.c: When handling input which is removed
due to a "repeated" opcode, correctly update the mapping from input to
output positions; i.e. the outputPos array and the returned cursorPos.
The test case is a string such as:
"a"
Translating this with en-us-g2.ctb yields:
"a"
Observe that the second space has been eliminated due to the "repeated"
opcode for spaces. Previously, the mapping from input to output
positions was:
[0, 1, 0]
This is incorrect; this indicates that position 2 of the input maps to
position 0 of the output.
It should be:
[0, 1, 1]
indicating that both position 1 and 2 of the input map to position 1 of
the output. Similar issues exist with cursorPos. This patch fixes both.
Note that the patch passes a string to for_updatePositions, but passes
the output length as being 0, so the string is unnecessary. However,
passing memcpy a NULL src is undefined according to the spec (though
most implementations don't complain).
I have already committed this to the svn for the liblouis Google Code
project.
Jamie
--
James Teh
Email: jamie@xxxxxxxxxxx
WWW: http://www.jantrid.net/
MSN Messenger: jamie@xxxxxxxxxxx
Jabber: jteh@xxxxxxxxxx
Yahoo: jcs_teh
Index: liblouis/lou_translateString.c
===================================================================
--- liblouis/lou_translateString.c (revision 34)
+++ liblouis/lou_translateString.c (revision 35)
@@ -1891,6 +1891,7 @@
&& compareChars (&transRule->charsdots[0],
¤tInput[src], transCharslen, 0))
{
+ for_updatePositions(transRule->charsdots[0], transCharslen, 0);
src += transCharslen;
}
break;
--
James Teh
Email: jamie@xxxxxxxxxxx
WWW: http://www.jantrid.net/
MSN Messenger: jamie@xxxxxxxxxxx
Jabber: jteh@xxxxxxxxxx
Yahoo: jcs_teh
Index: liblouis/lou_translateString.c
===================================================================
--- liblouis/lou_translateString.c (revision 34)
+++ liblouis/lou_translateString.c (working copy)
@@ -1891,6 +1891,17 @@
&& compareChars (&transRule->charsdots[0],
¤tInput[src], transCharslen, 0))
{
+ if (outputPositions != NULL)
+ {
+ int tcc;
+ for (tcc = 0; tcc < transCharslen; tcc++)
+ outputPositions[tcc + k] = dest - 1;
+ }
+ if (!cursorStatus && src <= cursorPosition && cursorPosition <
src + transCharslen)
+ {
+ cursorStatus = 1;
+ cursorPosition = dest - 1;
+ }
src += transCharslen;
}
break;
- Follow-Ups:
- References:
Other related posts:
- » [liblouis-liblouisxml] [patch] Correct input to output position mapping for removed repeated text
- » [liblouis-liblouisxml] Re: [patch] Correct input to output position mapping for removed repeated text
- » [liblouis-liblouisxml] Re: [patch] Correct input to output position mapping for removed repeated text
- » [liblouis-liblouisxml] Re: [patch] Correct input to output position mapping for removed repeated text
Jamie, Thanks for the patch. I will apply it to my development copy. Unfortunately, it is not the same as the copy in cood.google. I've been experimenting and testing. I just fixed the problem with dots 7-8 in tables like en-us-comp8.ctb. now, if your table does not contain the capsign opcode but does define dot 7 for capital letters, it will be preserved. You will have to obtain the new liblouis and liblouisxml from www.jjb-software.com . Sorry to be out of phrase with the Google site. I'm putting the finishing touches on the next release. You will receive a message on the list when it is available. John On Wed, Sep 17, 2008 at 01:04:16PM +1000, James Teh wrote:
Hi all, Attached is a patch to do the following: * liblouis/lou_translateString.c: When handling input which is removed due to a "repeated" opcode, correctly update the mapping from input to output positions; i.e. the outputPos array and the returned cursorPos. The test case is a string such as: "a" Translating this with en-us-g2.ctb yields: "a" Observe that the second space has been eliminated due to the "repeated" opcode for spaces. Previously, the mapping from input to output positions was: [0, 1, 0] This is incorrect; this indicates that position 2 of the input maps to position 0 of the output. It should be: [0, 1, 1] indicating that both position 1 and 2 of the input map to position 1 of the output. Similar issues exist with cursorPos. This patch fixes both. Note that the patch passes a string to for_updatePositions, but passes the output length as being 0, so the string is unnecessary. However, passing memcpy a NULL src is undefined according to the spec (though most implementations don't complain). I have already committed this to the svn for the liblouis Google Code project. Jamie -- James Teh Email: jamie@xxxxxxxxxxx WWW: http://www.jantrid.net/ MSN Messenger: jamie@xxxxxxxxxxx Jabber: jteh@xxxxxxxxxx Yahoo: jcs_teh
Index: liblouis/lou_translateString.c
===================================================================
--- liblouis/lou_translateString.c (revision 34)
+++ liblouis/lou_translateString.c (revision 35)
@@ -1891,6 +1891,7 @@
&& compareChars (&transRule->charsdots[0],
¤tInput[src], transCharslen, 0))
{
+ for_updatePositions(transRule->charsdots[0], transCharslen, 0);
src += transCharslen;
}
break;