[liblouis-liblouisxml] Re: Questions over correct opcode

From: "Michael Whapples" <dmarc-noreply@xxxxxxxxxxxxx> (Redacted sender "mwhapples@xxxxxxx" for DMARC)
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Date: Wed, 06 May 2015 13:10:48 +0100

Thanks for those answers.

In my view the input should be deemed to be inbuf, and liblouis should not modify that specific memory location. Copying to another buffer and modifying that as part of the translation process is fine; it's part of the process to get to the translation.

Therefore I would say that index values are correct when they map between inbuf as passed in and outbuf as retrieved. If they map to the temporary buffers then they would be wrong, as external applications would not have that string.
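
To make that concrete, here is a minimal sketch of how the two mappings could be composed so that the caller only ever sees positions in the original inbuf. It is not taken from the liblouis source. The function name compose_input_pos and the arrays corr_to_orig and out_to_corr are hypothetical, and it assumes the correction step records, for each character of the temporary buffer, the original input position it came from.

```c
/* Hypothetical sketch, not liblouis code.
 * out_to_corr[k]  : for output position k, the position in the corrected
 *                   (temporary) buffer it was translated from.
 * corr_to_orig[c] : for corrected position c, the position in the caller's
 *                   original inbuf that produced it.
 * out_to_orig[k]  : result, output position mapped to the original inbuf.
 * Returns 0 on success, -1 if an index falls outside the corrected buffer. */
int compose_input_pos(const int *out_to_corr, int outlen,
                      const int *corr_to_orig, int corrlen,
                      int *out_to_orig)
{
    for (int k = 0; k < outlen; k++) {
        int c = out_to_corr[k];
        if (c < 0 || c >= corrlen)
            return -1; /* mapping points outside the temporary buffer */
        out_to_orig[k] = corr_to_orig[c];
    }
    return 0;
}
```

For an input of three characters where the middle one is expanded to three, corr_to_orig would be {0, 1, 1, 1, 2}, so every output position derived from the expansion points back at original position 1.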

Michael Whapples

On 06/05/2015 12:54, Bert Frees wrote:

1. I think your description is correct. The correct opcode indeed modifies the
input. I'm not sure whether the result is written back to the input buffer, but
it shouldn't be. The manual says that inlen should have the *maximum* input
length but that really doesn't make sense. You need the exact input length,
otherwise you are translating garbage.
2. I agree it shouldn't.

3. I'm guessing yes. Looking at the source, inlen appears to be changed at the
end of the function makeCorrections, i.e. after the 'correct' rules are
applied.

4. OCR corrections are one use, but not the only one, I think. I agree OCR
corrections could be done at some other stage prior to braille
translation. But there could be other uses for the opcode. I'm not in favour
of deleting it.

5. No it shouldn't.

6. Yes, index values should work. If I remember correctly it didn't work before
and then I fixed it. Now thinking about it, if the input buffer is actually
changed, and the index values are supposed to give you the mapping between
the output buffer and the *changed* input buffer, then it was correct before
my fix and I broke it. But IMO that shouldn't be the interpretation of the
index values.

Bert

Michael Whapples writes:

Hello,
Having encountered some issues with the UEB table in Mike Gray's branch
of LibLouis, I now have some questions on exactly what the correct
opcode does and when the correct opcode should be used.

The particular rule which caused the issue was:

correct ["…"] "..."

Please don't look for this in the standard liblouis tables, it is only
in Mike Gray's branch.

We were finding that this could lead to a crash of the JVM when using
LibLouis through JLouis when strings contained the \x2026 (horizontal
ellipsis) character.

What has been explained to me is that when the correct opcode is used,
the input buffer is copied and modified, and then at the end written
back to the original input buffer. Due to the above rule the input
buffer size increases, and potentially liblouis is writing over and
corrupting other data. This is potentially more noticeable when calling
from Java, as the objects involved do have additional data which may come
immediately after the array values.
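
As an illustration of the alternative, the following is a minimal sketch of applying an expanding correction into a separate working buffer with a capacity check, leaving the caller's input untouched. It is not the liblouis implementation. The helper expand_ellipsis is hypothetical, and widechar_t is a 16-bit stand-in for the real widechar type, whose width I am assuming here.

```c
#include <stdint.h>

/* Stand-in for liblouis' widechar; the 16-bit width is an assumption. */
typedef uint16_t widechar_t;

/* Hypothetical helper, not liblouis code: expand each U+2026 into three
 * full stops, writing into a separate working buffer of capacity workcap.
 * Returns the corrected length, or -1 if the working buffer is too small.
 * The input buffer is const and is never written to. */
int expand_ellipsis(const widechar_t *in, int inlen,
                    widechar_t *work, int workcap)
{
    int out = 0;
    for (int i = 0; i < inlen; i++) {
        int need = (in[i] == 0x2026) ? 3 : 1;
        if (out + need > workcap)
            return -1; /* refuse rather than write past the buffer */
        if (in[i] == 0x2026) {
            work[out++] = '.';
            work[out++] = '.';
            work[out++] = '.';
        } else {
            work[out++] = in[i];
        }
    }
    return out;
}
```

With a three-character input containing one ellipsis, the corrected length is 5, which is exactly the growth that overruns a caller's array if the result is copied back into it.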

So here are my specific questions:
1. Is the above description correct? If not, could I have a detailed
description?
2. Should inbuf ever be written to? After all, in lou_translate and
lou_translateString it is marked as const widechar* inbuf.
3. Although inbuf may be const, I notice inlen is not and the user guide
says that it will be set to the actual input size at the end. What does
this mean if inbuf does not change?
4. What are the correct uses for the correct opcode? I know the manual
refers to OCR engine corrections; are there any other appropriate uses?
5. Should the correct opcode always lead to a reduction in length? In
the user guide all the examples are reducing the size, but the guide
does not state that this is the way rules must go.
6. What effect does the correct opcode have on index values?

Finally, a small comment: if the OCR correction thing is the only real
use, then I would possibly argue that the correct opcode does not really
belong in liblouis. It would be best for the text to be passed through a
cleanup tool first (even a simple shell script could probably do what
the OCR examples do). I don't really see how preprocessing the text is
related to Braille translation.

For a description of the software, to download it and links to
project pages go to http://liblouis.org


Follow-Ups:
- [liblouis-liblouisxml] Re: Questions over correct opcode
  - From: John J. Boyer

References:
- [liblouis-liblouisxml] Questions over correct opcode
  - From: Michael Whapples
- [liblouis-liblouisxml] Re: Questions over correct opcode
  - From: Bert Frees
