[liblouis-liblouisxml] Re: Questions over correct opcode

  • From: "John J. Boyer" <john.boyer@xxxxxxxxxxxxxxxxx>
  • To: liblouis-liblouisxml@xxxxxxxxxxxxx
  • Date: Wed, 6 May 2015 06:29:17 -0500

Hi Michael,

The description of how the correct opcode works that you were given is
incorrect. A temporary buffer is set up. inbuf is then processed while
applying the corrections, and the result is placed in the temporary
buffer. inbuf is never modified. Translation then proceeds from the
temporary buffer. The length of the temporary buffer is set to the
larger of inlen and outlen. If the corrections cause the text to become
longer than this length, it could overwrite other things. The same is
true of the temporary buffers used for the multipass opcodes.
liblouisutdml gets around this by just using a very large inbuf.
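
To make the sizing problem concrete, here is a minimal sketch in C. It is
not the liblouis source. The names temp_capacity and
apply_ellipsis_correction are hypothetical, the widechar typedef assumes a
16-bit build, and the bounds check is an addition for illustration that
shows where an overrun would otherwise occur.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short widechar; /* assumption: 16-bit widechar build */

/* Hypothetical helper: the temporary buffer holds the larger of
 * inlen and outlen characters, as described above. */
static size_t temp_capacity(size_t inlen, size_t outlen)
{
    return inlen > outlen ? inlen : outlen;
}

/* Hypothetical correction pass: copies inbuf into tmp, expanding each
 * U+2026 into three full stops. inbuf is only read, never written.
 * Returns the number of characters placed in tmp, or -1 if the
 * corrected text would not fit in tmpcap characters. */
static long apply_ellipsis_correction(const widechar *inbuf, size_t inlen,
                                      widechar *tmp, size_t tmpcap)
{
    size_t out = 0;

    assert(inbuf != NULL && tmp != NULL);
    for (size_t i = 0; i < inlen; i++) {
        size_t need = (inbuf[i] == 0x2026) ? 3 : 1;

        if (out + need > tmpcap)
            return -1; /* an unchecked write here would overrun tmp */
        if (need == 3) {
            tmp[out++] = '.';
            tmp[out++] = '.';
            tmp[out++] = '.';
        } else {
            tmp[out++] = inbuf[i];
        }
    }
    return (long)out;
}
```

With a three-character input containing one U+2026, the corrected text
needs five characters. It fits when outlen is 5 but not when inlen and
outlen are both 3, which is the case where an unchecked pass would write
past the end of the temporary buffer.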

The correct opcode is used in the tables for UK maths and Marburg math.
It may also be needed for UEB math.

The correct opcode messes up indexing and also does funny things to
emphasis. It was introduced before indexing. Its purpose was originally
to provide an easy way to fix up messy input. It should not be used
unless there is no other way to achieve good results. When it is needed,
though, it is needed.

When liblouis returns, inlen is set to the length of the portion of inbuf
that was actually translated. This may be smaller than the original value
if outbuf is too small. There may be other reasons that I don't remember
at the moment.
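
A caller can use the returned inlen to resume where translation stopped.
The sketch below is only an illustration of that convention. translate_fn,
copy_translator and translate_all are hypothetical names, not liblouis
functions, and the toy translator simply copies characters so that the
example stands alone.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short widechar; /* assumption: 16-bit widechar build */

/* Hypothetical stand-in for a translator following the convention
 * described above: on entry *inlen and *outlen are the available
 * lengths, on return they hold what was consumed and produced.
 * Returns nonzero on success. */
typedef int (*translate_fn)(const widechar *inbuf, int *inlen,
                            widechar *outbuf, int *outlen);

/* Toy translator for illustration only: copies one-to-one and stops
 * when outbuf is full, shrinking *inlen accordingly. */
static int copy_translator(const widechar *inbuf, int *inlen,
                           widechar *outbuf, int *outlen)
{
    int n = *inlen < *outlen ? *inlen : *outlen;

    for (int i = 0; i < n; i++)
        outbuf[i] = inbuf[i];
    *inlen = n;
    *outlen = n;
    return 1;
}

/* Translate all of inbuf in pieces of at most chunk output characters,
 * restarting each call just after the input already consumed.
 * Returns the total output length, or -1 on failure, lack of space,
 * or lack of progress. */
static int translate_all(translate_fn translate,
                         const widechar *inbuf, int total_in,
                         widechar *dest, int dest_cap, int chunk)
{
    int consumed = 0, produced = 0;

    assert(translate != NULL && inbuf != NULL && dest != NULL);
    while (consumed < total_in) {
        int inlen = total_in - consumed;
        int outlen = dest_cap - produced;

        if (outlen > chunk)
            outlen = chunk;
        if (outlen <= 0)
            return -1; /* destination is full */
        if (!translate(inbuf + consumed, &inlen, dest + produced, &outlen))
            return -1;
        if (inlen <= 0)
            return -1; /* nothing consumed, avoid looping forever */
        consumed += inlen;
        produced += outlen;
    }
    return produced;
}
```

Note that inbuf itself is passed as const throughout. Only the length
changes between calls.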

I think I have answered all of your questions.

John

On Wed, May 06, 2015 at 10:54:58AM +0100, Michael Whapples wrote:

Hello,
Having encountered some issues with the UEB table in Mike Gray's branch of
LibLouis, I now have some questions on exactly what the correct opcode does
and when the correct opcode should be used.

The particular rule which caused issue was:

correct ["…"] "..."

Please don't look for this in the standard liblouis tables, it is only in
Mike Gray's branch.

We were finding that this could lead to a crash of the JVM when using
LibLouis through JLouis when strings contained the \x2026 character.

What has been explained to me is that when the correct opcode is used, the
input buffer is copied and modified, and then at the end written back to the
original input buffer. Due to the above rule the input buffer size
increases, and potentially liblouis is writing over and corrupting other
data, potentially more noticeable when calling from Java as the objects
involved do have additional data which may come immediately after the array
values.

So here are my specific questions:
1. Is the above description correct? If not, could I have a detailed
description?
2. Should inbuf ever be written to? After all, in lou_translate and
lou_translateString it is marked as const widechar* inbuf.
3. Although inbuf may be const, I notice inlen is not and the user guide
says that it will be set to the actual input size at the end. What does this
mean if inbuf does not change?
4. What are the correct uses for the correct opcode? I know the manual
refers to OCR engine corrections, any other uses which are appropriate?
5. Should the correct opcode always lead to a reduction in length? In the
user guide all the examples are reducing the size, but the guide does not
state that this is the way rules must go.
6. What effect does the correct opcode have on index values?

Finally a small comment, if the OCR correction thing is the only real use,
then I would possibly argue that the correct opcode does not really belong
in liblouis, it would be best that this is passed through a cleanup tool
first (even a simple shell script probably could do what the OCR examples
do). I don't really see how preprocessing the text is related to Braille
translation.

Michael Whapples
For a description of the software, to download it and links to
project pages go to http://liblouis.org

--
John J. Boyer; President,
AbilitiesSoft, Inc.
http://www.abilitiessoft.org
Madison, Wisconsin USA
We develop software for people with disabilities which is available at
no cost.
