[liblouis-liblouisxml] Re: Difficulty with the context opcode.

From: Bert Frees <bertfrees@xxxxxxxxx>
To: "liblouis-liblouisxml@xxxxxxxxxxxxx" <liblouis-liblouisxml@xxxxxxxxxxxxx>
Date: Wed, 4 Jan 2017 15:39:14 +0100

Last time I checked you can't create issues with a simple email. You could
write a little script that creates an issue from a text file, using
GitHub's web API. I don't know if you would prefer that over the web
interface though. There is also a command line tool written in Ruby that
does it for you: https://github.com/stephencelis/ghi. To create an issue
just run "ghi open".

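A minimal sketch of such a script, assuming Python 3 and a GitHub personal
access token. The file layout (first line is the title, the rest is the
body) and the function names are assumptions made for illustration; the
endpoint is the "create an issue" call of GitHub's REST API.

```python
import json
import urllib.request

# GitHub REST endpoint for creating an issue in a repository.
API_URL = "https://api.github.com/repos/{owner}/{repo}/issues"


def parse_report(text):
    """Split a plain-text report into (title, body).

    Assumed layout: the first non-empty line is the title, the rest is the body.
    """
    lines = text.strip().splitlines()
    if not lines:
        raise ValueError("empty report")
    title = lines[0].strip()
    body = "\n".join(lines[1:]).strip()
    return title, body


def build_request(owner, repo, token, text):
    """Build (but do not send) the POST request that would create the issue."""
    title, body = parse_report(text)
    payload = json.dumps({"title": title, "body": body}).encode("utf-8")
    return urllib.request.Request(
        API_URL.format(owner=owner, repo=repo),
        data=payload,
        headers={
            "Authorization": "token " + token,
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_issue(path, owner, repo, token):
    """Read the report at `path`, send it, and return the new issue's URL."""
    with open(path, encoding="utf-8") as f:
        req = build_request(owner, repo, token, f.read())
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["html_url"]
```

Something like create_issue("report.txt", "liblouis", "liblouis", token)
would then post the file; parse_report and build_request can be tried
without any network access.
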
I'm also reluctant to update the documentation to match the behavior if I
don't understand the idea behind it. I think it's not a good approach
because it makes people think the behavior was intended while that might
not be the case. It could help others of course, but if you're just writing
down stuff from experience others can do the same, and I'd rather spend
time on other things. Like I said in a previous email, it is better to
think about what the behavior SHOULD BE, and then adapt the code if needed.

I think something is ready to go into the documentation as soon as there is
more or less a consensus and, most importantly, someone who is willing to
actually do it, and if needed (find someone to) update the code and write
tests. It is clear that the one who does the work has more to say in
decisions.

Taking for example the "@56* third column" issue: if I had a little bit
more time, I would just go into the code, check whether it would be
possible to support, then update the code and documentation and make a pull
request. Then Christian would review and probably merge right away. If I
think something needs some discussion I'll first ask on the mailing list.
But in this case I think nobody will object if we add support for
@56*.

What's wrong with texi? I mean, I know you don't like it, but is it really
impossible for you to use it? It's just a plain text file. Don't worry
about all these special things such as cross-links etc. The text itself is
by far the most important. If the format is really a roadblock for you and
makes you not want to update it, as far as I'm concerned you can just write
plain text files and ask Christian or me or someone else to integrate it.

Bert

2017-01-04 14:07 GMT+01:00 Michael Whapples <dmarc-noreply@xxxxxxxxxxxxx>:

Maybe I misunderstood the original question as posted here:
//www.freelists.org/post/liblouis-liblouisxml/Difficulty-with-the-context-opcode

That post listed some rules using ` and the text spoke of the beginning of
a sequence of letters. It seemed logical from that to take it that the
rule must match only at the beginning of the input string.

As for bugs in the context and multipass opcodes, it was just jaw dropping,
after having written emails to the list on that topic, to be told that it
is believed there are no bugs in them. It had the appearance of denying
that bugs could exist (an interesting way to have bug-free code: just deny
their existence).

I will try to get my bug reports into GitHub, but using the web interface
of GitHub is not as easy as email. Is there a better interface to GitHub
for reporting bugs (e.g. at APH, RT is used for tickets and has an email
interface; on Bitbucket, wiki pages are just files in a repository, etc.)?

As for updating the documentation, I would not want to just go in and
update it to match how things currently work, as I cannot be certain
whether a given behaviour is a bug or intended. Using the @56* third column
as an example, I am unsure whether we have reached a conclusion on that
discussion. How should one know when it's ready to go into the
documentation?

Finally on the documentation thing, texi is of zero interest to me now,
therefore I could not actually make the changes in the files. How could I
submit documentation updates?

Michael Whapples

On 04/01/2017 12:35, Bert Frees wrote:

But the original question did not say anything about "the first character
of a string" or did I miss something?

The issue with @56* is something we should fix I think, even though John
said it shouldn't be used. I don't see a reason why it shouldn't work.

Your issues do get observed. I try to look at the mailing list from time
to time and move any reported issues to the Github tracker. I don't work
over Christmas though, so I didn't see your latest reports yet. Also it's
quite possible that I miss stuff because I only read half of the emails
because I simply don't have enough time anymore.

The best way to report issues is via Github
(https://github.com/liblouis/liblouis/issues). Preferably make an issue in
Github, and then also send a message to the mailing list so that maybe a
conversation is started.

Regarding your comment about documentation, intended/expected behavior, and
actual behavior being changed without notice:  The only way to really know
how things are supposed to work is to ask John to cover every detail in the
documentation. But we can't ask that from him. It's the responsibility of
all of us to improve the documentation. If we're unsure what the expected
behavior is and it is not documented, I think the best thing we can do is
to document it the way we think it should behave, and then possibly fix the
code. By covering all new code changes with tests, we reduce the chance
that the behavior will change without notice.

Bert

2017-01-04 12:50 GMT+01:00 Michael Whapples <dmarc-noreply@xxxxxxxxxxxxx>:

Firstly I think your suggested rule of:

pass2 %englishLetter. @56*

does not solve the question, as the rule needs to apply only to the first
character of a string; a ` prefix would at least be needed. Also see my
later notes about my issues with @56* and why I cannot recommend it.

Well, I have been reporting various bugs to this list over the last few
days/weeks. Do bug reports to the list not get observed? Where should bug
reports go to be noted?

1. @56* does not always work. It seems to work in pass2 but not in
context. When it is applied in a context rule, I get a space instead of
the content that * would copy.

2. The use of grouping characters seems to differ depending on removal or
replacement. For a rule like:

pass2 [{mygroup]@1}mygroup ?

Both the opening and closing grouping characters are removed. A rule like:

pass2 [{mygroup]@1}mygroup @56

Would only lead to the opening grouping character being replaced, the
closing one will remain in the translation.

3. The classes defined by $ are not always applied in pass2 rules:

math \xf32e @12e

pass2 [@12e-36]$d @36-3456 # Rule1

pass2 @12e ? # Rule2

For a string like:

\xf32e-3

Whilst I would expect the rule I gave the comment # Rule1 to be applied,
I find # Rule2 is applied. I expect # Rule1 because it has the longer
match. I conclude it is the $d at fault because a rule like:

pass2 [@12e-36]@25 @36-3456

will be applied. The documentation does not say that $ classes
cannot/should not be used in pass2 rules.

I could go on but here seems not the place.

My point though is that the documentation is so slim that one relies on
undocumented behaviour, so would it be considered a bug if that behaviour
just changes without notice? As an example, you suggest using @56*, but the
documentation does not say what happens when doing this, and my observation
is that it is not the same for context and pass2. In fact, in an earlier
mail John said that @56* should not be used: * or ? should be the only
thing in the third column when they are used.

Michael Whapples

On 04/01/2017 10:56, Bert Frees wrote:

Multipass opcodes aren't that difficult. I don't know of any bugs, but it
is possible that there are some; I haven't used things like "*", "_" and
"!" much. The issue with the zillion (256) dots 56 isn't a bug. You just
end up in a loop because you're starting with the empty brackets, so in the
next iteration you don't move forward. 256 is apparently a hard limit; we
should probably throw an error if we reach that number.
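
To illustrate why the position never advances, here is a toy model in
Python. It is not liblouis's actual code: the single "fires only at the
start" rule, the name apply_rule, and what happens once the cap is reached
are all assumptions made for the sake of the example.

```python
def apply_rule(cells, bracket_len, cap=256):
    """Toy translator with one rule that fires only at position 0 and inserts "56".

    bracket_len is how many input cells the bracketed part of the rule covers:
    0 models a rule with empty brackets, 1 models a rule with one cell in brackets
    whose action copies that cell after the inserted dots.
    cap models the hard iteration limit mentioned above (assumed behaviour:
    once it is reached the rule simply stops firing).
    """
    out, pos, steps = [], 0, 0
    while pos < len(cells):
        if pos == 0 and steps < cap:
            steps += 1
            out.append("56")
            # Copy the bracketed cells; with empty brackets there is nothing to copy.
            out.extend(cells[pos:pos + bracket_len])
            # With bracket_len == 0 the position does not move, so the rule fires again.
            pos += bracket_len
        else:
            out.append(cells[pos])
            pos += 1
    return out


looped = apply_rule(["1", "12", "14"], bracket_len=0)
advanced = apply_rule(["1", "12", "14"], bracket_len=1)
```

In this model, looped starts with 256 copies of "56" before the input cells
appear, while advanced contains a single "56" followed by the input cells.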

Anyway, this seems to work for me:

    pass2 %englishLetter. @56*

Can you try that?

2017-01-04 11:19 GMT+01:00 Michael Whapples <dmarc-noreply@xxxxxxxxxxxxx>:

I am aware that no answer has yet been given to your original question.
Having done a bit more work with the context and multipass opcodes for some
tables I am working on, I find the way these rules work questionable and
the opcodes simply unreliable. The documentation is so slim, and so many
cases fall outside what it specifies, that one is reliant on undocumented
and probably undefined behaviour; who knows if it will change in the future
without notice.

I think I now understand why Mike Gray decided to create the match
opcode to replace these. I am not sure if that match opcode has been
included into the standard liblouis or if it is still an APH specific
feature. I am not sure if he added details of the match opcode to the
documentation but here is a link to an old mailing list post where the
match opcode was described:
//www.freelists.org/post/liblouis-liblouisxml/new-opcodes

Also here is a link to the pull request for merging the documentation for
the match opcode: https://github.com/liblouis/liblouis/pull/189/files

Maybe this will offer a possible solution.

Michael Whapples

On 02/01/2017 16:30, Dave Mielke wrote:

[quoted lines by Michael Whapples on 2017/01/02 at 10:38 +0000]

OK, I wasn't certain and now you mention getting repeated dots 56 for
that first rule I think I had a similar issue when creating a
different rule.

For that case (empty brackets), lou_trace gives me:

    1.      lowercase       a       1
    2.      context `[]$w   @56
    3.      lowercase       a       1
    4.      context `[]$w   @56

And so on. The log made it easier to count. It looped 256 times.

I have done a bit more looking at it and my original suggestion of
@56* was wrong, it appears that can be used in the third column. So
yes your original suggestion looks correct.

This is what lou_trace gives for my original method (class name within
brackets, and @56*):

    1.      lowercase       a       1
    2.      context `[$w]   @56*
    3.      lowercase       b       12
    4.      lowercase       c       14
    5.      space           0
    6.      lowercase       d       145
    7.      context _!$l[$w]        @56*
    8.      lowercase       e       15
    9.      lowercase       f       124

For a description of the software, to download it and links to
project pages go to http://liblouis.org
