[liblouis-liblouisxml] Re: Difficulty with the context opcode.

  • From: "Michael Whapples" <dmarc-noreply@xxxxxxxxxxxxx> (Redacted sender "mwhapples" for DMARC)
  • To: liblouis-liblouisxml@xxxxxxxxxxxxx
  • Date: Wed, 4 Jan 2017 13:07:04 +0000

Maybe I misunderstood the original question as posted here: //www.freelists.org/post/liblouis-liblouisxml/Difficulty-with-the-context-opcode

That post listed some rules using ` and the text spoke of the beginning of a sequence of letters. From that it seemed logical to conclude that the rule must match only at the beginning of the input string.


As for bugs in the context and multipass opcodes, it was just jaw-dropping, after having written emails to the list on that topic, to be told that it is believed there are no bugs in them. It had the appearance of denying that bugs could exist (an interesting way to have bug-free code: just deny their existence).


I will try to get my bug reports into github, but using the web interface of github is not as easy as email. Is there a better interface to github for reporting bugs (e.g. at APH, RT is used for tickets and that has an email interface; on bitbucket, wiki pages are just files in a repository, etc.)?


As for updating the documentation, I would not want to just go in and update it to match how things currently work, as I cannot be certain whether a given behaviour is a bug or intended. Using the @56* third column as an example, I am unsure whether we have reached a conclusion on that discussion. How should one know when it is ready to insert into the documentation?


Finally, on the documentation, texi is of zero interest to me now, so I could not actually make the changes in the files. How could I submit documentation updates?


Michael Whapples


On 04/01/2017 12:35, Bert Frees wrote:

But the original question did not say anything about "the first character of a string" or did I miss something?

The issue with @56* is something we should fix I think, even though John said it shouldn't be used. I don't see a reason why it shouldn't work.

Your issues do get observed. I try to look at the mailing list from time to time and move any reported issues to the Github tracker. I don't work over Christmas though, so I didn't see your latest reports yet. Also it's quite possible that I miss stuff because I only read half of the emails because I simply don't have enough time anymore.

The best way to report issues is via Github (https://github.com/liblouis/liblouis/issues). Preferably make an issue in Github, and then also send a message to the mailing list so that maybe a conversation is started.

Regarding your comment about documentation, intended/expected behavior, and actual behavior being changed without notice: The only way to really know how things are supposed to work is to ask John to cover every detail in the documentation. But we can't ask that from him. It's the responsibility of all of us to improve the documentation. If we're unsure what the expected behavior is and it is not documented, I think the best thing we can do is to document it the way we think it should behave, and then possibly fix the code. By covering all new code changes with tests, we reduce the chances that the behavior will change without notice.


Bert




2017-01-04 12:50 GMT+01:00 Michael Whapples <dmarc-noreply@xxxxxxxxxxxxx>:

    Firstly I think your suggested rule of:

    pass2 %englishLetter. @56*

    does not solve the question, as the rule needs to apply only to
    the first character of a string; a ` prefix would at least be
    needed. Also see my later notes about my issues with @56* and why
    I cannot recommend it.


    Well I have been writing various bugs to this list over the last
    few days/weeks. Do bug reports to the list not get observed? Where
    should bug reports go to be noted?


    1. @56* does not always work. It seems to work in pass2 but not in
    context. When it is applied in a context rule, I get a space
    instead of the content that * would copy.

    2. The use of grouping characters seems to differ depending on
    removal or replacement. For a rule like:

    pass2 [{mygroup]@1}mygroup ?

    Both the opening and closing grouping characters are removed. A
    rule like:

    pass2 [{mygroup]@1}mygroup @56

    Would only lead to the opening grouping character being replaced,
    the closing one will remain in the translation.

    3. The classes defined by $ are not always applied in pass2 rules:

    math \xf32e @12e

    pass2 [@12e-36]$d @36-3456 # Rule1

    pass2 @12e ? # Rule2

    For a string like:

    \xf32e-3

    Whilst I would expect the rule I gave the comment # Rule1 to be
    applied, I find # Rule2 is applied. I expect # Rule1 because it
    has the longer match. I conclude it is the $d at fault because a
    rule like:

    pass2 [@12e-36]@25 @36-3456

    will be applied. The documentation does not say that $ classes
    cannot/should not be used in pass2 rules.


    I could go on but here seems not the place.

    My point, though, is that the documentation is so slim that one
    relies on undocumented behaviour. Would it be considered a bug if
    that behaviour just changed without notice? As an example, you
    mention using @56*, but the documentation does not say what
    happens when doing this, and my observation is that it is not the
    same for context and pass2. In fact, in an earlier mail John said
    that @56* should not be used: * or ? should be the only thing in
    the third column when they are used.

    Michael Whapples


    On 04/01/2017 10:56, Bert Frees wrote:
    Multipass opcodes aren't that difficult. I don't know of any
    bugs, but it is possible that there are some, I haven't used
    things like "*", "_" and "!" much. The issue with the zillion
    (256) dots 56 isn't a bug. You just end up in a loop because
    you're starting with the empty brackets, so in the next iteration
    you don't move forward. 256 is a hard limit apparently, we should
    probably throw an error if we reach that number.
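
The loop described above can be illustrated with a toy model. This is only a sketch and not liblouis code: the function `apply_rule`, the constant `MAX_STEPS` and the way a rule is represented are all hypothetical, and only the figure 256 comes from this thread. The assumption is that a rule whose brackets are empty covers zero input characters, so after it fires the position is unchanged and the same rule matches again until the limit is hit.

```python
MAX_STEPS = 256  # figure reported in the thread; the constant name is made up


def apply_rule(text, consumed_per_match, replacement="56"):
    """Toy model of applying one rule repeatedly from the start of `text`.

    `consumed_per_match` stands for the number of input characters covered
    by the bracketed part of the rule. Empty brackets cover none, so the
    position never moves forward.
    """
    pos = 0
    emitted = []
    while pos < len(text) and len(emitted) < MAX_STEPS:
        emitted.append(replacement)   # the rule fires and emits its dots
        pos += consumed_per_match     # advance past what the brackets covered
    return emitted, pos


stuck, stuck_pos = apply_rule("abc", consumed_per_match=0)
moving, moving_pos = apply_rule("abc", consumed_per_match=1)
print(len(stuck), stuck_pos)    # 256 0
print(len(moving), moving_pos)  # 3 3
```

In this model, raising an error when `MAX_STEPS` is reached, rather than stopping silently, would correspond to the suggestion made above.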

    Anyway, this seems to work for me:

    pass2 %englishLetter. @56*


    Can you try that?



    2017-01-04 11:19 GMT+01:00 Michael Whapples
    <dmarc-noreply@xxxxxxxxxxxxx>:

        I am aware that no answer has yet been given to your original
        question. Having done a bit more work using the context and
        multipass opcodes for some tables I am working on, I find the
        way these rules work questionable, and the opcodes feel
        unreliable. The documentation is so slim, and there are so
        many cases beyond what it specifies, that one is reliant on
        undocumented and probably undefined behaviour. Who knows
        whether it will change in the future without notice?


        I think I now understand why Mike Gray decided to create the
        match opcode to replace these. I am not sure if that match
        opcode has been included into the standard liblouis or if it
        is still an APH specific feature. I am not sure if he added
        details of the match opcode to the documentation but here is
        a link to an old mailing list post where the match opcode was
        described
        //www.freelists.org/post/liblouis-liblouisxml/new-opcodes

        Also here is a link to the issue for merging the
        documentation for the match opcode
        https://github.com/liblouis/liblouis/pull/189/files


        Maybe this will offer a possible solution.


        Michael Whapples


        On 02/01/2017 16:30, Dave Mielke wrote:

            [quoted lines by Michael Whapples on 2017/01/02 at 10:38
            +0000]

                OK, I wasn't certain and now you mention getting
                repeated dots 56 for
                that first rule I think I had a similar issue when
                creating a
                different rule.

            For that case (empty brackets), lou_trace gives me:

                1.      lowercase       a       1
                2.      context `[]$w   @56
                3.      lowercase       a       1
                4.      context `[]$w   @56

            And so on. The log made it easier to count. It looped 256
            times.

                I have done a bit more looking at it and my original
                suggestion of
                @56* was wrong, it appears that can be used in the
                third column. So
                yes your original suggestion looks correct.

            This is what lou_trace gives for my original method
            (class name within
            brackets, and @56*):

                1.      lowercase       a       1
                2.      context `[$w]   @56*
                3.      lowercase       b       12
                4.      lowercase       c       14
                5.      space           0
                6.      lowercase       d       145
                7.      context _!$l[$w]        @56*
                8.      lowercase       e       15
                9.      lowercase       f       124


        For a description of the software, to download it and links to
        project pages go to http://liblouis.org




