[liblouis-liblouisxml] Re: Difficulty with the context opcode.

From: "Michael Whapples" <dmarc-noreply@xxxxxxxxxxxxx> (Redacted sender "mwhapples" for DMARC)
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Date: Wed, 4 Jan 2017 15:35:43 +0000

I simply do not want to learn texi; there are other more useful and valuable syntaxes to learn. The problem with any markup language, no matter how simple or complicated, is that one needs to be aware of special characters and make sure to escape them, so even text modification is not risk free.

Michael Whapples
On 04/01/2017 14:39, Bert Frees wrote:

Last time I checked, you can't create issues with a simple email. You could write a little script that creates an issue from a text file using GitHub's web API. I don't know if you would prefer that over the web interface though. There is also a command line tool written in Ruby that does it for you: https://github.com/stephencelis/ghi. To create an issue, just run "ghi open".
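
The "little script" idea above could look roughly like the following Python sketch. It assumes GitHub's REST endpoint for creating issues (POST /repos/{owner}/{repo}/issues) and a personal access token. The function names and the convention that the first line of the text file is the issue title are illustrative assumptions, not anything provided by the liblouis project.

```python
import json
import urllib.request


def build_issue_request(owner, repo, token, text):
    """Build (but do not send) a POST request that creates a GitHub issue.

    Assumed convention: the first line of `text` is the issue title,
    everything after it is the issue body.
    """
    title, _, body = text.strip().partition("\n")
    payload = json.dumps({"title": title.strip(), "body": body.strip()})
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/issues",
        data=payload.encode("utf-8"),
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_issue(request):
    """Send the request and return the new issue's URL (needs network access)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["html_url"]
```

With a bug report saved in a hypothetical file report.txt, one would read the file, pass its contents to build_issue_request("liblouis", "liblouis", token, text), and hand the result to create_issue. Splitting building from sending keeps the part that needs a token and a network connection small.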

I'm also reluctant to update the documentation to match the behavior if I don't understand the idea behind it. I think it's not a good approach because it makes people think the behavior was intended when that might not be the case. It could help others of course, but if you're just writing down stuff from experience, others can do the same, and I'd rather spend time on other things. As I said in a previous email, it is better to think about what the behavior SHOULD BE, and then adapt the code if needed.

I think something is ready to go into the documentation as soon as there is more or less a consensus and, most importantly, someone who is willing to actually do it, and if needed (find someone to) update the code and write tests. It is clear that the one who does the work has more to say in decisions.

Taking the "@56* third column" issue as an example: if I had a little more time, I would just go into the code, check whether it would be possible to support, then update the code and documentation and make a pull request. Then Christian would review and probably merge right away. If I think something needs some discussion, I'll first ask on the mailing list. But in this case I think nobody will object if we add support for @56*.

What's wrong with texi? I mean, I know you don't like it, but is it really impossible for you to use it? It's just a plain text file. Don't worry about all these special things such as cross-links etc. The text itself is by far the most important. If the format is really a roadblock for you and makes you not want to update it, as far as I'm concerned you can just write plain text files and ask Christian or me or someone else to integrate it.

Bert

2017-01-04 14:07 GMT+01:00 Michael Whapples <dmarc-noreply@xxxxxxxxxxxxx>:

    Maybe I misunderstood the original question as posted here:

//www.freelists.org/post/liblouis-liblouisxml/Difficulty-with-the-context-opcode


    In that post some rules were listed using `, and the text spoke
    of the beginning of a sequence of letters. It seemed logical
    from that to take it that this must match only at the beginning
    of the input string.

    As for bugs in the context and multipass opcodes, it was just jaw
    dropping after having been writing emails to the list on that
    topic to get told that it is believed there are no bugs in them. It
    had the appearance of denial of bugs possibly existing (an
    interesting way to have code bug free, just deny their existence).

    I will try and get my bug reports into github, but using the web
    interface of github is not as easy as email. Is there a better
    interface to github for reporting bugs (eg. at APH RT is used for
    tickets and that has an email interface, on bitbucket wiki pages
    are just files in a repository, etc).

    As for documentation updating, I would not want to just go in and
    update it to match how things currently work as I cannot be
    certain whether it is a bug or intended. Using the @56* third
    column as an example, I am unsure whether we have reached a
    conclusion on that discussion. How should one know when it's
    ready to insert into the documentation?

    Finally on the documentation thing, texi is of zero interest to me
    now, therefore I could not actually make the changes in the files.
    How could I submit documentation updates?

    Michael Whapples

    On 04/01/2017 12:35, Bert Frees wrote:

    But the original question did not say anything about "the first
    character of a string" or did I miss something?

    The issue with @56* is something we should fix I think, even
    though John said it shouldn't be used. I don't see a reason why
    it shouldn't work.

    Your issues do get observed. I try to look at the mailing list
    from time to time and move any reported issues to the Github
    tracker. I don't work over Christmas though, so I didn't see your
    latest reports yet. Also it's quite possible that I miss stuff
    because I only read half of the emails because I simply don't
    have enough time anymore.

    The best way to report issues is via Github
    (https://github.com/liblouis/liblouis/issues). Preferably make
    an issue in Github, and then also send a message to the mailing
    list so that maybe a conversation is started.

    Regarding your comment about documentation, intended/expected
    behavior, and actual behavior being changed without notice:  The
    only way to really know how things are supposed to work is to ask
    John to cover every detail in the documentation. But we can't ask
    that from him. It's the responsibility of all of us to improve
    the documentation. If we're unsure what the expected behavior is
    and it is not documented, I think the best thing we can do is to
    document it the way we think it should behave, and then possibly
    fix the code. By covering all new code changes with tests, we
    reduce the chances that the behavior will change without notice.

    Bert

    2017-01-04 12:50 GMT+01:00 Michael Whapples
    <dmarc-noreply@xxxxxxxxxxxxx>:

        Firstly I think your suggested rule of:

        pass2 %englishLetter. @56*

        Is not correct to solve the question as it needs to be only
        for the first character of a string, a ` prefix would at
        least be needed. Also see later notes about my issues with
        @56* and not being able to recommend it.

        Well I have been writing various bugs to this list over the
        last few days/weeks. Do bug reports to the list not get
        observed? Where should bug reports go to be noted?

        1. @56* does not always work. It seems to work in pass2 but
        not in context. When it is applied in a context rule, I get
        a space instead of the content that * would copy.

        2. The use of grouping characters seems to differ depending
        on removal or replacement. For a rule like:

        pass2 [{mygroup]@1}mygroup ?

        Both the opening and closing grouping characters are removed.
        A rule like:

        pass2 [{mygroup]@1}mygroup @56

        Would only lead to the opening grouping character being
        replaced, the closing one will remain in the translation.

        3. The classes defined by $ are not always applied in pass2
        rules:

        math \xf32e @12e

        pass2 [@12e-36]$d @36-3456 # Rule1

        pass2 @12e ? # Rule2

        For a string like:

        \xf32e-3

        Whilst I would expect the rule I gave the comment # Rule1 to
        be applied, I find # Rule2 is applied. I expect # Rule1
        because it has the longer match. I conclude it is the $d at
        fault because a rule like:

        pass2 [@12e-36]@25 @36-3456

        will be applied. The documentation does not say that $
        classes cannot/should not be used in pass2 rules.
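
The longest-match expectation in point 3 can be made concrete with a small Python sketch. This is a toy model of the expectation only, not liblouis's actual rule selection. It assumes that after pass 1 the string \xf32e-3 becomes the cells 12e, 36, 25 (consistent with the @25 variant of the rule being applied) and that dots 25 belongs to class d. The names DIGIT_CELLS, RULES, match_length and pick_rule are invented for illustration.

```python
# Assumption: the digit "3" leaves pass 1 as dots 25 and is in class d.
DIGIT_CELLS = {"25"}

# Each rule is (name, test pattern); "$d" stands for "any cell in class d".
RULES = [
    ("Rule1", ["12e", "36", "$d"]),
    ("Rule2", ["12e"]),
]


def match_length(pattern, cells):
    """Return how many cells the pattern matches at the start, or 0 if none."""
    if len(pattern) > len(cells):
        return 0
    for want, have in zip(pattern, cells):
        if want == "$d":
            if have not in DIGIT_CELLS:
                return 0
        elif want != have:
            return 0
    return len(pattern)


def pick_rule(cells, rules=RULES):
    """Pick the rule with the longest match; the first listed wins ties."""
    best = max(rules, key=lambda rule: match_length(rule[1], cells))
    return best[0] if match_length(best[1], cells) > 0 else None


print(pick_rule(["12e", "36", "25"]))  # the three-cell rule beats the one-cell rule
```

Under this model Rule1 is chosen for the example input, which is the outcome the report expects; the report is that liblouis applies Rule2 instead.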

        I could go on but here seems not the place.

        My point though is that the documentation is so slim one
        relies on undocumented stuff and so would it be considered a
        bug if it just changes without notice? As an example you say
        about using @56* but the documentation does not say what
        happens when doing this and my observation is that it is not
        the same for context and pass2. In fact in an earlier mail
        John said that @56* should not be used, * or ? should be the
        only thing in the third column when they are used.

        Michael Whapples

        On 04/01/2017 10:56, Bert Frees wrote:

        Multipass opcodes aren't that difficult. I don't know of any
        bugs, but it is possible that there are some, I haven't used
        things like "*", "_" and "!" much. The issue with the
        zillion (256) dots 56 isn't a bug. You just end up in a loop
        because you're starting with the empty brackets, so in the
        next iteration you don't move forward. 256 is a hard limit
        apparently, we should probably throw an error if we reach
        that number.
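
The loop described above can be illustrated with a toy Python model. It is not liblouis code: it only shows why a rule whose match consumes no input keeps firing at the same position until some cap stops it. The function name, the "matches only at position 0" rule, and the choice to resume plain copying once the cap is hit are all assumptions made for the sketch; only the figure 256 comes from the thread.

```python
def translate(text, consumed, marker="@56", limit=256):
    """Toy single-rule engine.

    The rule matches only at position 0, emits `marker`, copies the
    `consumed` characters it matched, and advances by that amount.
    consumed == 0 models the empty brackets: the position never moves.
    Returns (output, number_of_rule_applications).
    """
    out, pos, applied = [], 0, 0
    while pos < len(text):
        if pos == 0 and applied < limit:
            out.append(marker)
            out.append(text[pos:pos + consumed])  # stand-in for what "*" copies
            pos += consumed
            applied += 1
        else:
            # No rule applies (or the cap was reached): copy one character.
            out.append(text[pos])
            pos += 1
    return "".join(out), applied


looping = translate("abc", consumed=0)   # empty brackets: stuck at position 0
advancing = translate("abc", consumed=1)  # non-empty match: moves forward
print(looping[1], advancing)
```

In the model, the zero-width rule is applied 256 times before the cap cuts it off, while the rule that consumes one character is applied once. Raising an error when the cap is reached, as suggested above, would replace the silent fall-through in the else branch.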

        Anyway, this seems to work for me:

        pass2 %englishLetter. @56*

        Can you try that?

        2017-01-04 11:19 GMT+01:00 Michael Whapples
        <dmarc-noreply@xxxxxxxxxxxxx>:

            I am aware that no answer has yet been given to your
            original question. Having done a bit more work using
            context and multipass opcodes for some tables I am
            working on, I find the way these rules work questionable
            and the opcodes simply unreliable. The documentation is
            so slim, and so many cases fall outside what it
            specifies, that one is reliant on undocumented and
            probably undefined behaviour; who knows if it will
            change in the future without notice.

            I think I now understand why Mike Gray decided to create
            the match opcode to replace these. I am not sure if that
            match opcode has been included into the standard
            liblouis or if it is still an APH specific feature. I am
            not sure if he added details of the match opcode to the
            documentation but here is a link to an old mailing list
            post where the match opcode was described
            //www.freelists.org/post/liblouis-liblouisxml/new-opcodes

            Also here is a link to the pull request for merging the
            documentation for the match opcode
            https://github.com/liblouis/liblouis/pull/189/files

            Maybe this will offer a possible solution.

            Michael Whapples

            On 02/01/2017 16:30, Dave Mielke wrote:

                [quoted lines by Michael Whapples on 2017/01/02 at
                10:38 +0000]

                    OK, I wasn't certain and now you mention getting
                    repeated dots 56 for that first rule I think I had
                    a similar issue when creating a different rule.

                For that case (empty brackets), lou_trace gives me:

                    1. lowercase       a      1
                    2. context `[]$w  @56
                    3. lowercase       a      1
                    4. context `[]$w  @56

                And so on. The log made it easier to count. It
                looped 256 times.

                    I have done a bit more looking at it and my
                    original suggestion of @56* was wrong; it appears
                    that can be used in the third column. So yes, your
                    original suggestion looks correct.

                This is what lou_trace gives for my original method
                (class name within brackets, and @56*):

                    1. lowercase       a      1
                    2. context `[$w]  @56*
                    3. lowercase       b      12
                    4. lowercase       c      14
                    5.      space          0
                    6. lowercase       d      145
                    7. context _!$l[$w]       @56*
                    8. lowercase       e      15
                    9. lowercase       f      124

            For a description of the software, to download it and
            links to project pages go to http://liblouis.org

Follow-Ups:
- [liblouis-liblouisxml] Re: Difficulty with the context opcode.
  - From: Christian Egli

References:
- [liblouis-liblouisxml] Re: Difficulty with the context opcode.
  - From: Bert Frees
