Maybe the way LibLouis currently works would lead to what you describe in the
example, although I would say it does not necessarily have to be that way.
For the result you gave of @1-4, pass2 and presumably pass3 and pass4 would
have to mutate the input rather than copy to a new string. If the output is
separate from the input and rules in that pass only match against the input,
then the rule:
pass2 @3-2 @4
could never match in an input string of @1-3.
Furthermore, even under a model that mutates the input, I still do not
see how the result you suggest could occur if the rules are applied as I said.
Cursor at index=0, neither pass2 rule matches (remember the content of
the replacement brackets must be a @1). Therefore do not modify and
advance cursor by one.
Cursor at index=1, again neither match. The second does not match
because there is no @2 following the @3 in the input yet. Advance the
cursor.
Cursor at index=2, the pass2 @1-3[] @2 matches because whilst the focus
is nothing it is preceded by @1-3. Insert @2 at the position of the
replacement brackets, input now is @1-3-2. Cursor does not advance.
Cursor at index=2, unfortunately we end up in a loop due to @1-3[] still
matching. The other rule of @3-2 will not match because the inferred
brackets would say that is really [@3-2] and the cursor is after @3.
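The walk-through above can be replayed with a small toy model. This is only a sketch of the suggested behaviour as described in this message, not liblouis code. The Rule triple, the function names and the step limit are all made up for illustration. Assumptions: cells are plain strings, the cursor sits at the opening replacement bracket, the part before the brackets is matched backwards from the cursor, the input is mutated in place, and the first rule that matches is taken (enough here, since at most one rule matches at any cursor position).

```python
from typing import List, NamedTuple, Tuple


class Rule(NamedTuple):
    before: Tuple[str, ...]       # cells that must sit just before the cursor
    focus: Tuple[str, ...]        # cells inside the replacement brackets
    replacement: Tuple[str, ...]  # cells written in place of the focus


def matches(rule: Rule, cells: List[str], cursor: int) -> bool:
    """True if `before` ends at the cursor and `focus` starts at it."""
    start = cursor - len(rule.before)
    if start < 0:
        return False
    if tuple(cells[start:cursor]) != rule.before:
        return False
    return tuple(cells[cursor:cursor + len(rule.focus)]) == rule.focus


def run_pass(cells: List[str], rules: List[Rule], max_steps: int = 5):
    """Replay one pass on a copy of the input, mutating that copy in place.

    Returns (cells, trace, looped). `looped` is True when the step limit
    was reached with work still pending, which stands in for the endless loop.
    """
    cells = list(cells)
    cursor = 0
    trace = []
    for _ in range(max_steps):
        if cursor > len(cells):
            return cells, trace, False
        rule = next((r for r in rules if matches(r, cells, cursor)), None)
        if rule is None:
            trace.append((cursor, "no match"))
            cursor += 1
            continue
        cells[cursor:cursor + len(rule.focus)] = rule.replacement
        trace.append((cursor, "-".join(cells)))
        if rule.focus:
            # Assumption (not stated in the thread): a non-empty focus
            # moves the cursor past the replacement.
            cursor += len(rule.replacement)
        # An empty focus leaves the cursor where it is.
    return cells, trace, cursor <= len(cells)


rules = [
    Rule(before=("1", "3"), focus=(), replacement=("2",)),  # pass2 @1-3[] @2
    Rule(before=(), focus=("3", "2"), replacement=("4",)),  # pass2 [@3-2] @4
]
cells, trace, looped = run_pass(["1", "3"], rules)
for cursor, result in trace:
    print(f"cursor={cursor}: {result}")
print("stuck in a loop:", looped)
```

The trace shows no match at index 0 and 1, then the first rule firing at index 2 on every remaining step until the step limit cuts it off.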
Yes though with the mutating input model you could do some interesting
things along the lines of what you said, rules like:
pass2 @1-3[] @2
pass2 @3[@2] @4
Would get @1-3-4. Also this pairing gets you out of the looping
situation by that second one becoming enabled.
What this has highlighted, though, is that my suggestion would still have
the possibility of getting stuck in loops. In fact my suggestion would
mean empty replacement brackets [] will always cause an endless loop, but
I guess that could easily be checked and an error could be given.
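A check of that kind could be as simple as the following sketch. It is purely textual and not part of liblouis: it assumes each rule is a single line of whitespace-separated opcode, test and action fields, and the function names are made up for illustration.

```python
from typing import Iterable, List, Tuple

# Opcodes whose test part can contain replacement brackets.
MULTIPASS_OPCODES = {"context", "pass2", "pass3", "pass4"}


def has_empty_replacement_brackets(rule_line: str) -> bool:
    """True if the test part of a context or multipass rule contains []."""
    fields = rule_line.split()
    if len(fields) < 2 or fields[0] not in MULTIPASS_OPCODES:
        return False
    return "[]" in fields[1]


def find_empty_brackets(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, rule) for every rule that would be rejected."""
    return [
        (number, line.strip())
        for number, line in enumerate(lines, start=1)
        if has_empty_replacement_brackets(line)
    ]


table = [
    "pass2 @1-3[] @2",
    "pass2 @3[@2] @4",
]
for number, rule in find_empty_brackets(table):
    print(f"line {number}: empty replacement brackets in '{rule}'")
```

A real check would have to work on the compiled rule rather than on the text, since a literal "[]" inside a quoted string in the test part would trip this version.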
It would definitely require @56* to be accepted in the third column, as
without it some rules might be impossible.
Michael Whapples
On 06/01/2017 16:22, Bert Frees wrote:
One problem with your proposed alternative behavior is that if you have the following two rules:
pass2 @1-3[] @2
pass2 @3@2 @4
and the input string to the second pass is @1-3, then it would result in @1-4. In other words, a replacement string is processed again in the same pass, which in a way is also unintuitive. With the current behavior you can't have that.
Regarding the documentation: yes it could be more precise and comprehensive. Any volunteers?
2017-01-06 16:06 GMT+01:00 Michael Whapples <dmarc-noreply@xxxxxxxxxxxxx>:
Well OK, maybe it is sort of said in the documentation, but it's not
as clear as it could be and I think the significance gets lost
amongst everything else in there.
Where I think it is not as clear as it could be is the reference to
the replaced text; the terminology there is maybe not precise
enough.
Using these two rules (slightly modified from the past):
pass2 @1-3[] @2
pass2 []@1-3 @2
In the first case the @1-3 is copied to the output by the rule.
Whilst it is maybe not modified, and so technically not
replaced, it is still being handled. Maybe wrongly, I had just
taken the term replaced text to mean handled text. If this could
somehow be modified to emphasise that it is after the closing
replacement bracket, this might help.
Moving to the second rule, I would then expect the @1-3 to also be
handled and copied by the rule, but that is not so. This is
where it becomes unintuitive: stuff before [] is handled but stuff
after is not.
In fact if anything were to be changed I would actually go with
the current cursor position relating to the opening replacement
bracket [ and anything before it is searched back from the cursor.
My reasons are:
1. This is how regular expressions work, may as well work like
other systems people may be familiar with.
2. Interaction of rules would be more consistent. If a table had
these two rules:
pass2 @1-3[] @2
pass2 @3 @4
and give a string of @1-3 to the second pass we currently would get:
@1-3-2
The pass2 @3 @4 rule does not get applied. If though the table had
these two rules:
pass2 []@1-3 @2
pass2 @3 @4
We currently get @2-1-4. So this did allow the pass2 @3 @4 rule to
be applied. If doing the change as I said then in both cases the
pass2 @3 @4 rule would be applied.
I am not saying we must change it, after all that could be some
work to ensure tables still work correctly. However if a change is
going to be made then this would be my preferred option.
In the meantime, though, maybe the documentation could highlight
the significance of where the cursor is placed when context or
multipass rules are applied.
Michael Whapples
On 06/01/2017 13:01, Bert Frees wrote:
This is all in the documentation, although maybe not in so many
words. See the last paragraph of "2.11 The Context and Multipass
Opcodes".
As far as I remember Christian just quoted me from an email in
which I was actually asking about the inner workings myself, long
time ago (I looked it up, 2010). What I wrote then was just what
I was guessing based on experimentation, John said it was correct
and Christian just copied it to the documentation. After reading
it again, I don't think it's 100% accurate though.
I get what you are trying to say with your example. I'm not sure
what the reason is for not advancing the cursor in the second
case. I guess it's in order to be able to do more in a single pass.
It's indeed not super-intuitive.
The important question is: are there cases in which we want to
advance to the end of the entire match, not just to the end of
the square brackets? The answer is yes: see for example the
"pass2 []%englishLetter. @56" case. So the next question is: are
there any cases where this can't be solved with the asterisk?
That is, instead use "pass2 %englishLetter. @56*", or more
generally, convert "pass2 []<x> <y>" to "pass2 <x> <y>*".
If there are no such cases, I wouldn't touch the algorithm. If
there are such cases, we should try to find a solution.
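The conversion from "pass2 []<x> <y>" to "pass2 <x> <y>*" mentioned above can be sketched as a purely textual rewrite. This is an illustration only: the function name is made up, it assumes a rule is a single line with exactly three whitespace-separated fields, and it only handles the simple shape where the empty brackets lead the test part and no other brackets follow. Whether the rewritten rule behaves the same in every case is exactly the open question.

```python
def rewrite_leading_empty_brackets(rule_line: str) -> str:
    """Rewrite 'passN []<x> <y>' as 'passN <x> <y>*'; leave anything else alone."""
    fields = rule_line.split()
    if len(fields) != 3:
        return rule_line
    opcode, test, action = fields
    rest = test[2:]
    # Only the simple shape is handled: leading [] and no further brackets.
    if not test.startswith("[]") or not rest or "[" in rest or "]" in rest:
        return rule_line
    return f"{opcode} {rest} {action}*"


print(rewrite_leading_empty_brackets("pass2 []%englishLetter. @56"))
print(rewrite_leading_empty_brackets("pass2 @1-3[] @2"))
```

The first call gives the asterisk form of the %englishLetter rule; the second rule is returned unchanged because its brackets are not at the start of the test part.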
2017-01-06 12:35 GMT+01:00 Michael Whapples <dmarc-noreply@xxxxxxxxxxxxx>:
Thank you, that has actually made it very clear why some
rules with [] work when others get into the loop.
It seemed a bit odd that something before [] would advance
the cursor when something after does not. Take two rules like:
pass2 @1[] @1
pass2 []@1 @1
Why should the first rule advance the cursor when the second
does not?
I understand from you explaining how the internals work, but
without that internal workings knowledge it does not seem
logical or intuitive that before and after are handled
differently.
Maybe this could also be a documentation improvement: add
something which states that the cursor will be moved to the
position just after the replacement brackets when a rule is
applied.
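As a rough illustration of that statement, the distance the cursor moves can be read off the test part of a rule. This is a sketch under the model described in this thread (the whole test is matched starting at the cursor, and processing resumes just after the closing replacement bracket); the helper names are made up and it only understands simple dot patterns such as @1-3.

```python
def count_cells(dots: str) -> int:
    """Number of cells in a dot pattern: '@1-3' is 2 cells, '' is 0."""
    return len(dots.lstrip("@").split("-")) if dots else 0


def resume_offset(test: str) -> int:
    """Cells between the start of the match and the closing bracket."""
    before, _, rest = test.partition("[")
    focus, _, _after = rest.partition("]")
    # With no brackets at all, `before` holds the whole test, which is the
    # same as brackets inferred around everything.
    return count_cells(before) + count_cells(focus)


for test in ("@1[]", "[]@1", "@1-3[]", "[]@1-3"):
    print(f"{test}: cursor moves by {resume_offset(test)}")
```

An offset of zero means the rule is applied without the cursor moving, which is the case that can loop.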
Another small note, it might be worth being extremely precise
in terminology here. The term "replacement" might mean the
brackets [] or it may mean what you are replacing it with. So
may be for the brackets [] refer to them as the "replacement
brackets" or the "replacement group".
Michael Whapples
On 06/01/2017 09:33, Bert Frees wrote:
Yes, it is like other translation rules. However, with both
multipass and other translation rules, it is not the first
match that is used, but rather the best match. Only one
matching rule is used; the rest are ignored. Processing
resumes at the first character after the replacement. This
means that if the replacement starts at offset 0 and has
length 0, the processing resumes at the same place which
results in an endless loop.
2017-01-06 8:09 GMT+01:00 Dave Mielke <dave@xxxxxxxxx>:
I'm having trouble understanding when a multipass opcode (e.g. pass2) moves on
to the next character. It doesn't seem to be like translation rules where the
first one that matches is used, the rest are skipped, and processing resumes at
the next character after the replacement.
Are they all always processed, or is the first one that matches the only one
that's processed?
Where does processing resume after a match?
--
Dave Mielke | 2213 Fox Crescent | The Bible is the very Word of God.
Phone: 1-613-726-0014 | Ottawa, Ontario | http://Mielke.cc/bible/
EMail: Dave@xxxxxxxxx | Canada K2A 1H7 | http://FamilyRadio.org/