[liblouis-liblouisxml] SV: Re: SV: 8 dots contracted with caps was: are the swap opcodes broken?
- From: Bue Vester-Andersen <bue@xxxxxxxxxxxxxxxxxx>
- To: <liblouis-liblouisxml@xxxxxxxxxxxxx>
- Date: Wed, 18 Jan 2017 21:17:39 +0100
Hi Bert,
Regarding your first "btw" I don't quite understand what the problem is.
Maybe you are overthinking it?
The problem is that the back-translator could apply contraction rules because
it does not know that it is in a no-contractions state. A German example would
be the letters that are also used as partword contractions, i.e. q, x, and y.
In Danish, we have similar letters: q, w, x, and z. If capsnocont is defined
and the back-translator sees a begcapsword, it knows that contraction rules
should not be applied. But if no begcapsword is used, it should react to seeing
two or more capital letters. A similar problem occurs with the nocont opcode,
where a certain text string triggers the no-contractions state, e.g.
http://, .txt, or .zip. I hope that makes sense.
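A rough sketch of the fallback described above, assuming 8-dot braille in which a capital letter carries dot 7 and no begcapsword indicator is present. The function name and the representation of cells as bit masks are made up for illustration; this is not how liblouis itself is implemented.

```python
DOT7 = 0x40  # dot n maps to bit n-1, so dot 7 is bit 6


def no_contraction_flags(cells):
    """Mark every cell that lies in a run of two or more dot-7 (capital) cells.

    A back-translator could skip contraction rules wherever the flag is True.
    """
    flags = [False] * len(cells)
    i = 0
    while i < len(cells):
        if cells[i] & DOT7:
            # Find the end of the current run of capital cells.
            j = i
            while j < len(cells) and cells[j] & DOT7:
                j += 1
            # Two or more capitals in a row: assume a capsword situation.
            if j - i >= 2:
                for k in range(i, j):
                    flags[k] = True
            i = j
        else:
            i += 1
    return flags
```

A single capital at the start of a word is left unflagged, so normal contraction rules would still apply to it.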
Regarding your second btw, yes perhaps you are right. But in which category
fall words that are not fully uppercase, but also not only the first letter?
Hmm, good question. I don’t know about the rules for this in other languages,
but I would say that mixed caps should be treated like all caps. Otherwise, you
could have some very confusing combinations of contracted and uncontracted
braille within the same word. The alternative is to have three separate
opcodes: singlecapsnocont, mixedcapsnocont, and allcapsnocont. I think that
would be overkill, but of course I might be proven wrong. :)
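To make the three categories concrete, here is a small sketch of how a word could be classified by its capitalization. The category names and the function are hypothetical and only illustrate the distinction discussed here; they do not correspond to existing opcodes.

```python
def caps_category(word):
    """Classify a word as 'none', 'single', 'mixed', or 'all' caps."""
    letters = [c for c in word if c.isalpha()]
    uppers = [c for c in letters if c.isupper()]
    if not uppers:
        return "none"
    if len(uppers) == len(letters) and len(letters) > 1:
        return "all"
    if len(uppers) == 1 and letters[0].isupper():
        return "single"
    # Anything else, e.g. "McDonald" or "iPhone", counts as mixed caps.
    return "mixed"
```

Under the suggestion above, "mixed" and "all" would both switch contractions off, and only "single" would be contracted as usual.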
Bue
From: liblouis-liblouisxml-bounce@xxxxxxxxxxxxx
[mailto:liblouis-liblouisxml-bounce@xxxxxxxxxxxxx] On behalf of Bert Frees
Sent: 18 January 2017 12:08
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Subject: [liblouis-liblouisxml] Re: SV: 8 dots contracted with caps was: are the
swap opcodes broken?
Hi Bue,
Yes, the tests seem useful. I'm not sure about switching tables in the middle
of a test file; we have to ask Christian. Couldn't the old test framework do it?
Some time ago I also worked on an experimental new test tool that made it easy
to test a lot of different combinations of table rules. I wrote it mainly with
testing and documenting the new expected behavior of capitalization/emphasis
opcodes in mind.
Regarding your first "btw" I don't quite understand what the problem is. Maybe
you are overthinking it?
Regarding your second btw, yes perhaps you are right. But in which category
fall words that are not fully uppercase, but also not only the first letter?
That should be made clear in the documentation (and preferably also in the
opcode name).
2017-01-17 20:56 GMT+01:00 Bue Vester-Andersen <bue@xxxxxxxxxxxxxxxxxx>:
Hi Bert,
If you find these tests useful, I will make more tests for other combinations.
Unfortunately, you apparently cannot switch tables or direction in the middle
of a test file, so it will be a good many files.
Btw: Testing backwards made me aware of a little snag: If capsnocont has been
defined, contraction rules should of course not be used when in capsword mode.
This should be easy enough when begcapsword/endcapsword are also defined.
However, if begcapsword/endcapsword are not defined, we have to assume a
capsword situation and activate capsnocont if capital letters or contractions
appear after each other.
Btw: according to the manual, capsnocont only affects all caps words, not words
with only the first letter capitalized. This is fine for the current purpose,
but I think there are languages where you cannot contract words with first cap
either. Until recently, this was the case in Danish 6 dots grade 2, but the
rules have been changed, so that it now behaves more like English in this
respect. Perhaps “allcapsnocont” would be a better name with respect to what it
does. If we then need an opcode to stop contraction of single caps, we could
use the name capsnocont. What do you say?
Bue
From: liblouis-liblouisxml-bounce@xxxxxxxxxxxxx
[mailto:liblouis-liblouisxml-bounce@xxxxxxxxxxxxx] On behalf of Bert Frees
Sent: 17 January 2017 14:59
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Subject: [liblouis-liblouisxml] Re: SV: Re: SV: Re: 8 dots contracted with caps
was: are the swap opcodes broken?
Making some tests would be a good start, yes!
2017-01-17 14:43 GMT+01:00 Bue Vester-Andersen <bue@xxxxxxxxxxxxxxxxxx>:
Hi Bert,
So, where do we go from here?
It looks like quite a few things need some working over. Perhaps fixing
capsnocont and changing the behavior when begcapsword/endcapsword are not
defined would be a good place to start, since the other things depend on this.
Perhaps, I should start out by making some tests, so that we know what results
we want to get.
Bue
From: liblouis-liblouisxml-bounce@xxxxxxxxxxxxx
[mailto:liblouis-liblouisxml-bounce@xxxxxxxxxxxxx] On behalf of Bert Frees
Sent: 16 January 2017 21:54
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Subject: [liblouis-liblouisxml] Re: SV: Re: 8 dots contracted with caps was: are
the swap opcodes broken?
Answers inline:
2017-01-16 20:52 GMT+01:00 Bue Vester-Andersen <bue@xxxxxxxxxxxxxxxxxx>:
Hi Bert,
Neither caps7 nor capsletter add7 has been implemented. They were just my
thoughts about possible names for such an opcode.
Yes, of course. I never assumed that.
Yes, I would like to be able to use the
uplow Aa 1-17
syntax. That would also help with the hyphenation algorithm, since it takes
small and capital letters into account if the relationship is known.
However, you still need to add dot 7 to chars that make a contraction, but
which are not letters, e.g. Ei 1467 and Ie 3467. If you were to do this
manually, your swap class might come in handy. But I still think that it should
be possible to do it automatically by adding or “or”ing dot 7 as capsletter
sign.
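As an illustration of what "or"ing dot 7 onto a cell means, here is a minimal sketch using the bit layout of the Unicode braille block, where dot n corresponds to bit n-1. The helper names are invented for this example and are not part of liblouis.

```python
DOT7 = 1 << 6  # dot 7 is bit 6


def dots_to_cell(dots):
    """Convert a dot string such as '146' into a bit mask."""
    mask = 0
    for d in dots:
        mask |= 1 << (int(d) - 1)
    return mask


def cell_to_dots(cell):
    """Convert a bit mask back into a dot string such as '1467'."""
    return "".join(str(n) for n in range(1, 9) if cell & (1 << (n - 1)))


def add_dot7(cell):
    """Return the capitalized form of a cell by OR-ing in dot 7."""
    return cell | DOT7


def to_unicode(cell):
    """Map the bit mask onto the Unicode braille block starting at U+2800."""
    return chr(0x2800 + cell)
```

With these helpers, cell_to_dots(add_dot7(dots_to_cell("146"))) gives "1467" and the same call with "346" gives "3467", which matches the Ei and Ie cells mentioned above. Because the operation is a bitwise OR, applying it to a cell that already has dot 7 changes nothing.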
I understand that. I was just putting some ideas on the table on how things
could possibly be generalized.
The support for the uplow and the swapdd cases would be "in addition to" the
dot 7 support, not "instead of".
Do you have any idea if changing the behavior of not defining
begcapsword/endcapsword would be incompatible with any existing tables, e.g. the
UEB tables?
I don't think so, and if it is the case we can still create a new opcode that
implements the current behavior of "capsletter" (e.g. named "capssingleletter").