[liblouis-liblouisxml] word_reset as opcode, was: RE: Re: Chat between APH and Nordic Braille in DAISY Pipeline 2 project

From: Davy Kager <DavyKager@xxxxxxxxxx>
To: "'liblouis-liblouisxml@xxxxxxxxxxxxx'" <liblouis-liblouisxml@xxxxxxxxxxxxx>
Date: Tue, 30 Jun 2015 12:34:53 +0000

Hi,

The following is related to the changes to capitals and emphasis[1] as Michael
Gray described them a few months ago.

I was thinking about an opcode to specify which characters do not reset a word.
Currently, everything that isn't a letter will break a word. As far as I can
tell this is handled in a single function resolveEmphasisResets() that is
called only for caps. Thus, the behavior for emphasis is slightly different.

As an example, the word E.T.A. should have one capsign in front of it because
the period doesn't cancel caps. This doesn't work for caps currently. It does
work for emphasis, presumably because word_reset isn't handled the same way
there.

The current behavior conflicts with the Dutch braille standard because:
1. As Bert Frees pointed out[2], not all characters cancel a cap or
emphasis sign.
2. In Dutch, caps and emphasis are handled the same with regard to
word_reset.

Since most characters cancel a cap or emphasis sign, it seems logical to
instead define an opcode for characters that do not cancel these signs.
Initially I was thinking of adding an opcode wordmodechars with very similar
behavior to numericmodechars. This would handle both caps and emphasis. But
that will not be correct for UEB if Michael's code is any indication.
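
To make the idea concrete, here is a rough sketch in Python (not the actual C
code in liblouis). The function name split_words and the keep_chars parameter
are made up for illustration; keep_chars stands in for whatever a
wordmodechars-style opcode would list.

```python
def split_words(text, keep_chars=""):
    """Split text into words.

    A word is reset by any character that is neither a letter nor listed
    in keep_chars. keep_chars is a hypothetical stand-in for the characters
    a wordmodechars-style opcode would declare.
    """
    words = []
    current = ""
    for ch in text:
        # A kept character only continues a word that has already started.
        if ch.isalpha() or (current and ch in keep_chars):
            current += ch
        else:
            if current:
                words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


print(split_words("E.T.A."))       # every period resets the word
print(split_words("E.T.A.", "."))  # the period no longer resets the word
```

With an empty keep_chars the input splits into three one-letter words, which
is why three capsigns show up today. With "." in keep_chars it stays one word.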

So some alternative solutions are:
1. Have two opcodes, capsmodechars and emphmodechars.
2. Have many opcodes *modechars, where * is caps, ital, bold, etc.

I'm leaning towards (2). In either case, it looks like resolveEmphasisResets()
needs to be called for emphasis as well. This should be fine as long as that
function doesn't have any hard-coded logic for splitting words based on
character attributes. To my knowledge, the function currently treats any
non-letter as a word reset.
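
As a sketch of how option (2) could look from the table side, the Python
below reads lines of the form "<class>modechars <characters>" into one set per
class. Both the table syntax and the names parse_modechars and resets_word are
assumptions for this example; nothing like this exists in liblouis yet.

```python
import re

# Hypothetical table syntax: "<class>modechars <characters>", one per line.
MODECHARS = re.compile(r"^(\w+?)modechars\s+(\S+)$")


def parse_modechars(table_lines):
    """Collect, per class, the characters that do not reset a word."""
    keep = {}
    for line in table_lines:
        match = MODECHARS.match(line.strip())
        if match:
            cls, chars = match.groups()
            keep.setdefault(cls, set()).update(chars)
    return keep


def resets_word(ch, cls, keep):
    """True if ch ends the current word for the given class."""
    return not ch.isalpha() and ch not in keep.get(cls, set())


keep = parse_modechars(["capsmodechars .", "italmodechars .-"])
print(resets_word(".", "caps", keep))
print(resets_word("-", "caps", keep))
print(resets_word("-", "ital", keep))
```

A class without a *modechars line falls back to the current behavior, where
every non-letter resets the word.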

I don't want to break UEB with this change though, so I am putting the proposal
up here for discussion. Any feedback would be appreciated. For instance, what
do other languages need? Is nocapsmodechars indeed preferable over
capsmodechars (i.e. the exact opposite)?

Davy

References:
1. //www.freelists.org/post/liblouis-liblouisxml/CapitalEmphasis-update
2. //www.freelists.org/post/liblouis-liblouisxml/CapitalEmphasis-update,1

-----Original message-----
From: liblouis-liblouisxml-bounce@xxxxxxxxxxxxx
[mailto:liblouis-liblouisxml-bounce@xxxxxxxxxxxxx] On behalf of Bert Frees
Sent: Thursday 25 June 2015 16:05
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Subject: [liblouis-liblouisxml] Re: Chat between APH and Nordic Braille in
DAISY Pipeline 2 project

Christian Egli writes:

Procedure
~~~~~~~~~

- Create a branch in the master repo so that everybody can work off
the same branch and integrate via pull requests

Christian has copied Mike's branch to the main liblouis Github repository. The
name of the branch is "feature/ueb_update". Please all work together on this
branch from now on using pull requests.

typeform support for harness tests
----------------------------------

Bert will look into the harness test to see whether it can be made to
support typeforms

This is not at the top of my to-do list, as we will rely on the lou_compare
tool for the time being and I have other priorities. I imagine the syntax would
look something like this (i.e. very similar to the syntax used in lou_compare):

input: "foobar",
typeform: {
  bold: " +++",
  ital: "++++++"
}
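
One way such masks could be turned into a per-character typeform array is
sketched below in Python. The assumptions are mine: "+" marks a character that
carries the emphasis, a space marks one that does not, a mask may be shorter
than the input, and the bit values in BITS are placeholders rather than the
values liblouis uses.

```python
# Placeholder bit values for this sketch only.
BITS = {"ital": 1, "bold": 2}


def typeform_from_masks(text, masks):
    """Combine one mask string per emphasis class into a list of bit flags."""
    typeform = [0] * len(text)
    for name, mask in masks.items():
        if len(mask) > len(text):
            raise ValueError(f"mask for {name!r} is longer than the input")
        for i, mark in enumerate(mask):
            if mark == "+":
                typeform[i] |= BITS[name]
            elif mark != " ":
                raise ValueError(f"unexpected mark {mark!r} in {name!r} mask")
    return typeform


print(typeform_from_masks("foo bar", {"ital": "+++", "bold": "    +++"}))
# prints [1, 1, 1, 0, 2, 2, 2]
```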

Opcode unification
------------------

- There is a proposal to merge all the emphasis opcodes into a few
generic ones.
- Bert will post a mail to the list with the proposal for further
discussion
- This is quite a sweeping change but it would be good to combine this
with the UEB changes

Basically the proposal is to abstract away the existing typeform categories
and to "unify" each set of opcodes (e.g. firstwordital, firstwordbold, etc.)
into a single generic opcode (e.g. firstwordemph).
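
Purely as an illustration of the mapping (the real design is in the Github
thread mentioned further down), the Python sketch below rewrites an old opcode
name into a generic name plus a class. The function unify, the CLASSES list
and the returned pair are all invented for this example.

```python
import re

# Emphasis classes this sketch knows about (an assumption, not a full list).
CLASSES = ("ital", "bold", "under")
LEGACY = re.compile(r"^(?P<stem>[a-z]+?)(?P<cls>" + "|".join(CLASSES) + r")$")


def unify(opcode):
    """Map e.g. 'firstwordital' to ('firstwordemph', 'ital').

    Names that do not end in a known class are returned unchanged,
    paired with None.
    """
    match = LEGACY.match(opcode)
    if match is None:
        return opcode, None
    return match.group("stem") + "emph", match.group("cls")


print(unify("firstwordital"))
print(unify("firstwordbold"))
```

The matching is deliberately naive: it only looks at the end of the name, so
anything that merely happens to end in a class name would be rewritten too.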

There were various ideas at the meeting on how to achieve (I think) this same
goal syntax-wise. There was an idea about clustering rules into blocks, but
that would require a whole new way of parsing and compiling tables. There was
also an idea (from Keith iirc) about namespaced opcodes, which in essence I
think comes down to the same idea as mine.

My proposal, including the strategy behind it, is explained in detail in this
Github thread: https://github.com/liblouis/liblouis/issues/99 (I'm not going to
repeat it all here).

It would indeed be very nice if we could include this in the next release.
However, if it can't be done in this timeframe then so be it; we shouldn't
delay things another 3 months.

Backwards compatibility
-----------------------

Another thing suggested on the call was a "version" opcode (I think it was
Ken). Newer versions of liblouis could require tables to define a version
higher than a certain value, or abort otherwise.
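
A minimal Python sketch of such a check follows, assuming a table line of the
form "version N". The opcode syntax, the function check_table_version and the
TableVersionError exception are hypothetical, and the sketch accepts a version
equal to the minimum, which is one possible reading of the suggestion.

```python
class TableVersionError(Exception):
    """Raised when a table is too old or declares no version."""


def check_table_version(table_lines, minimum):
    """Return the declared version, or abort if it is missing or too low."""
    for line in table_lines:
        fields = line.split()
        if len(fields) == 2 and fields[0] == "version" and fields[1].isdigit():
            declared = int(fields[1])
            if declared < minimum:
                raise TableVersionError(
                    f"table version {declared} is below required {minimum}"
                )
            return declared
    raise TableVersionError("table does not declare a version")


print(check_table_version(["# example table", "version 3"], minimum=2))
```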

I know some other people that would be happy with such an opcode.
