[liblouis-liblouisxml] Re: Capital/Emphasis update

  • From: Bert Frees <bertfrees@xxxxxxxxx>
  • To: liblouis-liblouisxml@xxxxxxxxxxxxx
  • Date: Fri, 13 Feb 2015 21:43:02 +0100

Congratulations on the progress.

Forgive me if some of the following comments are not relevant anymore. In
another email you say that this description of behavior is no longer valid, but
it's not clear exactly how the behavior changed. So here it goes:

Michael Gray writes:

> If a single character is marked, the singleletter* symbol is inserted.
>
> If len*phrase is not set or 0, or the number of consecutive words marked is
> less than len*phrase, then all words are marked with the *word symbol if it is
> defined.  The *word symbol does not have to start at the beginning of the first
> word.  If the last word is not completely marked, the *wordstop symbol is
> inserted where the markings stop if it is defined.  If the first word has only
> the last character marked, or the last word has only the first character
> marked, then the singleletter* symbol is inserted instead.
>
> If the number of consecutive words marked is greater than or equal to len*phrase,
> then the passage is marked.  If firstword* is not defined, then the
> firstletter* and lastletter* symbols are inserted accordingly.  If firstword*
> is defined, then the firstword*, lastwordbefore* or lastwordafter* symbols are
> inserted.  firstword* is inserted at the beginning of the first word and
> lastwordbefore* is inserted before the last word, or lastwordafter* is
> inserted after the last word.  You should not define both lastwordbefore* and
> lastwordafter*.
>
> Note that any situation that is not covered above will default to the
> firstletter* and lastletter* being used if they are defined.
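
To make the quoted rules easier to discuss, here is a rough Python sketch of
the word-versus-phrase decision as I read it. It is not liblouis code: the
function name, parameters, and return labels are all hypothetical, and it
ignores the single-letter and partially marked word cases as well as the
"if it is defined" conditions.

```python
def choose_marking(marked_words, len_phrase=0, has_firstword=False):
    """Pick the family of symbols the quoted description would select.

    marked_words:  number of consecutive words carrying the emphasis
    len_phrase:    value of len*phrase (0 stands for "not set or 0")
    has_firstword: whether firstword* is defined in the table
    All three names are illustrative assumptions, not liblouis API.
    """
    # Below the phrase threshold (or no threshold): mark each word.
    if len_phrase == 0 or marked_words < len_phrase:
        return "word"
    # At or above the threshold the passage is marked as a whole.
    if has_firstword:
        return "firstword/lastword"
    return "firstletter/lastletter"
```

For example, with len*phrase set to 3, two marked words would be marked word
by word, while three or more would be marked as a passage.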

My first reaction was that while this adds opcodes and therefore complexity, it
still isn't obvious to me whether it actually covers more cases than before or
not. (I'm only talking about emphasis now, for capitals it is obvious!) Perhaps
I should have a look at some concrete UEB examples before questioning, but
anyway.

I'm curious, are you keeping firstletter* and lastletter* solely to preserve
backwards-compatibility, or do they still have a real function in the new
design? In the former case, I vote for dropping them for the benefit of
simplicity. We don't have to worry about backwards-compatibility too much. It's
easy enough to write a conversion script to update existing tables to the new
syntax.

> Words and characters are determined by characters defined as letters.  If the
> emphasis markings do not start at the beginning of a word it is shifted to the
> beginning of the first word.  If the emphasis markings do not end at a word
> end, it is shifted to the end of the last word.
>
> Characters marked as capital are merged with other characters marked as
> capitals if the characters between them are defined as spaces.

I don't quite understand. Does this mean that A B C is treated as a single
uppercase word?
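
To show what I mean, here is a small Python sketch of the merging rule taken
literally. It is only my reading of the sentence above, not the actual
implementation: the function name is made up, "capital" is approximated with
str.isupper(), and only the plain space character counts as a space.

```python
def capital_runs(text):
    """Return (start, end) spans of capitals, merged across spaces.

    A literal reading of the quoted rule: two capitals belong to the
    same run if everything between them is a space.
    """
    runs = []
    i, n = 0, len(text)
    while i < n:
        if not text[i].isupper():
            i += 1
            continue
        start, end = i, i + 1
        j = i + 1
        while j < n:
            if text[j].isupper():
                end = j + 1      # extend the run to include this capital
            elif text[j] != " ":
                break            # anything but a space ends the run
            j += 1
        runs.append((start, end))
        i = j
    return runs
```

Under this reading, capital_runs("A B C") yields the single span (0, 5),
which is what prompts my question.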

One thing that would be useful is to be able to define characters that "break"
an uppercase passage, and characters that don't. For example in Dutch, the
characters that are not breaking (apart from letters), are minus, plus,
ampersand, full stop, and apostrophe. How does this work in UEB and how are you
handling that?
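
As an illustration of the kind of behavior I have in mind, here is a Python
sketch in which a configurable set of characters does not break an uppercase
passage. Everything in it is an assumption for the sake of discussion: the
names are hypothetical, the default set is just the Dutch list above, and
every other non-capital character (including a space) is treated as breaking.

```python
# Dutch non-breaking characters: minus, plus, ampersand, full stop, apostrophe.
NON_BREAKING = frozenset("-+&.'")

def uppercase_passages(text, non_breaking=NON_BREAKING):
    """Split text into uppercase passages.

    Capitals extend the current passage, characters in non_breaking are
    skipped over without ending it, and anything else ends it. A passage
    always ends on its last capital, so trailing punctuation is excluded.
    """
    passages = []
    start = end = None
    for i, ch in enumerate(text):
        if ch.isupper():
            if start is None:
                start = i
            end = i + 1
        elif start is not None and ch not in non_breaking:
            passages.append(text[start:end])
            start = None
    if start is not None:
        passages.append(text[start:end])
    return passages
```

With the Dutch set, "AT&T is" gives one passage, "AT&T"; with an empty set
the ampersand breaks it into "AT" and "T".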

> The other attributes do not do this.


Thanks,
Bert

For a description of the software, to download it and links to
project pages go to http://www.abilitiessoft.com
