[liblouis-liblouisxml] Re: Documentation

  • From: Davy Kager <DavyKager@xxxxxxxxxx>
  • To: "'liblouis-liblouisxml@xxxxxxxxxxxxx'" <liblouis-liblouisxml@xxxxxxxxxxxxx>
  • Date: Tue, 8 Sep 2015 12:36:54 +0000

@Davy: I have another request for you. Mike asked us for a good description
of how Dutch capitalisation works, so that he could give us his view on how
support for Dutch and UEB should best be combined. Could you help him with
that?

Here goes. Keep in mind that as with most standards, you can probably find
exceptions to these rules.

The basics:
* There are two capital signs, one for single capitalized letters and one for
multiple capitalized letters.
* There is an end indicator that among other things denotes the end of a
capitalized section within a word.
* Words written entirely in capitals also get the multiple capitals sign.
* If a phrase has four all-capitals words or more, the first word gets two
multiple capitals signs as a prefix and the last word gets one multiple
capitals sign as a prefix.
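
To make the phrase rule concrete, here is a minimal Python sketch. It is not liblouis code: the marker string and the function name are made up for illustration, and leaving the words between the first and the last unmarked is my assumption, since the rule above only mentions the first and last word.

```python
# Hypothetical placeholder for the multiple-capitals sign; not real braille dots.
CAPS_MULTI = "[CAPS]"


def prefix_all_caps_phrase(words):
    """Return the words with their capital-sign prefixes.

    Assumes every word in `words` is written entirely in capitals
    and has more than one letter.
    """
    if len(words) >= 4:
        marked = list(words)
        # First word gets two multiple-capitals signs, last word gets one.
        marked[0] = CAPS_MULTI * 2 + marked[0]
        marked[-1] = CAPS_MULTI + marked[-1]
        # Assumption: the words in between carry no sign of their own.
        return marked
    # Fewer than four words: each word gets its own multiple-capitals sign.
    return [CAPS_MULTI + word for word in words]
```

Calling it with a two-word phrase marks both words, while a four-word phrase only marks the first and the last.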

The slightly more interesting stuff:
* In general Dutch is much like UEB in that any non-letter character ends a
capitalized section. See the examples below.
* Some characters, such as the hyphen and the period, do not end a capitalized
section if they are surrounded by capitals.
* Trailing non-letter characters are ignored in the handling of capitals.
Trailing means that they are followed by a space or the end of the input.
* A word as used for the four-words-or-more rule is identified only using
spaces, not the hyphen, period, etc.
* Both types of capital sign end a number.
* A letter that is not in the range a-j also ends a number and thus doesn't
need an end indicator.
* A letter in the range a-j doesn't end a number and thus needs the end
indicator.
* The number indicator ends a capitalized section.
* The same behavior holds for the emphasis indicators. The only difference is
that the actual indicators are different.
* The end indicator ends everything, including both capitals and all types of
emphasis at the same time. This is a gray area in the standard. It can be hard
to determine which modes are active or if a mode needs to be restarted after an
end indicator. A consequence of this rule is that you never get two consecutive
end indicators.

Examples

The following are all four words:
hello how are you
Hello, how are you?
hello h0w 4r5 you
hello *** are you
hello --- *** you
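
The word count used here can be sketched in a line of Python. The function name is hypothetical, and splitting on any whitespace (rather than only the space character) is a simplification.

```python
def count_words(text):
    """Count words the way the four-words-or-more rule does.

    Only whitespace separates words; hyphens, periods, digits and
    other symbols never split a word.
    """
    return len(text.split())
```

All five lines above give a count of four with this function.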

The following are all one word with one capitalized section:
SHOPPING
SHOP-PING
SHOP.PING
SHOP-.-PING

The following are all one word with two capitalized sections:
SHOP/PING
SHOP@xxxxx
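
A rough way to find the capitalized sections in a word is a regular expression. This is only a Python sketch, not how liblouis does it: the names are made up, only the ASCII letters A-Z are handled, and treating exactly the hyphen and the period as the characters that may sit between capitals is an assumption based on the examples.

```python
import re

# One section: capitals, optionally continued by runs of hyphens/periods
# that are directly followed by more capitals.
# Assumption: "-" and "." are the only characters allowed inside a section.
CAPS_SECTION = re.compile(r"[A-Z]+(?:[-.]+[A-Z]+)*")


def capitalized_sections(word):
    """Return the capitalized sections found in `word`, in order."""
    return CAPS_SECTION.findall(word)
```

A trailing period or hyphen is left out of the section, because it is not followed by another capital.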

The following do not need an end indicator but some need a type of capital sign:
44w
44wow
44W
44WOW
44D
44DEE
In contrast, these do need an end indicator:
44d
44dee
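
The number examples can be summarised in a small Python sketch. The function name and the returned labels are hypothetical, only ASCII letters are considered, and anything other than a letter after the digits is simply reported as needing nothing, which is outside what the examples cover.

```python
def needed_after_number(text):
    """Say which indicator must follow the leading digits of `text`."""
    rest = text.lstrip("0123456789")
    if not rest:
        return "nothing"
    first = rest[0]
    if first.isupper():
        # A capital sign already ends the number; which sign depends on
        # whether one or several capitals follow.
        if len(rest) > 1 and rest[1].isupper():
            return "multiple capitals sign"
        return "single capital sign"
    if "a" <= first <= "j":
        # Lowercase a-j would otherwise be read as part of the number.
        return "end indicator"
    # Lowercase k-z ends the number by itself.
    return "nothing"
```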

I hope this makes sense. There are two things that aren't fully working yet:
1. The handling of emphasis, i.e. the emphmodechars opcode that is supposed to
work analogously to capsmodechars.
2. The handling of mixed numbers and letters, currently mostly using context
rules.

I haven't tested (1) with Michael's recent updates. It may be that this works
differently now. The capsmodechars opcode is doing great.

HTH,
Davy

-----Original message-----
From: liblouis-liblouisxml-bounce@xxxxxxxxxxxxx
[mailto:liblouis-liblouisxml-bounce@xxxxxxxxxxxxx] On behalf of Bert Frees
Sent: Monday 7 September 2015 14:01
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Subject: [liblouis-liblouisxml] Re: Documentation

Thank you Mike!

And thanks also Davy. Christian is going to update the wiki page with your new
documentation.

@Davy: I have another request for you. Mike asked us for a good description of
how Dutch capitalisation works, so that he could give us his view on how
support for Dutch and UEB should best be combined. Could you help him with that?

@Mike: Have you also had a chance to take a second look at midnum, letsign,
letsignbefore and letsignafter, as you promised? As you know, I worry a bit
that we will end up with opcodes that do more or less the same thing, which
will be ambiguous for users and may also cause maintenance problems. So if
possible I want to deprecate opcodes that aren't needed anymore. Alternatively,
we could make clear in the documentation which of two similar opcodes should be
used in which situation. It is really important that users can make sense of
the new additions.

@Davy: regarding the names not being generic enough: I wouldn't worry about it
too much; this will all be solved by issue #99 anyway. If #99 is not done
before the release, I guess we could also prefix transnote{1-5} with "ueb" as a
temporary solution.




Davy Kager writes:

Hi,

Included is an updated version of the documentation. I added the
‘capsmodechars’ opcode that I implemented, changed the markdown a bit
and fixed some typos.

Regarding firstletter{emph} and {emph}word:

· Is firstletter{emph} intended to work as it does in the current
version (2.6.4) without the UEB patches?

· Does it make sense to remove either firstletter{emph} or {emph}word?
It seems that one is a fallback for the other. They are not identical,
however, and the implementation details confuse me somewhat.

· The same questions apply to lastletter{emph} and {emph}wordstop.

I’m also wondering if the transnote{1-5} opcodes shouldn’t be given
more generic names so that it becomes more intuitive that they can be
used for any custom mark-up.

From: liblouis-liblouisxml-bounce@xxxxxxxxxxxxx
[mailto:liblouis-liblouisxml-bounce@xxxxxxxxxxxxx] On behalf of Michael Gray
Sent: Friday 4 September 2015 20:58
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Subject: [liblouis-liblouisxml] Documentation

Here is the current documentation that I wrote based on the documentation on
the wiki.