[liblouis-liblouisxml] Re: Could anyone clarify the meaning of decpoint and midnum?

  • From: "Susan Jolly" <easjolly@xxxxxxxxxxxxx>
  • To: <liblouis-liblouisxml@xxxxxxxxxxxxx>
  • Date: Mon, 7 Dec 2015 16:02:38 -0700

The question Davy is referring to is a problem for both print and braille and causes internationalization issues. Here is some background to remember.

According to Wikipedia, "decimal mark" is the term for the character that separates the whole and fractional portions of a decimal number. It is a descriptor, not a Unicode character name, and the Unicode character used as the decimal mark differs from locale to locale. In the United States the decimal mark is the Unicode character FULL STOP (U+002E). In the UK the decimal mark is represented by the Unicode character COMMA (U+002C). UEB allows for both possibilities and refers to both as decimal signs. In UEB a decimal sign must be within the scope of a UEB Numeric Indicator.

Strings of digits often contain embedded full stops, as in the Liblouis version identifier v2.6.5. These full stops are not intended as decimal marks. However, UEB rules allow a full stop to be within the scope of a Numeric Indicator, with the exception, per Rule 6.4, that a leading full stop that is obviously a period and not a decimal point is not preceded by a Numeric Indicator. Distinguishing the semantics in this situation is not always easy. That is why the preferred print style avoids a leading decimal point by placing a zero before it.

So in both the US and the UK, v2.6.5 would be translated to UEB braille as v#b4f4e in ASCII braille. Thus in the US both print readers and braille readers can distinguish a period from a decimal point only by context. The supposed advantage of the UEB formulation is that the translation can be done automatically, as long as the print source is appropriate for the locale for which the braille translation is being produced.
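
To make the v2.6.5 example concrete, here is a minimal Python sketch of the digit handling described above. It is only an illustration, not liblouis code or a liblouis table: the names ueb_numeric_sketch and DIGIT_TO_ASCII_BRAILLE are made up for this example, and it deliberately ignores everything except digits and full stops (no contractions, no capitals, no grade 1 indicators after a number).

```python
# Hypothetical illustration only; not part of liblouis.
# ASCII braille: digits 1-9 and 0 are written with the letters a-j.
DIGIT_TO_ASCII_BRAILLE = dict(zip("1234567890", "abcdefghij"))
NUMERIC_INDICATOR = "#"  # dots-3456
UEB_FULL_STOP = "4"      # dots-256, used for both period and decimal point


def ueb_numeric_sketch(text):
    """Translate digits and full stops in text to ASCII braille, UEB style."""
    out = []
    in_number = False
    for ch in text:
        if ch in DIGIT_TO_ASCII_BRAILLE:
            if not in_number:
                # A Numeric Indicator opens each run of digits.
                out.append(NUMERIC_INDICATOR)
                in_number = True
            out.append(DIGIT_TO_ASCII_BRAILLE[ch])
        elif ch == "." and in_number:
            # A full stop after a digit stays within the scope of the
            # Numeric Indicator; no semantic decision is needed.
            out.append(UEB_FULL_STOP)
        else:
            # Anything else ends the number and is passed through unchanged.
            in_number = False
            out.append(ch)
    return "".join(out)


print(ueb_numeric_sketch("v2.6.5"))  # v#b4f4e
```

The point of the sketch is that the same cell is emitted for every full stop, so the code never has to ask what the full stop means.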

Current American English braille uses dots-256 to translate a full stop intended as a period punctuation mark, and dots-46 within the scope of a number sign to translate a full stop intended as a decimal point. This has the disadvantage that the transcriber or transcription software must determine the local semantics of the print full stop character, but it has the advantage that the braille output carries enough information to be backtranslated automatically to either US or UK print.
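
By way of contrast, a minimal Python sketch of the American English braille choice might look like the following. The function names are hypothetical and the sketch handles only a plain decimal number; the one thing it is meant to show is that the caller has to supply the semantics of the full stop before a cell can be chosen.

```python
# Hypothetical illustration only; not part of liblouis.
DIGIT_TO_ASCII_BRAILLE = dict(zip("1234567890", "abcdefghij"))
NUMBER_SIGN = "#"         # dots-3456
EBAE_DECIMAL_POINT = "."  # dots-46 in ASCII braille
EBAE_PERIOD = "4"         # dots-256 in ASCII braille


def ebae_full_stop_cell(is_decimal_point):
    """Pick the cell for a print full stop once its meaning is known."""
    return EBAE_DECIMAL_POINT if is_decimal_point else EBAE_PERIOD


def ebae_decimal_number(number):
    """Translate a simple decimal number such as "3.14" (digits and one point)."""
    whole, _, fraction = number.partition(".")
    cells = NUMBER_SIGN + "".join(DIGIT_TO_ASCII_BRAILLE[d] for d in whole)
    if fraction:
        cells += ebae_full_stop_cell(True)
        cells += "".join(DIGIT_TO_ASCII_BRAILLE[d] for d in fraction)
    return cells


print(ebae_decimal_number("3.14"))  # #c.ad
```

Because the decimal point and the period come out as different cells, a backtranslator can tell them apart without guessing.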

I recommend that liblouis at least use the following heuristic: if an input print item consists of a sequence of digits with more than one embedded full stop, treat those full stops as period punctuation marks in all locales. There should also be a way to recognise user markup that distinguishes the semantics in cases where the built-in heuristics are inadequate.
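
A rough Python sketch of that heuristic, under the assumption that an "item" is a single whitespace-delimited token and that only ASCII digits count, could be as simple as the following. The function name is hypothetical and nothing here reflects how liblouis is actually implemented.

```python
import re

# Hypothetical illustration only; not part of liblouis.
# Digits, then at least two further groups of "full stop + digits",
# i.e. a digit sequence with more than one embedded full stop.
_MULTI_STOP_DIGITS = re.compile(r"\d+(?:\.\d+){2,}", re.ASCII)


def has_multiple_embedded_full_stops(item):
    """Return True if item contains digits with more than one embedded full stop.

    When this returns True, the heuristic says the full stops are periods
    in every locale. When it returns False, the heuristic says nothing and
    the decision is left to locale rules or to user markup.
    """
    return _MULTI_STOP_DIGITS.search(item) is not None


for item in ("v2.6.5", "192.168.0.1", "3.14"):
    print(item, has_multiple_embedded_full_stops(item))
```

Note that the sketch only answers the easy case. A single embedded full stop, as in 3.14, is left undecided on purpose.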

Note that something similar is already needed to distinguish the semantics of the Unicode characters that can represent either an apostrophe or a closing single quotation mark, since the correct UEB translation depends on which is meant.

HTH,
Susan Jolly
For a description of the software, to download it and links to
project pages go to http://liblouis.org
