[liblouis-liblouisxml] Re: Backtranslation and use of classes

  • From: "John J. Boyer" <johnjboyer@xxxxxxxxxxxxx>
  • To: liblouis-liblouisxml@xxxxxxxxxxxxx
  • Date: Wed, 18 Mar 2009 20:13:54 -0500

Jonathan,

In the en-us-g2.ctb table, if you search for the word "back" you will see 
some of the fixes used to get good back-translation. I'm not sure how 
useful similar fixes would be with other languages and braille 
codes.

Some of the problems you noticed can be remedied easily. For example, 
a.m. and p.m. can be entered as "words". A "word" is simply a string of 
characters with blanks and/or punctuation before and after. The string 
itself can contain anything. Some of the "words" in the French table are 
actually phrases containing spaces. The strings should be entered in 
lower-case. Upper case will be handled automatically. Of course, a more 
generalized fix is needed to handle abbreviations. I'll think about 
that.
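
As a rough illustration of the whole-word idea, here is a toy Python sketch. It is not liblouis code and does not use the table syntax. The names WORD_EXCEPTIONS, back_translate_word and back_translate are hypothetical, and the only braille behaviour it models is the one reported in the quoted message below: the cell written as 4 comes back as "dd" inside a word and as "." at the end of a word. Capitalization is ignored here.

```python
# Toy model only: cells are written in the ASCII braille used in this thread.
# The cell "4" is read as "dd" inside a word and as "." at the end of a word,
# which reproduces the reported a4m4 -> "addm." result.

# Hypothetical exception list: whole braille "words" mapped to their print form.
WORD_EXCEPTIONS = {"a4m4": "a.m.", "p4m4": "p.m."}


def back_translate_word(word, exceptions=None):
    """Back-translate one blank-delimited braille word."""
    exceptions = exceptions or {}
    # A whole-word entry wins over cell-by-cell back-translation.
    if word in exceptions:
        return exceptions[word]
    out = []
    for i, cell in enumerate(word):
        if cell == "4":
            out.append("." if i == len(word) - 1 else "dd")
        else:
            out.append(cell)
    return "".join(out)


def back_translate(text, exceptions=None):
    """Back-translate a string of braille words separated by single blanks."""
    return " ".join(back_translate_word(w, exceptions) for w in text.split(" "))


print(back_translate("a4m4"))                   # no exceptions
print(back_translate("a4m4", WORD_EXCEPTIONS))  # with the whole-word entry
```

Without the exception list the first call gives "addm."; with it the second gives "a.m.". The point is only that a string bounded by blanks can be matched as a unit before any per-cell rule is tried.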

Much the same could be said for dates and fractions. If the original 
document contained slashes, that is what we should get in 
back-translation. The "st" contraction and the slash are the bad actors 
here, since the slash currently comes back as "st".

One thing to note is that the simplest coding may not result in good 
back-translation. One problem I encountered was the word "friend". If 
this is coded as "always friend 124-1235" then words like frisky will 
back-translate as "friendisky." I had to put the various compounds of 
"friend" in as separate entries.
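
The "friend" problem can be reproduced with a small Python sketch. This is a toy greedy matcher, not the liblouis algorithm, and every name in it (LETTERS, back_translate, the rule tuples) is hypothetical. Treating the short form as a whole-word entry and listing "friends" separately is an assumption made for the illustration.

```python
# Single-cell letters needed for the example, keyed by dot pattern.
LETTERS = {"124": "f", "1235": "r", "24": "i", "234": "s", "13": "k", "13456": "y"}


def back_translate(cells, rules):
    """Greedy longest-match back-translation of one word.

    cells: list of dot patterns, one per braille cell.
    rules: list of (dots, text, whole_word) tuples; a whole_word rule
           only applies when it covers the entire word.
    """
    out = []
    i = 0
    while i < len(cells):
        best = None
        for dots, text, whole_word in rules:
            n = len(dots)
            if tuple(cells[i:i + n]) != dots:
                continue
            if whole_word and not (i == 0 and n == len(cells)):
                continue
            if best is None or n > len(best[0]):
                best = (dots, text)
        if best is not None:
            out.append(best[1])
            i += len(best[0])
        else:
            # Fall back to the plain letter for this cell.
            out.append(LETTERS[cells[i]])
            i += 1
    return "".join(out)


FRISKY = ["124", "1235", "24", "234", "13", "13456"]

# Analogue of "always friend 124-1235": matches anywhere in a word.
always_rules = [(("124", "1235"), "friend", False)]

# Assumed alternative: whole-word entries, with a compound listed separately.
word_rules = [
    (("124", "1235"), "friend", True),
    (("124", "1235", "234"), "friends", True),
]

print(back_translate(FRISKY, always_rules))
print(back_translate(FRISKY, word_rules))
print(back_translate(["124", "1235", "234"], word_rules))
```

With the "always" style rule, the cells for "frisky" come back as "friendisky". With the whole-word entries they come back as "frisky", and the separately listed compound still yields "friends".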

John

On Thu, Mar 19, 2009 at 12:06:31PM +1300, Jonathan Sharp wrote:
> Hi John,
> 
> The en-us-g2.ctb back-translation works very well, so it would be great to
> have some documentation so that we can fix some problems in some of the
> other translators.  Here are a few things I've noticed in en-us-g2.ctb
> that should be easy to fix at some point:
> 
> 1. Dates such as 03/19/09. This forward translates as #jc/ai/ji, but on
> back-translation the slashes turn into st, giving: 03st19st09. Actually I
> looked up dates in the BANA Braille Codes Update 2007 and in the update for
> 2.7e (Page L22) they say 4-5-6 3-4 should be used for slashes in dates. 
> 
> 2. Simple fractions such as 1/2 have the same problem as dates.  Here the
> 2007 update says that when the numerator is above the denominator, 3-4
> should be used on its own.  If they are on the same level then 4-5-6 3-4
> should be used.
>  
> 3. A problem caused by ambiguities in braille but for which an exception
> could be used is a.m. and p.m. as used in 12-hour times such as 2:30 a.m. Of
> course they can be in upper case as A.M. and P.M. too.  Forward translation
> of a.m. correctly gives a4m4 and p.m. gives p4m4 but, of course,
> back-translation results in addm. and pddm. What would be the best way to
> add exceptions for these that would work for both the lower and upper case
> scenarios?
> 
> 4. The pound sterling symbol is not translated correctly in either
> direction. In braille it should appear as dots 1-2-3 in front of the number
> sign. It is currently appearing as an extra number sign.
> 
> 5. The euro sign is not recognised.
> 
> 6. The cents sign is forward-translated correctly but back-translates as
> c-cedilla.
> 
> Anyway, that's all I've found so far.
> 
> Jonathan
> 
> -----Original Message-----
> From: liblouis-liblouisxml-bounce@xxxxxxxxxxxxx
> [mailto:liblouis-liblouisxml-bounce@xxxxxxxxxxxxx] On Behalf Of John J.
> Boyer
> Sent: Thursday, 19 March 2009 4:17 a.m.
> To: liblouis-liblouisxml@xxxxxxxxxxxxx
> Subject: [liblouis-liblouisxml] Re: Backtranslation and use of classes
> 
> Michel,
> 
> Thanks for the information and example. The change to the code to handle 
> back-translation when classes are used may be quite simple. 
> Back-translation is actually simpler and faster than forward 
> translation. The tables do have to be tweaked to get good results. I'm 
> trying to remember what I did for the en-us-g2.ctb table, so I can write 
> some documentation.
> 
> John
> 
> On Wed, Mar 18, 2009 at 11:03:42AM +0100, Michel Such wrote:
> > Hi John,
> > 
> > As Jonathan pointed out, there seems to be a problem with backtranslation 
> > and tables that use classes.
> > 
> > Let's take an example:
> > 
> > In the table fr-bfu-g2.ctb, there is a class called con that contains all 
> > consonants.
> > 
> > To contract the sequence of letters "er" we have 2 rules:
> > before con midword er 236
> > endword er 236
> > 
> > This works fine in forward translation.
> > When backtranslating, the "er" sequence placed inside a word before a 
> > consonant is not reversed to "er".
> > 
> > If you replace the 2 rules by this one:
> > midendword er 236
> > it is not perfect in forward translation, which is normal, but works fine 
> > in any case in backtranslation.
> > So the problem really seems to be with classes.
> > I imagine that backtranslation from grade 2 is a complex job.
> > 
> > Michel Such 
> > 
> > 
> > For a description of the software and to download it go to
> > http://www.jjb-software.com
> 
> -- 
> My websites:
> GodTouches Digital Ministry, Inc. http://www.abilitiessoft.com/godtouches
> Abilitiessoft, Inc. http://www.abilitiessoft.com
> Location: Madison, WI, USA

-- 
My websites:
GodTouches Digital Ministry, Inc. http://www.abilitiessoft.com/godtouches
Abilitiessoft, Inc. http://www.abilitiessoft.com
Location: Madison, WI, USA
