[liblouis-liblouisxml] Re: Back translation issues

  • From: Ken Perry <kperry@xxxxxxx>
  • To: "liblouis-liblouisxml@xxxxxxxxxxxxx" <liblouis-liblouisxml@xxxxxxxxxxxxx>
  • Date: Wed, 25 Jun 2014 11:55:11 +0000

Just for everyone's info, my stupid script was adding blank lines, so the number
of back translation problems is cut in half.  This still leaves 8600+ in
en-us-g2.ctb and 10,000+ in the UEB tables, but I wanted to point out that it is
not quite as bad as it first looked.  Before I took a break, I had en-us-g2
down to 800 words with a patch to the tables.  I never put it in because a few
people were not happy with how I solved some of the apostrophe problems.  In
truth, most of the apostrophe problems need to be fixed in code, not in tables.
I think there is a problem with the way the translator splits words with
apostrophes, but I will prove that before I make a long table patch again.

ken

From: liblouis-liblouisxml-bounce@xxxxxxxxxxxxx 
[mailto:liblouis-liblouisxml-bounce@xxxxxxxxxxxxx] On Behalf Of Ken Perry
Sent: Tuesday, June 24, 2014 10:36 AM
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Subject: [liblouis-liblouisxml] Back translation issues


Ok, I have just run my script to see how things are going with UEB.  While I was
at it, I thought I might go ahead and see how the US-g2 tables are doing for
back translation.  This is not an issue if you're only reading in braille, but
if we are using it to read braille files out loud, or to back translate for a
screen reader, there are still a whole lot of problems.  I am attaching my
words list and two files called broke.us and broke.ueb.  Each of the files has
3 columns.  The first is the word.  The second is the back translation, and the
third is the forward translation that I back translated.  These are all words
that do not back translate correctly.
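
For anyone who wants to reproduce this kind of check, here is a rough Python
sketch of the round trip described above.  It is not the script used to
produce the attached files.  The function names and the toy translators are
made up for illustration, and the toy rules are not real braille rules.  With
the liblouis Python bindings installed, the two callables could presumably
wrap louis.translateString and louis.backTranslateString with a table list
such as ["en-us-g2.ctb"], but that wiring is an assumption and is not shown.

```python
from typing import Callable, Iterable, List, Tuple

# One row per failing word: (word, back translation, forward translation),
# matching the three-column layout of the broke.* files.
Failure = Tuple[str, str, str]


def find_round_trip_failures(
    words: Iterable[str],
    forward: Callable[[str], str],
    backward: Callable[[str], str],
) -> List[Failure]:
    """Return the words that do not survive forward then back translation."""
    failures: List[Failure] = []
    for raw in words:
        word = raw.strip()
        if not word:
            # Skip blank lines so they are not counted as failures.
            continue
        braille = forward(word)
        restored = backward(braille)
        if restored != word:
            failures.append((word, restored, braille))
    return failures


def format_report(failures: List[Failure]) -> str:
    """Render failures as tab-separated lines, one word per line."""
    return "\n".join("\t".join(row) for row in failures)


# Hypothetical stand-ins for a real translator, used only to exercise the
# checker.  The forward direction drops apostrophes, so it is lossy.
def toy_forward(word: str) -> str:
    return word.replace("the", "!").replace("and", "&").replace("'", "")


def toy_backward(braille: str) -> str:
    return braille.replace("!", "the").replace("&", "and")
```

Calling find_round_trip_failures(["other", "", "don't"], toy_forward,
toy_backward) reports only "don't", because the toy forward rule drops the
apostrophe and the blank entry is skipped instead of being counted.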

Currently it looks like the en-us-g2 table does a bit better, with only 17390
errors, while the en-ueb-g2 table has 21666 errors.  But I would expect both
tables to do a lot better.

There are other problems, like dollar sign back translation in sentences, that
these files don't catch, but I wanted to get the words list out there first.  I
will start pushing patches to the UEB tables, and maybe to the US English one
if I can do that on the git repo.  I know we are tracking the UEB ones on the
git repo, but I am not sure about the US g2.

I figured I would send this as kind of a bug report because no one seems to be 
working on back translation issues.

Ken
