[liblouis-liblouisxml] Is my approach to fixing the Hungarian backtranslation failures sound?

  • From: Hammer Attila <hammera@xxxxxxxxx>
  • To: liblouis-liblouisxml@xxxxxxxxxxxxx
  • Date: Sat, 28 Jul 2012 15:08:31 +0200

Hi,

Two days ago I began working on a possible fix that partially solves the backtranslation problems with the hu-hu-g1.ctb table. I am attaching the patch, but please don't commit it yet. For now I would only like to know whether this is a viable way to fix this kind of problem at the table level, or whether I need to change the approach. I would also be very interested to know whether some of these opcodes can be defined in an easier or safer way.
At the moment I use a lot of exactdots and correct opcodes.
Results of the fix so far:
1. The Hu-hu-g1_harness.txt test harness file now backtranslates correctly, with only 1 backtranslation failure. Previously it had 552 backtranslation failures.
2. A temporary test harness file containing about 300000 test cases produces only 395 backtranslation failures. These failures happen because many words contain the letters szsz or ssz: the backtranslated text should not always contain ssz, and in some words the letters backtranslated from the 156-156 dot combination need to be replaced with szsz.
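
The szsz/ssz ambiguity in point 2 can be illustrated with a minimal Python sketch. It is only a model of the idea, not of how liblouis applies the correct opcode internally. The dot patterns and words are copied from the attached patch; the names has_double_sz, RULES, EXCEPTIONS and fix_word are hypothetical.

```python
# Dot patterns copied from the attached patch: both words contain 156-156,
# although one is spelled "ssz" and the other "szsz".
PATCH_ENTRIES = {
    "mésszé": "134-16-156-156-16",
    "bányászszer": "12-4-1246-4-156-156-15-1235",
}

SZ = "156"  # the braille cell for the digraph "sz"


def has_double_sz(dots: str) -> bool:
    """Return True if the dot string contains two adjacent 156 cells."""
    cells = dots.split("-")
    return any(a == SZ and b == SZ for a, b in zip(cells, cells[1:]))


# Generic spelling corrections, modelled on the "szsza"/"szsze" rules in the patch.
RULES = [("szsza", "ssza"), ("szsze", "ssze")]

# Words where "szsz" is the correct spelling and must be left alone
# (in the patch this role is played by the partword exceptions).
EXCEPTIONS = {"bányászszer"}


def fix_word(word: str) -> str:
    """Apply the generic corrections unless the word is a known exception."""
    if word in EXCEPTIONS:
        return word
    for old, new in RULES:
        word = word.replace(old, new)
    return word


for word, dots in PATCH_ENTRIES.items():
    print(word, dots, has_double_sz(dots))

print(fix_word("szakaszsza"))
print(fix_word("bányászszer"))
```

Because both spellings produce the same cells, a generic rule alone cannot decide between them, which is why the patch mixes generic correct rules with word-specific exceptions.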

Known issues:
1. If a text contains, for example, a number, a hyphen (minus character), and then one of the letters a, b, c, d, e, f, g, h, i, or j, the part after the hyphen is backtranslated as digits, not letters. An example: if the source text contains the Hungarian text 9-ei, the forward-translated braille output is #i-ei. When this braille text is backtranslated, the result is 9-59. This is not correct, because the text after the hyphen consists of ordinary letters, not digits: the braille output correctly contains no numsign indicator between the - character and the ei text. If I actually write 9-59 in the input text, the braille output is #i-#ei, and in that case the backtranslation result is fully correct: 9-59. I don't understand why I get the wrong result in the first example, because I use the midnum - 36-3456 opcode. I tried this test with the en-us-g1.ctb table, and the result is reproducible with the lou_translate command.
Run the lou_translate en-us-g1.ctb command and type the texts 9-ei and 9-59.
Note the braille results, then backtranslate them with the lou_translate -b en-us-g1.ctb command.

2. The [ and ] symbols are now backtranslated as the ü and ű characters if there is any text between them. I don't know how this problem can be resolved, because the other results of exactdots @12356 and exactdots @23456 need to be replaced with the ü and ű characters.
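
For known issue 1, the difference between the two results can be modelled with a small Python sketch. It is only a toy model of number mode, not liblouis's actual algorithm. The function name backtranslate and the flag hyphen_ends_number_mode are hypothetical, and the braille is written in the same ASCII notation as above (# for the numsign, a-j for the digits 1-9 and 0).

```python
# After a numsign, the letters a-j stand for the digits 1-9 and 0.
DIGITS = dict(zip("abcdefghij", "1234567890"))


def backtranslate(braille: str, hyphen_ends_number_mode: bool) -> str:
    """Toy backtranslator for ASCII braille containing numbers and hyphens."""
    out = []
    in_number = False
    for ch in braille:
        if ch == "#":
            # The numsign switches number mode on and produces no output.
            in_number = True
        elif ch == "-":
            out.append("-")
            if hyphen_ends_number_mode:
                # A new numsign is then required before further digits.
                in_number = False
        elif in_number and ch in DIGITS:
            out.append(DIGITS[ch])
        else:
            # Any other character ends number mode and is copied as a letter.
            in_number = False
            out.append(ch)
    return "".join(out)


# Number mode surviving the hyphen reproduces the reported wrong result.
print(backtranslate("#i-ei", hyphen_ends_number_mode=False))   # 9-59
# Number mode ending at the hyphen gives the expected results.
print(backtranslate("#i-ei", hyphen_ends_number_mode=True))    # 9-ei
print(backtranslate("#i-#ei", hyphen_ends_number_mode=True))   # 9-59
```

In this model the wrong result 9-59 appears exactly when number mode is allowed to continue across the hyphen, which matches the behaviour described for #i-ei.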

So my fix is not fully ready, but hopefully it can be finalized in a relatively short time. I would really appreciate your hints on the safest way to make a fix that produces equally good results for this problem.

Attila
Index: tables/hu-backtranslate_corrections.cti
===================================================================
--- tables/hu-backtranslate_corrections.cti     (revision 0)
+++ tables/hu-backtranslate_corrections.cti     (revision 0)
@@ -0,0 +1,120 @@
+#Following part correct some backtranslate issues
+#first step exactdots lines replace some Liblouis backtranslate resulted wrong braille characters with dot combination
+exactdots @46-16
+exactdots @16
+exactdots @46-12456
+exactdots @12456
+exactdots @46-12345
+exactdots @12345
+exactdots @46-23456
+exactdots @23456
+exactdots @46-12356
+exactdots @12356
+exactdots @46-34
+exactdots @34
+exactdots @46-4
+exactdots @4
+exactdots @46-126
+exactdots @126
+exactdots @46-346
+exactdots @346
+exactdots @46-246
+exactdots @246
+exactdots @2346
+exactdots @1356
+exactdots @45
+exactdots @46-12346
+exactdots @12346
+
+#processing exceptions with only affecting backtranslation, because normal forward translation this exceptions good
+partword állkapocs 4-123-123-13-1-1234-135-146
+partword bilincs 12-24-123-24-1345-146
+partword ananász 1-1345-1-1345-4-156
+partword bajusz 12-1-245-136-156
+partword boszorkány 12-135-156-135-1235-13-4-1246
+partword kalász 13-1-123-4-156
+partword cipész 14-24-1234-16-156
+partword bányászszt 12-4-1246-4-156-156-2345
+partword bányászszer 12-4-1246-4-156-156-15-1235
+begword dísz 145-34-156
+partword furdancs 124-136-1235-145-1-1345-146
+partword gyümölcs 1456-12356-134-12345-123-146
+always gyümölccsé 1456-12356-134-12345-123-146-146-16
+partword kalapács 13-1-123-1-1234-4-146
+always kaviccsá 13-1-1236-24-146-146-4
+partword korbács 13-135-1235-12-4-146
+always korbáccsá 13-135-1235-12-4-146-146-4
+begword kulcs 13-136-123-146
+partword felejcs 124-15-123-15-245-146
+partword narancs 1345-1-1235-1-1345-146
+partword mancs 134-1-1345-146
+partword szárkapocs 156-4-1235-13-1-1234-135-146
+partword papucs 1234-1-1234-136-146
+always árkász 4-1235-13-4-156
+begword mész 134-16-156
+always mésszé 134-16-156-156-16
+begword juhász 245-136-125-4-156
+always juhásszá 245-136-125-4-156-156-4
+begword penész 1234-15-1345-16-156
+always penésszé 1234-15-1345-16-156-156-16
+begword régész 1235-16-1245-16-156
+begword szakasz 156-1-13-1-156
+begword lövész 123-12345-1236-16-156
+begword utász 136-2345-4-156
+begword kopasz 13-135-1234-1-156
+begword zenész 126-15-1345-16-156
+begword vadász 1236-1-145-4-156
+
+
+#second last step correct exactdots opcode resulted dot combinations with real characters
+correct "@46-16" "É"
+correct "@16" "é"
+correct "@46-4" "Á"
+correct "@4" "á"
+correct "@46-12345" "Ö"
+correct "@12345" "ö"
+correct "@46-12456" "Ő"
+correct "@12456" "ő"
+correct "@46-12356" "Ü"
+correct "@12356" "ü"
+correct "@46-23456" "Ű"
+correct "@23456" "ű"
+correct "@46-34" "Í"
+correct "@34" "í"
+correct "@46-346" "Ú"
+correct "@346" "ú"
+correct "@46-126" "Z"
+correct "@126" "z"
+correct "@46-246" "Ó"
+correct "@246" "ó"
+correct "@1356" ")"
+correct "@2346" "("
+correct "@456" "ly"
+correct "@45" "@"
+correct "@46-12346" "Q"
+correct "@12346" "q"
+correct "·" "áé"
+correct "ﬀ" "ff"
+correct "ﬁ" "fi"
+correct "ﬂ" "fl"
+correct "ﬃ" "ffi"
+correct "ﬄ" "ffl"
+correct "Å£" "tc"
+correct "Å?" "sc"
+correct "Å?" "r"
+correct "à" "á"
+correct "ñ" "n"
+correct "cscsal" "ccsal"
+correct "cscsel" "ccsel"
+correct "szsza"~ "ssza"
+correct "szsze"~ "ssze"
+correct "zszsa" "zzsa"
+correct "zszsal" "zzsal"
+correct "zszse" "zzse"
+correct "nynya" "nnya"
+correct "zszs" "zzs"
+correct "â??" "--"
+correct "szakaszsza" "szakassza"
+correct "szakaszszon" "szakasszon"
+correct "szakaszszuk" "szakasszuk"
+
Index: tables/hu-hu-g1.ctb
===================================================================
--- tables/hu-hu-g1.ctb (revision 754)
+++ tables/hu-hu-g1.ctb (working copy)
@@ -30,6 +30,7 @@
 include hu-chardefs.cti
 include hu-exceptionwords.cti
 include braille-patterns.cti
+include hu-backtranslate_corrections.cti
 
 #Braille indicators
 numsign 3456
@@ -138,5 +139,7 @@
 always ^ 2346
 always ` 4
 always Ã? 1
+always lysz 456-156
+always lyú 456-346
 undefined 26
 
