Hi Bert,
I think I have a good example. Let us stay with the string “txt”, or rather
“.txt”. This is the mark of a file extension, and according to the Danish
rules, file names, URLs, email addresses, etc. should be written in grade 1,
i.e. without contractions. However, it is still unclear whether the x should be
preceded by a letsign, since a human is supposed to be able to interpret it as a
file name and then know that the x should be read as “x” and not as “mm” or, in
Danish, “or”. In other words, we have something like a circular definition or
process.
Without the letsign, this example is completely ambiguous, because liblouis has
no way of knowing if you want to write “.txt” or “.tmmt” (or in Danish “.tort”).
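To make the ambiguity concrete, here is a minimal Python sketch. It does not use liblouis at all: the READINGS table, the ";" letsign marker and the function name are made up for illustration, and only the English-style reading of x is modelled.

```python
from itertools import product

# Toy back-translation table (hypothetical, not a real liblouis table):
# each braille cell, written as an ASCII character, maps to every print
# reading it may have.
READINGS = {
    ".": ["."],
    "t": ["t"],
    "x": ["x", "mm"],  # the cell is both the letter x and the "mm" contraction
}
LETSIGN = ";"  # assumed marker meaning "the next cell is a plain letter"


def back_translations(braille):
    """Return every print string the braille string could stand for."""
    options = []
    i = 0
    while i < len(braille):
        cell = braille[i]
        if cell == LETSIGN and i + 1 < len(braille):
            # A letsign forces the literal letter reading of the next cell.
            options.append([braille[i + 1]])
            i += 2
        else:
            options.append(READINGS.get(cell, [cell]))
            i += 1
    return ["".join(choice) for choice in product(*options)]


print(back_translations(".txt"))   # ['.txt', '.tmmt']
print(back_translations(".t;xt"))  # ['.txt']
```

Without the letsign, two readings survive and nothing in the braille itself says which one was meant. With it, only one remains.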
Forward translation also has its odd corners, especially in many Germanic
languages where contraction always takes place within the boundaries of
syllables. In Danish, the word “vandret” can mean two things: “vand-ret” means
horizontal, and “van-dret” means walked or migrated. In the first case, you
may/should use the contraction for “nd” (the letter q), but in the second case,
this contraction is not allowed. Liblouis has no way of knowing which is the
correct word. I have heard of similar examples in other languages.
In this particular case, I chose to omit the “nd” contraction, reasoning that
it is better not to use a contraction that is allowed than to actually use one
that is not allowed.
Dropped signs can also be a problem, both when translating and
back-translating. Many of them have a meaning both as a punctuation sign and as
a contraction, e.g. dots 235. Usually, there are rules on how to use these signs,
so you don’t confuse contractions and punctuation when reading. However, it is
usually easy to come up with examples that are obvious to the human mind but
ambiguous to the computer. Perhaps not so much in Danish as in English or
German.
Letters with accents are another example of complete ambiguity. The letter e
can have many accents, but if the accents are not a part of the given Braille
code, there is usually only one way to mark a “foreign” accent. So, when
back-translating, Liblouis cannot know which accent was used in the original
text.
Btw: This last case would not be caught by a back-translate/re-translate test
cycle. The incorrectly back-translated accent would still result in the same
Braille accent marker when re-translated.
Usually, I test back-translation with a translation/back-translation cycle and
then test the back-translated text against the original.
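The difference between the two test cycles can be sketched in a few lines of Python. This is a toy model, not liblouis: the "^" accent marker and both functions are assumptions made for illustration, and the toy back-translator simply guesses an acute accent every time.

```python
import unicodedata

ACCENT_MARK = "^"  # hypothetical single braille marker for any "foreign" accent


def forward(text):
    """Toy forward translation: an accented letter becomes marker + base letter."""
    out = []
    for ch in text:
        decomposed = unicodedata.normalize("NFD", ch)
        if len(decomposed) > 1:
            out.append(ACCENT_MARK + decomposed[0])
        else:
            out.append(ch)
    return "".join(out)


def backward(braille):
    """Toy back-translation: the marker does not say which accent, so guess acute."""
    out = []
    i = 0
    while i < len(braille):
        if braille[i] == ACCENT_MARK and i + 1 < len(braille):
            out.append(unicodedata.normalize("NFC", braille[i + 1] + "\u0301"))
            i += 2
        else:
            out.append(braille[i])
            i += 1
    return "".join(out)


original = "crème"
braille = forward(original)

# Back-translate/re-translate cycle: passes although the accent is wrong.
print(forward(backward(braille)) == braille)  # True

# Translate/back-translate cycle compared with the original: catches it.
print(backward(braille) == original)          # False
```

The first check only compares braille with braille, so the lost accent never shows up. The second check compares print with print and fails as soon as the guess is wrong.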
On the whole, the problem with Braille is the fact that it was never designed
to be a one-to-one representation of ink print, but rather as a practical
system to enable blind people to read and write. The rules were never made by
mathematicians to comply with strict logic, but by people who were often
willing to sacrifice clarity for brevity and logic for practical
usefulness. So, I don’t think we can ever reach perfection in translation and
back-translation, but we can strive for excellence.
That said, I think both we and Liblouis are doing a great job, especially in
the areas where focused work is being put in. Concerning Danish Braille, I am
particularly impressed by what the hyphenation algorithm has done for correct
contraction of compound words. This has been haunting all previous attempts at
Danish Braille translation and has usually led to light-year-long lists of
exceptions.
When the Danish tables have become somewhat more stable (and I don’t have to
fear for them being broken with every commit :-), I would like to have a look
at some proper tables for German back-translation, that is, if no one more
qualified is already doing it. I already have quite good working knowledge of
German Braille, but, of course, I would need to brush up on the rules. Do you,
by chance, have any authoritative material on German Braille in electronic
format?
Bue
From: liblouis-liblouisxml-bounce@xxxxxxxxxxxxx
[mailto:liblouis-liblouisxml-bounce@xxxxxxxxxxxxx] On behalf of Bert Frees
Sent: 20 January 2017 11:44
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Subject: [liblouis-liblouisxml] Re: SV: Re: SV: Re: SV: 8 dots contracted with
caps was: are the swap opcodes broken?
OK thanks. Well, in this particular example it's pretty clear what the correct
back-translation is, right? And this case isn't that hard for a computer
program to solve. Do you also have examples where it is less clear, or even
completely ambiguous?
I imagine a lot of braille codes have cases even without capitals that pose
challenges on automatic back-translation. I have to admit I have no idea what
Liblouis does at the moment, and haven't thought about back-translation in
general much at all, so this could be a pointless or naive question, but I'll
ask anyway: wouldn't it be a good strategy to validate different
back-translation scenarios by forward-translating them again?
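As a rough sketch of that idea (a toy in Python, not what Liblouis actually does): over-generate candidate back-translations, forward-translate each one, and keep only those that reproduce the braille. The rules below are assumptions made for illustration: "mm" is always contracted to x, and a literal x is always written with a ";" letsign.

```python
from itertools import product

LETSIGN = ";"                  # assumed letsign marker
READINGS = {"x": ["x", "mm"]}  # hypothetical: x is a letter or the "mm" contraction


def forward(text):
    """Toy forward translation: mark a literal x with a letsign, contract "mm" to x."""
    return text.replace("x", LETSIGN + "x").replace("mm", "x")


def candidates(braille):
    """Over-generate print readings, ignoring letsigns."""
    cells = braille.replace(LETSIGN, "")
    options = [READINGS.get(cell, [cell]) for cell in cells]
    return ["".join(choice) for choice in product(*options)]


def validated(braille):
    """Keep only the candidates that forward-translate back to the input."""
    return [text for text in candidates(braille) if forward(text) == braille]


print(validated("txt"))   # ['tmmt']
print(validated("t;xt"))  # ['txt']
```

This only narrows things down when the forward rules are strict. If the braille code also allows a literal x to be written without a letsign, both candidates re-translate to the same cells and the check cannot choose between them.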
2017-01-19 23:37 GMT+01:00 Bue Vester-Andersen <bue@xxxxxxxxxxxxxxxxxx>:
Hi Bert,
Technical or computer unfriendly? Probably a bit of both, but not impossible, I
think.
I will try to give an example where back-translation might go wrong:
Take the string “TXT”. Never mind that it is also a computer term and should
probably therefore not be contracted in the first place.
If capsnocont is in effect, it will be translated as either ,,txt or ,t,x,t
depending on the status of capsword (plain TXT in 8 dots). So far, so good. No
contraction anyway.
Back-translating ,,txt you get TXT because the begcapsword tells liblouis to
not use contraction rules when back-translating.
However, back-translating ,t,x,t or TXT, you get TMmT, unless Liblouis knows
that it should use the capsnocont rule whenever it sees two consecutive caps,
or unless the x had a letsign in addition to the capslettersign.
The rules for letsigns in this context might be different from language to
language, hence the computer unfriendliness. The Danish rules are unclear on
this, but I think most people would use a letsign in a case like this one.
So, it is mainly a question of securing the correct back-translation, even if
there is no begcapsword sign to indicate clearly that contraction rules should
not be used here.
Hope it makes more sense now.
Bue
From: liblouis-liblouisxml-bounce@xxxxxxxxxxxxx
[mailto:liblouis-liblouisxml-bounce@xxxxxxxxxxxxx] On behalf of Bert Frees
Sent: 19 January 2017 09:55
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Subject: [liblouis-liblouisxml] Re: SV: Re: SV: 8 dots contracted with caps was:
are the swap opcodes broken?
2017-01-18 21:17 GMT+01:00 Bue Vester-Andersen <bue@xxxxxxxxxxxxxxxxxx>:
Hi Bert,
Regarding your first "btw" I don't quite understand what the problem is.
Maybe you are overthinking it?
Regarding your second btw, yes perhaps you are right. But in which category
fall words that are not fully uppercase, but also not only the first letter?