[liblouis-liblouisxml] Re: ISO8859-2 encoded hyphenation tables

  • From: "John J. Boyer" <john.boyer@xxxxxxxxxxxxxxxxx>
  • To: liblouis-liblouisxml@xxxxxxxxxxxxx
  • Date: Thu, 9 Sep 2010 08:35:21 -0500

This looks like a bug.

Thanks for the links to the hyphenation algorithm.

John

On Thu, Sep 09, 2010 at 11:45:13AM +0200, Bert Frees wrote:
> 
> >The hyphenation algorithm in liblouis is a modified version of the one
> >used in OpenOffice. It should work with any ISO code. Try the
> >lou_checkhyphens test tool.
> 
> Thanks.
> I tried lou_checkhyphens, but unfortunately the same problem occurred
> :(. Letters with Unicode code points U+00A0 and higher are not always
> handled correctly. (I tried entering the words in both ISO 8859-2 and
> UTF-8. When I enter them in UTF-8, the length of the "hyphenation mask"
> sometimes doesn't even match the length of the input.)
> 
> >If you put your test string through
> >liblouisxml you may get different results, because liblouisxml has a
> >hyphenation routine that decides whether to use the liblouis routine
> >based on the number of characters that overflow a line.
> >   
> 
> Yes, I am aware of that. I also know that back-translation must be 
> performed first.
> 
> >What are the rules for making hyphenation tables? I've been trying to
> >find them for a long time.
> >   
> 
> The TeX hyphenation algorithm is explained at 
> <http://en.wikipedia.org/wiki/TeX#Hyphenation_and_justification>. 
> Basically, an odd number means letters can be split, an even number 
> means letters cannot be split and higher numbers have higher precedence. 
> OpenOffice.org (Hunspell) uses a modified implementation of the original 
> TeX algorithm and therefore needs conversion of the standard hyphenation 
> patterns, but that's not entirely clear to me. More info at 
> <http://wiki.services.openoffice.org/wiki/Documentation/SL/Using_TeX_hyphenation_patterns_in_OpenOffice.org#1._Download_up-to-date_TeX_hyphenation_patterns>.
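
The odd/even precedence rule described above can be sketched in a few lines of Python. This is only a minimal sketch of the plain TeX pattern scheme, not the modified OpenOffice.org or liblouis implementation. The function names parse_pattern and hyphenation_points are made up for illustration, and TeX's minimum prefix and suffix lengths are left out.

```python
def parse_pattern(pattern):
    """Split a pattern such as 'b1c' into its letters ('bc') and the
    value at each gap around those letters ([0, 1, 0])."""
    letters = ""
    values = [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)   # digit applies to the gap before the next letter
        else:
            letters += ch
            values.append(0)       # open a new gap after this letter
    return letters, values


def hyphenation_points(word, patterns):
    """Return the indices k where a break is allowed before word[k]."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."       # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)        # scores[i] is the gap before padded[i]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            values = table.get(padded[start:end])
            if values is None:
                continue
            for offset, value in enumerate(values):
                # the highest number at a gap wins
                scores[start + offset] = max(scores[start + offset], value)
    # an odd winning value allows a break, an even one forbids it
    return [k for k in range(1, len(word)) if scores[k + 1] % 2 == 1]


print(hyphenation_points("bbbccc", ["b1c"]))            # [3]
print(hyphenation_points("bbbccc", ["b1c", "bbb2c"]))   # []
```

With the single pattern b1c, the only allowed break in "bbbccc" is between the last b and the first c. Adding the hypothetical pattern bbb2c puts a higher, even value on the same gap and so suppresses that break.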
> 
> 
> Bert
> 
> >Thanks,
> >John
> >
> >On Wed, Sep 08, 2010 at 01:54:00PM +0200, Bert Frees wrote:
> >   
> >>    Hi listers,
> >>
> >>    I've been experimenting a little with hyphenation tables because I
> >>    want to understand them better, and there's not much about them in
> >>    the documentation. I think liblouis has a problem with hyphenation
> >>    tables that are not encoded in ISO8859-1.
> >>
> >>    As an example, I've made a small translation table and hyphenation
> >>    table. The hyphenation table is encoded in ISO8859-2 and has only one
> >>    entry, which says that b and c should always be split.
> >>
> >>    ****************** Translation table **************
> >>    space \x0020       0      (blank)
> >>    uplow \x0042\x0062 12     (letter b)
> >>    uplow \x0106\x0107 146    (letter c with acute)
> >>    uplow \x00C6\x00E6 123456 (letter ae)
> >>    ***************************************************
> >>
> >>    ****************** Hyphenation table **************
> >>    ISO8859-2
> >>    b1c
> >>    ***************************************************
> >>
> >>    Then, if I try to transcribe a file with the string
> >>
> >>    "bbbccc bbbccc bbbccc bbbccc bbbccc bbbccc bbbccc bbbccc ..."
> >>
> >>    the words are not split. Strangely enough, when I transcribe the string
> >>
> >>    "bbbaeaeae bbbaeaeae bbbaeaeae bbbaeaeae bbbaeaeae bbbaeaeae bbbaeaeae
> >>    bbbaeaeae ..."
> >>
> >>    the words are split!! It is obvious that liblouis confuses the
> >>    letter c with acute (Unicode U+0107, byte E6 in ISO8859-2) with the
> >>    letter ae (Unicode U+00E6). In the Polish hyphenation table
> >>    (hyph_pl_PL.dic) I noticed the letter c is represented by "/c"
> >>    (slash-c). But changing "b1c" into "b1/c" doesn't solve the problem
> >>    either.
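
The byte values behind this confusion can be checked with a short Python sketch. It only shows that byte E6 means c with acute in ISO 8859-2 and ae in ISO 8859-1. Whether liblouis actually reads the table bytes as ISO 8859-1 is an assumption that fits the symptom, not a confirmed diagnosis.

```python
# The table entry as intended: 'b', '1', c with acute (U+0107).
pattern = "b1\u0107"

# The bytes as they would be stored in an ISO 8859-2 encoded table file.
raw = pattern.encode("iso-8859-2")

# The same bytes decoded with the declared encoding and, hypothetically,
# with ISO 8859-1, where every byte maps to the same-numbered code point.
intended = raw.decode("iso-8859-2")
misread = raw.decode("iso-8859-1")

print(raw.hex())                            # 6231e6
print([hex(ord(ch)) for ch in intended])    # ['0x62', '0x31', '0x107']
print([hex(ord(ch)) for ch in misread])     # ['0x62', '0x31', '0xe6']
```

Under that assumption the pattern would match b followed by ae (U+00E6) and never b followed by c with acute (U+0107), which is the behaviour reported above.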
> >>
> >>    Anybody got any idea of what the cause of this problem might be?
> >>
> >>    Bert
> >>     
> 
> For a description of the software and to download it go to
> http://www.jjb-software.com

-- 
John J. Boyer; President, Chief Software Developer
Abilitiessoft, Inc.
http://www.abilitiessoft.com
Madison, Wisconsin USA
Developing software for people with disabilities