This looks like a bug. Thanks for the links to the hyphenation algorithm.

John

On Thu, Sep 09, 2010 at 11:45:13AM +0200, Bert Frees wrote:
>
> >The hyphenation algorithm in liblouis is a modified version of the one
> >used in OpenOffice. It should work with any ISO code. Try the
> >lou_checkhyphens test tool.
>
> Thanks.
> I tried lou_checkhyphens, but unfortunately the same problem occurred
> :(. Letters with unicode U+00A0 and higher are not always handled
> correctly. (I tried entering the words in both ISO 8859-2 and UTF-8.
> When I enter them in UTF-8, the length of the "hyphenation mask" doesn't
> even match the length of the input sometimes.)
>
> >If you put your test string through
> >liblouisxml you may get different results, because liblouisxml has a
> >hyphenation routine that decides whether to use the liblouis routine
> >based on the number of characters that overflow a line.
> >
>
> Yes, I am aware of that. I also know that back-translation must be
> performed first.
>
> >What are the rules for making hyphenation tables? I've been trying to
> >find them for a long time.
> >
>
> The TeX hyphenation algorithm is explained at
> <http://en.wikipedia.org/wiki/TeX#Hyphenation_and_justification>.
> Basically, an odd number means letters can be split, an even number
> means letters cannot be split and higher numbers have higher precedence.
> OpenOffice.org (Hunspell) uses a modified implementation of the original
> TeX algorithm and therefore needs conversion of the standard hyphenation
> patterns, but that's not entirely clear to me. More info at
> <http://wiki.services.openoffice.org/wiki/Documentation/SL/Using_TeX_hyphenation_patterns_in_OpenOffice.org#1._Download_up-to-date_TeX_hyphenation_patterns>.
>
> Bert
>
> >Thanks,
> >John
> >
> >On Wed, Sep 08, 2010 at 01:54:00PM +0200, Bert Frees wrote:
> >
> >> Hi listers,
> >>
> >> I've been experimenting a little with hyphenation tables because I
> >> want to understand them better, and there's not much about them in
> >> the documentation. I think liblouis has a problem with hyphenation
> >> tables that are not encoded in ISO8859-1.
> >>
> >> As an example, I've made a small translation table and hyphenation
> >> table. The hyphenation table is encoded in ISO8859-2 and has only
> >> one entry, which says that b and c should always be split.
> >>
> >> ****************** Translation table **************
> >> space \x0020 0 (blank)
> >> uplow \x0042\x0062 12 (letter b)
> >> uplow \x0106\x0107 146 (letter c with acute)
> >> uplow \x00C6\x00E6 123456 (letter ae)
> >> ***************************************************
> >>
> >> ****************** Hyphenation table **************
> >> ISO8859-2
> >> b1c
> >> ***************************************************
> >>
> >> Then, if I try to transcribe a file with the string
> >>
> >> "bbbccc bbbccc bbbccc bbbccc bbbccc bbbccc bbbccc bbbccc ..."
> >>
> >> the words are not split. Strangely enough, when I transcribe the
> >> string
> >>
> >> "bbbaeaeae bbbaeaeae bbbaeaeae bbbaeaeae bbbaeaeae bbbaeaeae bbbaeaeae
> >> bbbaeaeae ..."
> >>
> >> the words are split!! It is obvious that liblouis confuses the
> >> letters c (unicode U+0107 and E6 in ISO8859-2) and ae (which is
> >> unicode U+00E6). In the Polish hyphenation table (hyph_pl_PL.dic) I
> >> noticed the letter c is represented by "/c" (slash-c). But changing
> >> "b1c" into "b1/c" doesn't solve the problem either.
> >>
> >> Anybody got any idea of what the cause of this problem might be?
> >>
> >> Bert
> >>
>
> For a description of the software and to download it go to
> http://www.jjb-software.com

--
John J. Boyer; President, Chief Software Developer
Abilitiessoft, Inc.
http://www.abilitiessoft.com
Madison, Wisconsin USA
Developing software for people with disabilities
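
The symptom reported in the quoted message (byte E6 from an ISO8859-2 table behaving like U+00E6) is what one would see if table bytes were read as though they were ISO8859-1. The following Python sketch only illustrates that mix-up and the UTF-8 length mismatch; whether liblouis actually does this internally is an assumption, and none of the names below come from liblouis.

```python
# Illustration only: not taken from the liblouis source.
# "\u0107" is the letter c with acute used in the example tables.

# The pattern "b1" + c-acute as it would be stored in an ISO8859-2 file.
pattern_bytes = "b1\u0107".encode("iso8859_2")

# Decoding with the declared encoding gives U+0107 back.
as_latin2 = pattern_bytes.decode("iso8859_2")

# Decoding the same bytes as ISO8859-1 (the suspected mistake) gives U+00E6.
as_latin1 = pattern_bytes.decode("iso8859_1")


def codepoints(text):
    """Return the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(pattern_bytes.hex())      # 6231e6
print(codepoints(as_latin2))    # ['U+0062', 'U+0031', 'U+0107']
print(codepoints(as_latin1))    # ['U+0062', 'U+0031', 'U+00E6']

# A mask computed per byte of UTF-8 input will not line up with the
# characters: c-acute takes two bytes in UTF-8.
word = "bbb" + "\u0107" * 3
char_len = len(word)                    # 6 characters
utf8_len = len(word.encode("utf-8"))    # 9 bytes
print(char_len, utf8_len)
```

If this is the cause, a pattern written for c-acute would match text containing ae, which agrees with the behaviour described in the original report.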
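
The rule quoted earlier in the thread (odd number: may split; even number: may not; higher numbers win) can be sketched in a few lines of Python. This is a simplified reading of TeX-style patterns, not the liblouis or Hunspell implementation: it ignores minimum prefix and suffix lengths and any table-format conversion, and the function names are made up for illustration.

```python
def parse_pattern(pattern):
    """Split a pattern such as 'b1c' into its letters ('bc') and the
    value at each position between letters ([0, 1, 0])."""
    letters = []
    values = [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)    # digit applies to the current gap
        else:
            letters.append(ch)
            values.append(0)        # open a new gap after this letter
    return "".join(letters), values


def hyphenation_points(word, patterns):
    """Return the indices in `word` before which a split is allowed."""
    padded = "." + word.lower() + "."       # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)        # scores[i] is the gap before padded[i]
    for letters, values in (parse_pattern(p) for p in patterns):
        start = padded.find(letters)
        while start != -1:
            for offset, value in enumerate(values):
                # Higher numbers take precedence over lower ones.
                scores[start + offset] = max(scores[start + offset], value)
            start = padded.find(letters, start + 1)
    # word[j] sits at padded[j + 1]; an odd score means "may split here".
    return [j for j in range(1, len(word)) if scores[j + 1] % 2 == 1]


def hyphenate(word, patterns, sep="-"):
    """Insert `sep` at every allowed split point."""
    pieces, prev = [], 0
    for point in hyphenation_points(word, patterns):
        pieces.append(word[prev:point])
        prev = point
    pieces.append(word[prev:])
    return sep.join(pieces)


print(hyphenate("bbbccc", ["b1c"]))             # bbb-ccc
print(hyphenate("bbbccc", ["b1c", "bb2cc"]))    # bbbccc
```

In the second call the hypothetical pattern "bb2cc" places an even, higher value on the same gap, so the split allowed by "b1c" is suppressed.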