The hyphenation algorithm in liblouis is a modified version of the one used in OpenOffice. It should work with any ISO code. Try the lou_checkhyphens test tool.
Thanks. I tried lou_checkhyphens, but unfortunately the same problem occurred :(. Letters with Unicode code points U+00A0 and higher are not always handled correctly. (I tried entering the words in both ISO 8859-2 and UTF-8. When I enter them in UTF-8, the length of the "hyphenation mask" sometimes doesn't even match the length of the input.)
If you put your test string through liblouisxml you may get different results, because liblouisxml has a hyphenation routine that decides whether to use the liblouis routine based on the number of characters that overflow a line.
Yes, I am aware of that. I also know that back-translation must be performed first.
What are the rules for making hyphenation tables? I've been trying to find them for a long time.
The TeX hyphenation algorithm is explained at <http://en.wikipedia.org/wiki/TeX#Hyphenation_and_justification>. Basically, an odd number means letters can be split, an even number means letters cannot be split and higher numbers have higher precedence. OpenOffice.org (Hunspell) uses a modified implementation of the original TeX algorithm and therefore needs conversion of the standard hyphenation patterns, but that's not entirely clear to me. More info at <http://wiki.services.openoffice.org/wiki/Documentation/SL/Using_TeX_hyphenation_patterns_in_OpenOffice.org#1._Download_up-to-date_TeX_hyphenation_patterns>.
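To make the odd/even rule concrete, here is a minimal sketch of the pattern-matching step in Python. It is only an illustration of the general TeX-style idea described above, not the liblouis or OpenOffice code: the function names are made up for this example, and the minimum prefix/suffix lengths that real implementations enforce are left out.

```python
def parse_pattern(pattern):
    """Split a pattern such as 'b1c' into its letters and inter-letter values."""
    letters = "".join(ch for ch in pattern if ch not in "0123456789")
    values = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch in "0123456789":
            values[pos] = int(ch)   # value sits before letters[pos]
        else:
            pos += 1
    return letters, values


def hyphenation_points(word, patterns):
    """Return indices i where a break is allowed between word[i-1] and word[i]."""
    padded = "." + word.lower() + "."       # '.' marks the word boundaries
    scores = [0] * (len(padded) + 1)        # scores[k] sits before padded[k]
    for pattern in patterns:
        letters, values = parse_pattern(pattern)
        start = padded.find(letters)
        while start != -1:
            for offset, value in enumerate(values):
                # Higher numbers take precedence over lower ones.
                scores[start + offset] = max(scores[start + offset], value)
            start = padded.find(letters, start + 1)
    # An odd final value allows a break, an even one forbids it.
    return [i for i in range(1, len(word)) if scores[i + 1] % 2 == 1]


def hyphenate(word, patterns):
    """Insert '-' at every allowed break point (illustrative helper)."""
    pieces, last = [], 0
    for point in hyphenation_points(word, patterns):
        pieces.append(word[last:point])
        last = point
    pieces.append(word[last:])
    return "-".join(pieces)


print(hyphenate("bbbccc", ["b1c"]))          # bbb-ccc
print(hyphenate("bbbccc", ["b1c", "b2cc"]))  # bbbccc
```

In the second call the made-up pattern b2cc puts an even 2 at the same position where b1c put a 1, and since the higher number wins, the break is suppressed.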
Bert
Thanks,
John

On Wed, Sep 08, 2010 at 01:54:00PM +0200, Bert Frees wrote:

Hi listers,

I've been experimenting a little with hyphenation tables because I want to understand them better, and there's not much about them in the documentation. I think liblouis has a problem with hyphenation tables that are not encoded in ISO8859-1.

As an example, I've made a small translation table and hyphenation table. The hyphenation table is encoded in ISO8859-2 and has only one entry, which says that b and c should always be split.

****************** Translation table **************
space \x0020 0 (blank)
uplow \x0042\x0062 12 (letter b)
uplow \x0106\x0107 146 (letter c with acute)
uplow \x00C6\x00E6 123456 (letter ae)
***************************************************

****************** Hyphenation table **************
ISO8859-2
b1c
***************************************************

Then, if I try to transcribe a file with the string "bbbccc bbbccc bbbccc bbbccc bbbccc bbbccc bbbccc bbbccc ..." the words are not split. Strangely enough, when I transcribe the string "bbbaeaeae bbbaeaeae bbbaeaeae bbbaeaeae bbbaeaeae bbbaeaeae bbbaeaeae bbbaeaeae ..." the words are split!!

It is obvious that liblouis confuses the letters c (Unicode U+0107, which is byte E6 in ISO8859-2) and ae (which is Unicode U+00E6).

In the Polish hyphenation table (hyph_pl_PL.dic) I noticed the letter c is represented by "/c" (slash-c). But changing "b1c" into "b1/c" doesn't solve the problem either.

Anybody got any idea of what the cause of this problem might be?

Bert
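The byte-level coincidence behind the confusion described above can be checked with a few lines of Python. This only shows that the symptom is consistent with the byte E6 from the ISO8859-2 table being read as if it were ISO8859-1; it says nothing about what liblouis actually does internally.

```python
# The single byte that stands for "c with acute" in an ISO 8859-2 file.
raw = b"\xe6"

as_latin2 = raw.decode("iso8859_2")  # how the table author meant it
as_latin1 = raw.decode("iso8859_1")  # how a reader assuming ISO 8859-1 sees it

print(f"ISO 8859-2: U+{ord(as_latin2):04X}")  # U+0107
print(f"ISO 8859-1: U+{ord(as_latin1):04X}")  # U+00E6
```

The same byte maps to U+0107 under one encoding and to U+00E6 under the other, which matches the two letters that get mixed up in the test.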