[liblouis-liblouisxml] Hyphenation

  • From: "John J. Boyer" <johnjboyer@xxxxxxxxxxxxx>
  • To: liblouis-liblouisxml@xxxxxxxxxxxxx
  • Date: Mon, 23 Mar 2009 02:56:59 -0500

By private e-mail, Lars sent me an example of incorrect hyphenation. The
hyphenation algorithm is complicated. First, the word at the end of the
line is checked for length. Words of fewer than 5 characters are not
candidates for hyphenation. The word in the sample was much longer than
this. Next, the word is back-translated so that the hyphenation
algorithm, which was derived from OpenOffice, can be applied to it. The
result is stripped of leading and trailing punctuation and submitted to
the algorithm. This produces a string of digits, with odd digits
indicating where hyphenation may occur. The word is then
forward-translated, with position-tracking, so that positions in the
translated word can be correlated with the positions where hyphenation
may occur. The permitted position nearest to the end of the line is
chosen.
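
To make the last two steps concrete, here is a minimal sketch in Python
of how a break point might be picked once the digit string and the
position map are available. It is not the liblouis code. The function
name, the parameters, and the rule that a break inside a single braille
cell is skipped are all assumptions made for illustration, as is the
convention that an odd digit at index i allows a break after print
character i.

```python
MIN_WORD_LENGTH = 5  # words shorter than this are never hyphenated


def choose_break(hyphen_digits, out_pos, room):
    """Pick the hyphenation point nearest to the end of the line.

    hyphen_digits: one digit per print character; an odd digit at index i
                   is assumed to allow a break after character i.
    out_pos:       out_pos[i] is the index of the braille cell that print
                   character i maps to (hypothetical position-tracking map).
    room:          cells left on the line, including one for the hyphen.

    Returns the number of braille cells to keep on this line, or None.
    """
    if len(hyphen_digits) < MIN_WORD_LENGTH:
        return None
    best = None
    for i in range(len(hyphen_digits) - 1):
        if int(hyphen_digits[i]) % 2 == 0:
            continue  # even digit: no break allowed after character i
        cells_before = out_pos[i + 1]
        if cells_before == out_pos[i]:
            continue  # break would fall inside one cell (a contraction)
        # The kept cells plus the hyphen cell must fit in the room left.
        if cells_before + 1 <= room and (best is None or cells_before > best):
            best = cells_before
    return best


# Hypothetical digits for "hyphenation": breaks after "hy", "hyphen", "hyphena".
digits = "01000110000"
uncontracted = list(range(11))  # one braille cell per print character
print(choose_break(digits, uncontracted, 7))
```

With 7 cells of room, the break after "hyphena" would need 8 cells
including the hyphen, so the sketch falls back to the break after
"hyphen". If position-tracking reports a wrong out_pos entry, the break
lands in the wrong cell, which is one way the kind of error Lars saw
could arise.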

Hyphenation has always been rather uncertain. Because position-tracking 
has been tweaked since the hyphenation algorithm was written, it is 
probably time to revisit the hyphenation code. Work on math codes is 
taking priority at the moment.

I wonder if someone can find the program that produces the hyphenation 
tables we use. I did find the original paper describing it (by Wang, I 
think), but it was a PDF consisting mostly of page images, and only 
very incomplete OCR had been done.

Thanks,
John

-- 
My websites:
GodTouches Digital Ministry, Inc. http://www.abilitiessoft.com/godtouches
Abilitiessoft, Inc. http://www.abilitiessoft.com
Location: Madison, WI, USA

For a description of the software and to download it go to
http://www.jjb-software.com
