[liblouis-liblouisxml] Hyphenation

  • From: "John J. Boyer" <john.boyer@xxxxxxxxxxxxxxxxx>
  • To: liblouis-liblouisxml@xxxxxxxxxxxxx
  • Date: Thu, 31 May 2012 22:42:10 -0500

Bert Frees and I have discussed this off-list to some extent. I think it 
is now time to submit some ideas to the wider community. Hyphenation is 
important in languages with many long words.

One idea is to pre-hyphenate text before sending it to liblouis. 
liblouisutdml would run a hyphenation algorithm, probably a version of 
the one now used in liblouis, and generate an array similar to the 
present liblouis typeform parameter: a char array, parallel to the 
input text, with a bit set wherever a hyphen is permissible. This array 
would be passed, along with the usual parameters, to a new liblouis 
function called lou_translatePrehyphenated.
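
To make the array idea concrete, here is a minimal sketch in C. It is 
not liblouis code: the flag HYPH_ALLOWED, the helper 
mark_hyphen_points, and the convention that a set bit means "a hyphen 
may go before this character" are all assumptions for illustration. 
The break offsets stand in for whatever the hyphenation algorithm 
would produce.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag: a hyphen may be inserted before this character. */
#define HYPH_ALLOWED 0x01

/*
 * Fill `hyphens` (one char per input character, parallel to the text,
 * in the manner of typeform) from a list of break offsets.
 * Returns the number of distinct positions flagged.
 */
static int mark_hyphen_points(char *hyphens, size_t len,
                              const size_t *breaks, size_t nbreaks)
{
    int flagged = 0;
    memset(hyphens, 0, len);
    for (size_t i = 0; i < nbreaks; i++) {
        /* A break before the first character or past the end is ignored. */
        if (breaks[i] == 0 || breaks[i] >= len)
            continue;
        if (!(hyphens[breaks[i]] & HYPH_ALLOWED))
            flagged++;
        hyphens[breaks[i]] |= HYPH_ALLOWED;
    }
    return flagged;
}
```

For "hyphenation" (11 characters) with breaks at offsets 2 and 6, the 
array would carry the flag at indices 2 and 6 and zero elsewhere. Using 
a bit rather than a whole char leaves the other bits free for later 
use, as with typeform.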

However, it seems to me that prehyphenating every word is a lot of 
unnecessary processing. Hyphenation is needed only at the ends of lines 
and perhaps in a few other situations. Since the inputPos array 
generated by liblouis is now accurate, it can be used to find the print 
word corresponding to the word at the end of a line that does not fit. 
This word could then be passed to lou_hyphenate for hyphenation and 
retranslation with hyphenation indicators, as this function now does. 
The back-translation step in lou_hyphenate would be eliminated, 
however, since it is unreliable. After lou_hyphenate returns, 
liblouisutdml would choose the most suitable hyphenation point.
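
The lookup through inputPos could be sketched roughly as follows. This 
is only an illustration under stated assumptions: WordSpan and 
find_print_word are hypothetical names, the text is held in plain char 
rather than the widechar type liblouis uses, and words are assumed to 
be separated by single spaces.

```c
#include <stddef.h>

/* Hypothetical result type: where the print word starts and how long it is. */
typedef struct {
    int start;
    int len;
} WordSpan;

/*
 * Given the index of a braille cell in the output (outIndex), use
 * inputPos to find the matching input character, then widen to the
 * whole print word around it. Returns a zero-length span if outIndex
 * is out of range or maps to a space.
 */
static WordSpan find_print_word(const char *input, int inlen,
                                const int *inputPos, int outlen,
                                int outIndex)
{
    WordSpan w = {0, 0};
    if (outIndex < 0 || outIndex >= outlen)
        return w;

    int pos = inputPos[outIndex];
    if (pos < 0 || pos >= inlen || input[pos] == ' ')
        return w;

    int start = pos;
    int end = pos;
    while (start > 0 && input[start - 1] != ' ')
        start--;                 /* back up to the start of the word */
    while (end < inlen && input[end] != ' ')
        end++;                   /* advance to one past the end of the word */

    w.start = start;
    w.len = end - start;
    return w;
}
```

Only the span found this way would then need to be hyphenated, so the 
cost is one word per line break rather than every word in the document.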

Thanks,
John

-- 
John J. Boyer; President, Chief Software Developer
Abilitiessoft, Inc.
http://www.abilitiessoft.com
Madison, Wisconsin USA
Developing software for people with disabilities

For a description of the software, to download it and links to
project pages go to http://www.abilitiessoft.com
