Hello,
I am making more progress: I can't fault hyphenation when I do it with the original text (i.e. mode=0). I had been making a silly mistake, which I spotted while trying to make more sense of the transcriber.c file in liblouisxml: I had been checking for the numerical values 1 and 0 rather than the chars '1' and '0'.
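A minimal Java sketch of the check described above, assuming the binding hands the hyphens buffer back as a byte array of ASCII characters. HyphenFlags and its method names are hypothetical and are not part of liblouis or of any published binding. It also accounts for the value 48 quoted in this thread: 48 is the character code of '0', and '1' is 49.

```java
// Illustrative helper only: HyphenFlags is a hypothetical name, not part of
// liblouis or of any published Java binding.
final class HyphenFlags {
    private HyphenFlags() {}

    // Assumes each entry is the ASCII character '0' or '1', as described in
    // this thread. '1' has character code 49, so a comparison with the
    // number 1 never matches.
    static boolean isSyllableStart(byte flag) {
        return flag == '1';
    }

    // Counts the entries equal to '1'; any other value is treated as
    // "not a syllable start".
    static int countSyllableStarts(byte[] hyphens) {
        int count = 0;
        for (byte flag : hyphens) {
            if (isSyllableStart(flag)) {
                count++;
            }
        }
        return count;
    }
}
```

Treating any non-zero value as a syllable start would not work here, because '0' (48) is itself non-zero.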
When I use hyphenation with a translated string (i.e. mode=1), the hyphens array contains values other than '1' and '0'. It seems that it might sometimes be correct (when checking only for the value '1'), but I am uncertain. I can commit some work to mercurial to show the values in hyphens if that would be useful (or, if you prefer, I can just capture the output and post it here).
Michael Whapples
On 08/06/09 16:37, John J. Boyer wrote:
Michael,
You have some good points. The hyphens string returned by lou_hyphenate should contain only 0's and 1's. It is a good idea to return a string of all 0s if the word cannot be hyphenated. You have discovered a bug. Thanks for the suggestion. I'll let you know when I have made the fixes.
John
On Sun, Jun 07, 2009 at 12:35:26PM +0100, Michael Whapples wrote:
Hello,
I have made some progress now: I can get something which seems like correct behaviour out of lou_hyphenate. One thing which slightly caught me out is that the docs say a 1 is at the beginning of a syllable and 0 elsewhere, so I was getting my code to check for 1s; however, printing out the values from hyphens reveals that it contains values other than 0 and 1 (e.g. 48). If I assume any non-zero value instead of 1 I think this makes sense. Is this correct?
Also I have noticed that certain character sequences can cause lou_hyphenate to return 0 (i.e. fail hyphenation). One such string is "adder", but if that sequence is part of a larger word such as "ladder", lou_hyphenate works fine. So does lou_hyphenate returning 0 mean more than an error (i.e. no hyphenation possible)? I would expect that if the word cannot be hyphenated then hyphens should contain just zeros and lou_hyphenate should return 1 (success), as the function didn't hit an error; it's just that the word can't be hyphenated, as shown in the hyphens content.
Michael Whapples
On 07/06/09 04:36, John J. Boyer wrote:
Your inferences from the liblouisxml code are correct. You definitely must have a hyphenation table. It is placed after the translation table name, separated by a comma. For example, en-us-g2.ctb,hyph_en_US.dic
The en-GB-g2.ctb table should work with this hyphenation table as well.
John
On Sat, Jun 06, 2009 at 11:28:07PM +0100, Michael Whapples wrote:
Not being a C person I haven't given the source code of liblouisxml great attention. However I did have a quick look at the very specific part of the code you pointed to and this is what I gathered:
* liblouisxml seems to split the text into words before passing it to the lou_hyphenate function.
* liblouisxml deals with some of the hyphenation itself (e.g. if a hyphen is already in the word).
* The rest that I could gather was already known from the liblouis documentation.
So, going with the first point about single words, I tried passing in just one word, but I still get lou_hyphenate returning 0. I don't seem to get any log messages produced by liblouis. Do you have a minimal example of using lou_hyphenate which I could examine? Ideally one where it is easy to see which parameters are being passed into lou_hyphenate. Is there any way I can get details of why liblouis is returning 0? I still wonder about the table I am using: should en-us-g2.ctb work? I was unable to gather this from looking at the liblouisxml source.
Michael Whapples
On 06/06/09 17:06, John J. Boyer wrote:
The lou_hyphenate function is tricky, as is hyphenation in general. For an example of its use look at the hyphenate function in the liblouisxml module transcriber.c.
John
On Sat, Jun 06, 2009 at 04:26:43PM +0100, Michael Whapples wrote:
Hello,
I have tried to add support for the lou_hyphenate function to my Java bindings, but I only seem to get the value 0 returned (i.e. it's failing to complete). Unfortunately I don't know why it fails to complete. I am using the en-us-g2.ctb translation table, as I understand that the en-GB-g2.ctb table isn't so well developed. I also tried passing in the following string as the translation table, to see if specifying a hyphenation dictionary would help: "en-us-g2.ctb,hyph_en_US.dic", but still no success. I guess the first thing to check is whether I am using a suitable table. If not, what would be a correct value for trantab?
Also, for the Java developers: what would be your preferred return type? I plan to have it return a byte array with the values as given by lou_hyphenate in the hyphens parameter. An alternative I can think of is to return an int array with each value being the index of a 1 value in the hyphens parameter of lou_hyphenate (i.e. by iterating over the return value you would get the index of the beginning of each syllable, which could be used on the string you passed into the method).
Michael Whapples
For a description of the software and to download it go to http://www.jjb-software.com
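The second return type proposed in the first message of this thread (an int array of syllable-start indices) can be derived from the first (the raw byte array), so a binding could expose the raw array and offer the indices as a convenience. A rough sketch follows, assuming the entries are the ASCII characters '0' and '1' as established later in the thread. SyllableStarts and indicesOf are hypothetical names, and the flag pattern in the usage note is made up for illustration rather than captured from liblouis.

```java
// Illustrative helper only: SyllableStarts is a hypothetical name, not part
// of liblouis or of any published Java binding.
final class SyllableStarts {
    private SyllableStarts() {}

    // Returns the index of every entry equal to the character '1'.
    // Assumes the hyphens array holds ASCII '0'/'1' characters; any other
    // value is skipped.
    static int[] indicesOf(byte[] hyphens) {
        int[] buffer = new int[hyphens.length];
        int found = 0;
        for (int i = 0; i < hyphens.length; i++) {
            if (hyphens[i] == '1') {
                buffer[found++] = i;
            }
        }
        // Trim the buffer to the number of indices actually found.
        return java.util.Arrays.copyOf(buffer, found);
    }
}
```

For a made-up flag array {'0', '0', '1', '0', '0', '0'}, indicesOf returns an array holding the single index 2, which a caller could use as a split point in the string that was passed in.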