[liblouis-liblouisxml] Re: Use of the lou_hyphenate function

  • From: Michael Whapples <mwhapples@xxxxxxx>
  • To: liblouis-liblouisxml@xxxxxxxxxxxxx
  • Date: Mon, 08 Jun 2009 18:00:38 +0100

Hello,
I am making more progress: I can't fault hyphenation when I do it with the original text (i.e. mode=0). I had been making a silly mistake, which I spotted while trying to make more sense of the transcriber.c file in liblouisxml: I had been checking for the numerical values 1 and 0 rather than the chars '1' and '0'.
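
To illustrate the mistake: the characters '0' and '1' have the ASCII values 48 and 49, so a comparison against the number 1 never matches, and a "non-zero" test matches every entry. The following is only a sketch. The function names and the sample buffer are made up for illustration, and it assumes the hyphens buffer holds ASCII characters.

```c
/* Hypothetical helper: counts entries that are not the number 0.
 * The character '0' has the value 48, so it is counted as well,
 * which is why a non-zero test reports a break everywhere. */
int count_nonzero(const char *hyphens, int len)
{
    int n = 0;
    for (int i = 0; i < len; i++)
        if (hyphens[i] != 0)
            n++;
    return n;
}

/* Hypothetical helper: counts entries equal to the character '1'. */
int count_char_one(const char *hyphens, int len)
{
    int n = 0;
    for (int i = 0; i < len; i++)
        if (hyphens[i] == '1')
            n++;
    return n;
}
```

With the illustrative buffer "001000" (length 6), count_nonzero gives 6 while count_char_one gives 1.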

When I use hyphenation with a translated string (i.e. mode=1), the hyphens array contains values other than '1' and '0'. Sometimes the result looks as if it might be correct (when checking only for the value '1'), but I am uncertain. I can commit some work to Mercurial to show the values in hyphens if that would be useful (or, if you prefer, I can just capture the output and post it here).
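
One simple way to capture those values for posting is to write each entry of the buffer out as a number. The sketch below does not call liblouis at all. format_hyphens is a hypothetical helper name, and the only assumption is that hyphens is an array of char with a known length.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not part of liblouis): write the numeric value of
 * each of the first len entries of hyphens into out, separated by single
 * spaces. Returns the number of characters written, or -1 if out is too
 * small to hold the whole listing. */
int format_hyphens(const char *hyphens, int len, char *out, size_t outsize)
{
    size_t used = 0;
    if (outsize == 0)
        return -1;
    out[0] = '\0';
    for (int i = 0; i < len; i++) {
        int n = snprintf(out + used, outsize - used, "%s%d",
                         i == 0 ? "" : " ", (unsigned char)hyphens[i]);
        if (n < 0 || (size_t)n >= outsize - used)
            return -1; /* output would have been truncated */
        used += (size_t)n;
    }
    return (int)used;
}
```

For a buffer holding the characters '0' and '1' followed by the numbers 0 and 1, this yields "48 49 0 1", which makes it easy to see which kind of value each position holds.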

Michael Whapples
On 08/06/09 16:37, John J. Boyer wrote:
Michael,

You have some good points. The hyphens string returned by lou_hyphenate
should contain only 0's and 1's. It is a good idea to return a string of
all 0s if the word cannot be hyphenated. You have discovered a bug.
Thanks for the suggestion. I'll let you know when I have made the fixes.
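
A caller that wants to guard against stray values in the meantime could check the buffer before using it. This is a sketch only: hyphens_well_formed is a hypothetical name, not a liblouis function, and it assumes the intended contents are the characters '0' and '1'.

```c
#include <stdbool.h>

/* Hypothetical check (not a liblouis function): true when every one of
 * the first len entries is the character '0' or the character '1'. */
bool hyphens_well_formed(const char *hyphens, int len)
{
    for (int i = 0; i < len; i++) {
        if (hyphens[i] != '0' && hyphens[i] != '1')
            return false;
    }
    return true;
}
```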

John

On Sun, Jun 07, 2009 at 12:35:26PM +0100, Michael Whapples wrote:
Hello,
I have made some progress now: I can get what seems like correct
behaviour out of lou_hyphenate. One thing which slightly caught me out
is that the docs say a 1 is at the beginning of a syllable and 0
elsewhere, so I had my code check for 1s. However, printing out the
values from hyphens reveals that it contains values other than 0 and 1
(e.g. 48). If I test for any non-zero value instead of 1, I think this
makes sense. Is this correct?

Also, I have noticed that certain character sequences can cause
lou_hyphenate to return 0 (i.e. fail hyphenation). One such string is
"adder", but if that sequence is part of a larger word such as "ladder",
lou_hyphenate works fine. So does lou_hyphenate returning 0 mean more
than an error (i.e. no hyphenation possible)? I would expect that if the
word cannot be hyphenated, hyphens should contain just zeros and
lou_hyphenate should return 1 (success): the function didn't hit an
error, it's just that the word can't be hyphenated, as shown by the
contents of hyphens.
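
The expectation described above can be written down from the caller's side. The sketch below describes the expected contract, not the behaviour observed at the time of writing (where "adder" makes the call return 0). The enum and function names are hypothetical, and the buffer is assumed to hold the characters '0' and '1'.

```c
/* Hypothetical outcome type for one hyphenation call. */
typedef enum {
    HYPH_FAILED,     /* the call itself reported an error (returned 0) */
    HYPH_NO_BREAKS,  /* the call succeeded but found no break points */
    HYPH_HAS_BREAKS  /* the call succeeded and marked at least one '1' */
} hyph_outcome;

/* result is the return value of the hyphenation call; hyphens and len
 * describe the buffer it filled in. */
hyph_outcome classify_hyphenation(int result, const char *hyphens, int len)
{
    if (result == 0)
        return HYPH_FAILED;
    for (int i = 0; i < len; i++)
        if (hyphens[i] == '1')
            return HYPH_HAS_BREAKS;
    return HYPH_NO_BREAKS;
}
```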

Michael Whapples
On 07/06/09 04:36, John J. Boyer wrote:
Your inferences from the liblouisxml code are correct. You definitely
must have a hyphenation table. It is placed after the translation table
name, separated by a comma. For example, en-us-g2.ctb,hyph_en_US.dic

The en-GB-g2.ctb table should work with this hyphenation table as well.
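
As a small illustration of the comma-separated form, the table string can be assembled from its two parts. make_table_list is a hypothetical helper, not a liblouis function; it only builds the string and does not check that either table exists.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: join a translation table name and a hyphenation
 * table name with a comma, e.g. "en-us-g2.ctb,hyph_en_US.dic".
 * Returns 0 on success, -1 if out is too small for the result. */
int make_table_list(const char *translation, const char *hyphenation,
                    char *out, size_t outsize)
{
    int n = snprintf(out, outsize, "%s,%s", translation, hyphenation);
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```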

John

On Sat, Jun 06, 2009 at 11:28:07PM +0100, Michael Whapples wrote:

Not being a C person I haven't given the source code of liblouisxml
great attention. However I did have a quick look at the very specific
part of the code you pointed to and this is what I gathered:

* liblouisxml seems to split the text into words before passing it to
the lou_hyphenate function.
* liblouisxml deals with some of the hyphenation itself (e.g. if a hyphen
is already in the word).
* The rest that I could gather was already known from the liblouis
documentation.
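
The first point, handing over one word at a time, could be sketched roughly as follows. This is not how transcriber.c does it. next_word is a hypothetical helper that splits on whitespace only and works on plain char strings for brevity.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: find the next run of non-space characters in text,
 * starting the search at *pos. On success it stores the word's start
 * offset and length, advances *pos past the word and returns 1.
 * It returns 0 when no further word is left. */
int next_word(const char *text, size_t *pos, size_t *start, size_t *length)
{
    size_t i = *pos;
    while (text[i] != '\0' && isspace((unsigned char)text[i]))
        i++; /* skip leading whitespace */
    if (text[i] == '\0') {
        *pos = i;
        return 0;
    }
    *start = i;
    while (text[i] != '\0' && !isspace((unsigned char)text[i]))
        i++; /* consume the word itself */
    *length = i - *start;
    *pos = i;
    return 1;
}
```

Calling next_word in a loop until it returns 0 gives each word's offset and length, and each word could then be hyphenated separately.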

So going with the first point of single words I tried passing in just
one word, but still get lou_hyphenate returning 0. I don't seem to get
any log messages produced from liblouis.

Do you have a minimal example of using lou_hyphenate which I could
examine? Ideally one where it is easy to see which parameters are being
passed into lou_hyphenate.

Is there any way I can get details of why liblouis is returning 0?

I still wonder about the table I am using, should en-us-g2.ctb work? I
was unable to gather this from looking at the liblouisxml source.

Michael Whapples
On 06/06/09 17:06, John J. Boyer wrote:

The lou_hyphenate function is tricky, as is hyphenation in general. For
an example of its use look at the hyphenate function in the liblouisxml
module transcriber.c.

John

On Sat, Jun 06, 2009 at 04:26:43PM +0100, Michael Whapples wrote:


Hello,
I have tried to add support for the lou_hyphenate function to my Java
bindings, but I only seem to get the value 0 returned (i.e. it's failing
to complete). Unfortunately I don't know why it fails. I am using the
en-us-g2.ctb translation table, as I understand that the en-GB-g2.ctb
table isn't so well developed. I also tried passing in the following
string as the translation table, to see if specifying a hyphenation
dictionary would help: "en-us-g2.ctb,hyph_en_US.dic", but still no
success.

I guess the first thing to check is whether I am using a suitable table.
If not, what would be a correct value for trantab?

Also, for the Java developers: what would be your preferred return type?
I plan to have it return a byte array with the values as given by
lou_hyphenate in the hyphens parameter. An alternative I can think of is
to return an int array with each value being the index of a 1 value in
the hyphens parameter of lou_hyphenate (i.e. by iterating over the return
value you would get the index of the beginning of each syllable, which
could be used on the string you passed into the method).
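
The second alternative amounts to a single pass over the hyphens buffer. The sketch below is written in C to match the library rather than the bindings, and the same loop would fill an int array in Java. hyphen_indices is a hypothetical name, and the buffer is assumed to hold the characters '0' and '1'.

```c
/* Hypothetical helper: store the index of every '1' in hyphens into
 * indices, which has room for max_indices entries. Returns the number of
 * '1' entries found. This may exceed max_indices, in which case only the
 * first max_indices indices have been stored. */
int hyphen_indices(const char *hyphens, int len, int *indices, int max_indices)
{
    int count = 0;
    for (int i = 0; i < len; i++) {
        if (hyphens[i] == '1') {
            if (count < max_indices)
                indices[count] = i;
            count++;
        }
    }
    return count;
}
```

The index form is more compact for long words with few break points, while the raw array keeps a one-to-one correspondence with the input string. Which is more convenient depends on how callers insert the hyphens.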

Michael Whapples
For a description of the software and to download it go to
http://www.jjb-software.com



