Hi Bert, Bue and All,
Syllable boundaries are also very important in the hyphenation of Hungarian
words; however, most exceptions occur at the boundaries of compound words. From
the point of view of maintaining a hyphenation dictionary, compound words make
frequent updates necessary, because they are a highly productive area of the
language and no programmed logic can identify such in-word boundaries.
However, what Attila meant by non-standard hyphenation has more to do with
situations where the word's spelling changes as a result of hyphenation. This
also occurs in German, but Hungarian abounds with such instances, so if an
automatic hyphenation method cannot handle such cases, the quality of the
hyphenation suffers greatly. Here is an example:
Without hyphenation: karosszéria
With hyphenation: ka-rosz-szé-ria
So, the unhyphenated form contains the sequence ssz (which indicates a long
voiceless sibilant), while in the hyphenated form the long sibilant is split
into two short ones, written as sz-sz.
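To make the spelling change concrete, here is a minimal Python sketch. It
assumes the standard Hungarian rule that a long digraph consonant is written by
doubling only its first letter (ssz, nny, ggy, ...) and is restored to two full
digraphs when the break falls inside it. The helper `hyphenate_at` and the break
indices are hypothetical, and this is not how TeX or Hyphen implement it.

```python
# Hungarian digraphs plus the trigraph "dzs", longest first so "dzs" wins over "dz".
DIGRAPHS = ["dzs", "cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs"]


def hyphenate_at(word: str, i: int) -> str:
    """Insert a hyphen before word[i], restoring a shortened long consonant.

    Hypothetical helper: if word[i-1] repeats the first letter of a digraph
    starting at word[i] (the "s" + "sz" of "ssz"), the first half is written
    out in full, so "karos|széria" becomes "karosz-széria".
    """
    for d in DIGRAPHS:
        if i > 0 and word.startswith(d, i) and word[i - 1] == d[0]:
            return word[:i] + d[1:] + "-" + word[i:]
    return word[:i] + "-" + word[i:]  # ordinary break, no spelling change


print(hyphenate_at("karosszéria", 5))  # karosz-széria
print(hyphenate_at("asszonnyal", 2))   # asz-szonnyal
print(hyphenate_at("meggyes", 3))      # megy-gyes
```

The rule only looks at letters, so it cannot tell a shortened long consonant
from a compound boundary where the same letters meet by chance; that distinction
still needs dictionary knowledge, as described above.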
TeX seems to have a mechanism for handling such spelling changes, using forward
slashes, commas, etc.
And just another question inspired by this topic:
In TeX for the English language, parameters (\lefthyphenmin and
\righthyphenmin) are set to prevent hyphenation from occurring after the first
letter of a word and between the last three letters (indicated as 2/3); for
Hungarian, they are set to 2/2. For Hungarian braille, the convention is that
hyphenation may occur after the first letter of a word if that letter is a
vowel, and it is only prevented from occurring between the last two letters of
the word, so it could be indicated as 1/2.
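For illustration, the 1/2 convention can be written as a filter over candidate
break points. This is a minimal Python sketch, not a Liblouis feature: the
candidate breaks would normally come from the hyphenation dictionary and are
supplied by hand here, and `vowel_left_min` is a hypothetical parameter modelling
the "word-initial vowel may stand alone" rule.

```python
HUNGARIAN_VOWELS = set("aáeéiíoóöőuúüű")


def filter_breaks(word, breaks, left_min, right_min, vowel_left_min=None):
    """Drop candidate breaks that leave too few letters on either side.

    breaks: candidate positions, each the number of letters before the break.
    left_min/right_min: like TeX's lefthyphenmin/righthyphenmin (2/3 for English).
    vowel_left_min: hypothetical override used when the word starts with a vowel.
    """
    lmin = left_min
    if vowel_left_min is not None and word[:1].lower() in HUNGARIAN_VOWELS:
        lmin = vowel_left_min
    return [b for b in breaks if b >= lmin and len(word) - b >= right_min]


# 1/2 convention: a lone word-initial vowel may be split off.
print(filter_breaks("óra", [1], left_min=2, right_min=2, vowel_left_min=1))  # [1]
# A plain 2/2 setting rejects the same break.
print(filter_breaks("óra", [1], left_min=2, right_min=2))  # []
```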
Is there a way in liblouis to control this?
Best Regards, Norbert.
From: Bert Frees
Sent: Friday, May 12, 2017 2:36 PM
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Subject: [liblouis-liblouisxml] Re: SV: Re: Nonstandard hyphenation rules:
Liblouis not support this type rules when doing hyphenation?
Hi Bue,
Thanks for sharing.
I should, however, make clear for Attila, in order to avoid any confusion, that
this has absolutely nothing to do with non-standard hyphenation. (In fact, you
can argue it has more to do with contraction than with hyphenation.) In other
words, you can't use this to solve your problem.
Still, it's a very interesting approach. It has inspired me to do something
similar, and I hope it will inspire others on the list. I believe we have
already discussed it extensively, but some people might not be familiar with it
yet. As Bue said, it might be especially interesting for other Germanic
languages.
2017-05-12 13:44 GMT+02:00 Bue Vester-Andersen <bue@xxxxxxxxxxxxxxxxxx>:
Hi Bert and Attila,
FYI, here is what I do, so far successfully:
I collected a corpus of about 650,000 words, with the most frequent words at
the top of the list. Then I hyphenated about 20,000 words manually, or rather,
I proofread their hyphenation as produced by the existing hyphenation table.
Next, I made a new hyphenation table using the good old patgen tool.
From then on, my repeating procedure has been as follows:
1. Translate the whole corpus of Danish words with Liblouis.
2. Over time, collect the contraction errors caused by the current hyphenation
table.
3. Correct these words and insert them in my list of hyphenated words.
4. Build a new hyphenation table, translate the corpus with Liblouis, and
compare the result with the previous one.
5. Check the hyphenation of all new words with a different result, correct the
failed ones, and insert them in the hyphenation list.
6. Go back to step 4 and rebuild the hyphenation table, etc., until there are
no new differences.
In other words, I only correct words where the hyphenation affects the
correct translation, not all words with hyphenation errors.
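The repeating part of this procedure (steps 4 to 6) is a loop that runs until
it reaches a fixed point. Below is a minimal Python sketch of that loop, not
Bue's actual tooling: `build_table` (for example a patgen run), `translate` (a
Liblouis translation) and `review` (the manual check) are hypothetical
callables standing in for the real tools.

```python
from typing import Callable, Dict, List


def refine_hyphenation_list(
    corpus: List[str],
    hyphenated: Dict[str, str],
    previous: Dict[str, str],
    build_table: Callable[[Dict[str, str]], object],
    translate: Callable[[str, object], str],
    review: Callable[[str], str],
    max_rounds: int = 100,
) -> Dict[str, str]:
    """Repeat steps 4-6 until a rebuilt table changes no new translations.

    hyphenated: manually checked word -> hyphenated form (the patgen input).
    previous:   word -> braille translation made with the previous table.
    """
    hyphenated = dict(hyphenated)  # do not modify the caller's list
    for _ in range(max_rounds):
        table = build_table(hyphenated)                     # step 4: rebuild
        current = {w: translate(w, table) for w in corpus}  # step 4: retranslate
        # Step 5: only words not yet in the list whose braille output changed.
        changed = [w for w in corpus
                   if current[w] != previous[w] and w not in hyphenated]
        if not changed:                                     # step 6: stable
            break
        for w in changed:
            hyphenated[w] = review(w)                       # manual check
        previous = current
    return hyphenated
```

Restricting the manual check to words whose translation changed is what keeps
the work proportional to the errors that actually affect contraction.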
In the beginning, correcting a few words would usually cause a whole wave of
differences, resulting in hundreds of new words being added to the list by the
end of the day. Now, however, changes to the list are indeed very rare, and I
can safely say that, as far as hyphenation is concerned, I have the best Danish
contraction system ever, much better than the traditional approach based on
exception lists.
It sure does require a lot of work, but I think it is an approach that could
also be used in other languages, especially other Germanic languages where you
cannot contract across syllable boundaries.
Bue
From: liblouis-liblouisxml-bounce@xxxxxxxxxxxxx
[mailto:liblouis-liblouisxml-bounce@xxxxxxxxxxxxx] On behalf of Bert Frees
Sent: 12 May 2017 11:50
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Subject: [liblouis-liblouisxml] Re: Nonstandard hyphenation rules: Liblouis not
support this type rules when doing hyphenation?
Hi Attila,
No, Liblouis does not support non-standard hyphenation rules. Liblouis is
based on Libhnj, the predecessor of Hyphen, the library that you mention.
Libhnj did not support non-standard hyphenation.
Liblouis basically includes a complete modified version of Libhnj. What I
would like to see in the future is that Liblouis may be configured to use
Hyphen as an external library. I haven't really explored yet whether this is
possible to do, but I hope so. This would be a first step towards supporting
non-standard hyphenation. I'm afraid however that the priority of this change
is pretty low. If anyone feels compelled to have a look at it, don't hesitate.
The way I perform hyphenation is to compute the break points of the braille
word from the break points of the untranslated word (which I get by hyphenating
it with a hyphenation library such as Hyphen), using Liblouis' inputPos argument
(see http://liblouis.org/documentation/liblouis.html#lou_005ftranslate). In
other words, hyphenation and braille translation are two separate steps. I use
more or less the same approach for non-standard hyphenation, although it's a bit
more advanced. If you are interested, I could go into more detail.
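One way to picture the mapping step is the minimal Python sketch below. It is
my reading of the idea, not Bert's code: inputPos gives, for each braille cell,
the index of the print character it came from, so a break is possible only
before a cell that starts a new input segment at an allowed print break. A print
break that falls inside a contraction has no such cell and is dropped. The
arrays are made up by hand; in practice inputPos would be filled in by
lou_translate, and `braille_breaks` is a hypothetical name.

```python
def braille_breaks(input_breaks, input_pos):
    """Map print hyphenation points to break points in the braille word.

    input_breaks: set of indices i meaning "a break is allowed before print char i".
    input_pos:    for each braille cell, the index of the print char it came from.
    Returns the set of cell indices j where a break is allowed before cell j.
    """
    result = set()
    for j in range(1, len(input_pos)):
        starts_new_segment = input_pos[j] > input_pos[j - 1]
        if starts_new_segment and input_pos[j] in input_breaks:
            result.add(j)
    return result


# Print word "abcdef" may break before indices 2 and 4 (hypothetical).
# Suppose "bc" is contracted into one cell, so cells come from inputs 0, 1, 3, 4, 5:
# the break inside the contraction disappears, the other lands before cell 3.
print(braille_breaks({2, 4}, [0, 1, 3, 4, 5]))  # {3}
```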
Bert
2017-05-12 9:05 GMT+02:00 Hammer Attila <hammera@xxxxxxxxx>:
Hi List,
In Hungarian braille, some words containing three-letter consonants (doubled
digraphs such as ssz, nny and ggy) need to be hyphenated in a different way.
A few example words:
asszony: hyphenated as asz-szony
asszonnyal: hyphenated as asz-szony-nyal
meggyes: correctly hyphenated as megy-gyes
The README.nonstandard document in the hyphen-2.8.8 source package describes
non-standard hyphenation rules as follows:
"Non-standard hyphenation
------------------------
Some languages use non-standard hyphenation; `discretionary'
character changes at hyphenation points. For example,
Catalan: paral·lel -> paral-lel,
Dutch: omaatje -> oma-tje,
German (before the new orthography): Schiffahrt -> Schiff-fahrt,
Hungarian: asszonnyal -> asz-szony-nyal (multiple occurance!)
Swedish: tillata -> till-lata.
Using this extended library, you can define
non-standard hyphenation patterns. For example:"
For the word asszonnyal, I tried the following rules in hyph_hu_HU.dic,
without success:
First variation:
".as3szon/sz=,2,1
n1nyal./ny=,1,1"
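For reference, here is a minimal Python sketch of how rules like the two above
can be read. It assumes Hyphen's `pattern/replacement,start,cut` layout: start
is a 1-based position within the pattern's letters (dots and digits ignored),
cut is the number of letters replaced, and `=` in the replacement marks the
break. It is simplified: every matching rule is applied, whereas, as far as I
understand, Hyphen first checks that the rule's odd digit wins against competing
patterns. The function names are hypothetical.

```python
import re


def parse_rule(line):
    """Split 'pattern/replacement,start,cut' into its parts (assumed layout)."""
    pattern, rest = line.split("/", 1)
    replacement, start, cut = rest.split(",")
    return pattern, replacement, int(start), int(cut)


def apply_nonstandard(word, rules):
    """Apply every matching non-standard rule; '=' in a replacement becomes '-'."""
    edits = []
    for line in rules:
        pattern, replacement, start, cut = parse_rule(line)
        letters = re.sub(r"[0-9.]", "", pattern)  # drop priorities and anchors
        for m in re.finditer("(?=" + re.escape(letters) + ")", word):
            pos = m.start()
            if pattern.startswith(".") and pos != 0:
                continue  # rule is anchored at the word start
            if pattern.endswith(".") and pos + len(letters) != len(word):
                continue  # rule is anchored at the word end
            begin = pos + start - 1  # start is 1-based
            edits.append((begin, begin + cut, replacement))
    # Work right to left so earlier offsets stay valid (overlaps are not handled).
    for begin, end, replacement in sorted(edits, reverse=True):
        word = word[:begin] + replacement + word[end:]
    return word.replace("=", "-")


print(apply_nonstandard("asszonnyal", [".as3szon/sz=,2,1", "n1nyal./ny=,1,1"]))
# asz-szony-nyal
```

Under this reading, the two rules above yield the expected asz-szony-nyal.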
When I run the lou_checkhyphens utility, enter hu-hu-g1.ctb,hyph_hu_HU.dic as
the table list, press r and type the word asszonnyal, the word is not
hyphenated.
Does the Liblouis hyphenation feature not support this type of rule?
I am attaching the entire README.nonstandard document.
Unfortunately, Hungarian very often needs this kind of hyphenation.
Could the Liblouis hyphenation function be extended in the future to support
this type of non-standard hyphenation rule too?
Attila