[freedict] Re: [urgent] call for testing
- From: Piotr Bański <bansp@xxxxx>
- To: freedict@xxxxxxxxxxxxx
- Date: Sun, 3 Jan 2021 15:39:08 +0100
Tried to look this up, but it looks like I'm behind the freedict
technology somewhat :-)
I now know how to download the dict version of deu-eng, and where the
TEI version is located
( https://download.freedict.org/generated/deu-eng/ ;)
but how do I keep it updated on my disk, please?
(Did try to RTFM, did get to the API page, but I'm still too thick to
succeed, it feels. Thanks in advance!)
And I was too late to check the IPA in the tmp/ subdirectory, because
it's gone now -- sorry about not being able to help out before the
The rest of my reply follows below.
On 30/12/2020 09:12, Sebastian Humenda wrote:
Karl Bartel wrote on 28.12.2020, 15:55 +0100:
Looks great! I checked the following (starting from the end) without
finding any problems:
Two other thoughts I had when looking at the pronunciations:
* In deu-eng-pron.tei.xz, there are entries where additional information is
given in parentheses (e.g. "(Buch) Stelle"). The part in parens should not
be included in the pronunciation, IMO.
Mmh, how important is that? Also, how would you handle cases in which the
parenthesised expression is an infix, i.e. between words?
I don't have a ready example of a lemma containing a parenthesized
'infix' in my mind, could you please provide one?
A general reply to this kind of question is that
* an entry is identified by the lemma, which should ideally be unique
across the dictionary (with e.g. superscripts to ensure the uniqueness)
* in the simplest case, the form-related information (spelling,
pronunciation) provided inside the entry concerns the lemma itself,
unless more forms need to be mentioned (as is the case of irregular
verbs, plurals, etc.)
* parentheses usually indicate optional bits, and should ideally be kept
outside of the identifier (for uniqueness), although various reasons and
conventions may force them in (for instance economy of space, which is more
characteristic of print dictionaries than of electronic ones, unless
the latter for some reason need to mirror the former)
* parentheses are more typical of examples of individual senses in
polysemic words, to help tease the senses apart by providing a
"collocate", that is a characteristic representative of a class of words
that combine with the lemma in that particular sense
Probably naively, I would treat "Buchstelle" as a sort of related entry,
given its specific sense. Or maybe (and here the limitations of my
German show up) as a subsense, if "Buch" is only one of a class of
possible modifiers for this very sense of "Stelle".
In the former case (a related entry), one can easily imagine
dictionaries (especially of the electronic kind) that prefer to
'explode' related (sub)entries into full-fledged entries, but then, the
lemma should not contain brackets (because "Buchstelle" simply
identifies a new entry, and there is no optionality).
In the latter case, where "(Buch)stelle" identifies a subsense, I would
not bother to provide any pronunciation information, because the main
lemma, "Stelle", would receive that at the top of the entry. And if, for
some reason, pronunciation should be provided at this level as well, my
gut feeling would be the same as Karl's: to provide it only for "Stelle",
even though that is redundant in the context of the entire entry.
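
To make that suggestion concrete, here is a minimal Python sketch of stripping
the parenthesised parts from a headword before it is handed to whatever
generates the pronunciation. This is only an illustration, not the actual
FreeDict tooling: the function name strip_parenthesised is made up, and the
'infix' headword in the comments is invented for the example rather than
taken from deu-eng.

```python
import re

# Matches one parenthesised group; nested parentheses are not handled.
_PAREN = re.compile(r"\([^()]*\)")


def strip_parenthesised(headword: str) -> str:
    """Remove parenthesised parts and normalise the leftover whitespace."""
    without = _PAREN.sub("", headword)
    # split()/join collapses double spaces left behind by a removed infix
    # and trims leading or trailing blanks.
    return " ".join(without.split())


print(strip_parenthesised("(Buch) Stelle"))      # Stelle
print(strip_parenthesised("jdn. (gut) kennen"))  # jdn. kennen (invented infix case)
print(strip_parenthesised("(Buch)stelle"))       # stelle
```

Note that the last case loses the capital letter, so a real implementation
would still have to decide what the bare lemma is supposed to look like.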
I admit that I am a bit lost when looking at the original form that Karl
is asking about, namely "(Buch) Stelle", because it feels somewhat
unreal, given my (limited) knowledge of German. (I would not expect the
two parts to be written separately, with separate capitalization.)
Update: I have only been able to grep through the deu-eng.tei (my oXygen
editor died on it :-)), but I wasn't able to find the "(Buch) Stelle"
example, or anything similar. Was I too literal in interpreting the
* It would make sense to add pronunciations to the translations too, for
those cases where I start in my native language.
I would implement it, but have no time to check whether the stylesheets allow
that, whether this breaks slob export, whether our DTD allows it and where to
place it (within cit or next to the quote?). If you do the research, please
open an issue documenting your findings. Thanks :).
The XML might allow that (<form> inside a <cit>), though I would fear
mismatches between the suggested pronunciation and the actual equivalent
('object vs. ob'ject), although, if the grammatical information of the
equivalent is properly exposed, then I may well be wrong to be pessimistic.
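
As a rough sketch of what I mean, the Python below builds a <cit> that carries
a <form>/<pron> for the equivalent, choosing the pronunciation by part of
speech so that the noun and the verb "object" do not get mixed up. Everything
here is an assumption for illustration: the element layout has not been
checked against the FreeDict schema or the stylesheets, and the PRON table and
build_cit function are hypothetical.

```python
import xml.etree.ElementTree as ET

# Toy pronunciation table keyed by (word, part of speech); illustrative only.
PRON = {
    ("object", "n"): "ˈɒbdʒɪkt",
    ("object", "v"): "əbˈdʒɛkt",
}


def build_cit(word, pos):
    """Build a translation <cit>, adding <form><pron> only when the
    (word, pos) pair is known, so no pronunciation is guessed."""
    cit = ET.Element("cit", {"type": "trans"})
    ET.SubElement(cit, "quote").text = word
    pron = PRON.get((word, pos))
    if pron is not None:
        form = ET.SubElement(cit, "form")
        ET.SubElement(form, "pron").text = pron
    return cit


print(ET.tostring(build_cit("object", "n"), encoding="unicode"))
```

If the part of speech of the equivalent is not known, the sketch simply leaves
the pronunciation out, which is the cautious option I would prefer.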