[freedict] Re: [urgent] call for testing

  • From: Piotr Bański <bansp@xxxxx>
  • To: freedict@xxxxxxxxxxxxx
  • Date: Sun, 3 Jan 2021 15:39:08 +0100

Hi all,

I tried to look this up, but it looks like I'm somewhat behind on the FreeDict technology :-)

I now know how to download the dict version of deu-eng, and where the TEI version is located
https://download.freedict.org/generated/deu-eng/ ;)
but how do I keep it updated on my disk, please?

(Did try to RTFM, did get to the API page, but I'm still too thick to succeed, it feels. Thanks in advance!)

And I was too late to check the IPA in the tmp/ subdirectory, because it's gone now -- sorry about not being able to help out before the original deadline.

The rest of my reply follows below.

On 30/12/2020 09:12, Sebastian Humenda wrote:

Hi Karl

Karl Bartel wrote on 28.12.2020, 15:55 +0100:
Looks great! I checked the following (starting from the end) without
finding any problems:

Two other thoughts I had when looking at the pronunciations:
* In deu-eng-pron.tei.xz, there are entries where additional information is
given in parentheses (e.g. "(Buch) Stelle"). The part in parens should not
be included in the pronunciation, IMO.

Mmh, how important is that? Also, how would you handle cases in which the
parenthesised expression is an infix, i.e. between words?

I don't have a ready example in mind of a lemma containing a parenthesized 'infix'; could you please provide one?

A general reply to this kind of question is that
* an entry is identified by the lemma, which should ideally be unique across the dictionary (with e.g. superscripts to ensure the uniqueness for homographs)
* in the simplest case, the form-related information (spelling, pronunciation) provided inside the entry concerns the lemma itself, unless more forms need to be mentioned (as is the case of irregular verbs, plurals, etc.)
* parentheses usually indicate optional bits, and should ideally be kept out of the identifier (for the sake of uniqueness), although various reasons and conventions may force them in (e.g. economy of space, which is more characteristic of print dictionaries than of electronic ones, unless the latter for some reason need to mirror the former)
* parentheses are more typical of examples of individual senses in polysemic words, to help tease the senses apart by providing a "collocate", that is a characteristic representative of a class of words that combine with the lemma in that particular sense

Probably naively, I would treat "Buchstelle" as a sort of related entry, given its specific sense. Or maybe (and here the limitations of my German show up) as a subsense, if "Buch" is only one of a class of possible modifiers for this very sense of "Stelle".

In the former case (a related entry), one can easily imagine dictionaries (especially of the electronic kind) that prefer to 'explode' related (sub)entries into full-fledged entries, but then, the lemma should not contain brackets (because "Buchstelle" simply identifies a new entry, and there is no optionality).

In the latter case, where "(Buch)stelle" identifies a subsense, I would not bother to provide any pronunciation information, because the main lemma, "Stelle", would receive that at the top of the entry. And if, for some reason, pronunciation should be provided at this level as well, my gut feeling would be the same as Karl's: to provide it only for "Stelle", even though that is redundant in the context of the entire entry.
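To make the "pronounce only the lemma" idea concrete, here is a minimal Python sketch that drops parenthesized parts from a headword before it is handed to whatever generates the pronunciation. The helper name pronounceable_form is my own invention and not part of the FreeDict tools, the second test string is made up to cover the infix case, and the sketch assumes that parentheses are never nested.

```python
import re


def pronounceable_form(headword: str) -> str:
    """Hypothetical helper: drop parenthesized parts, then tidy whitespace."""
    # Remove every "(...)" group; nested parentheses are assumed not to occur.
    without_parens = re.sub(r"\([^()]*\)", "", headword)
    # Collapse the gaps left behind, whether the group was a prefix or an infix.
    return " ".join(without_parens.split())


print(pronounceable_form("(Buch) Stelle"))            # Stelle
print(pronounceable_form("etwas (sehr) gut machen"))  # etwas gut machen
```

Because the parenthesized group is removed and the whitespace is normalized afterwards, the same function covers a group at the start of the headword and one between words.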

I admit that I am a bit lost when looking at the original form that Karl is asking about, namely "(Buch) Stelle", because it feels somewhat unreal, given my (limited) knowledge of German. (I would not expect the two parts to be written separately, each with its own capital letter.)

Update: I have only been able to grep through the deu-eng.tei (my oXygen editor died on it :-)), but I wasn't able to find the "(Buch) Stelle" example, or anything similar. Was I too literal in interpreting the original question?
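For anyone who wants to repeat the search without an XML editor, a rough Python sketch follows. It streams through a TEI file and reports every <orth> whose text contains a parenthesis. It rests on assumptions: that headwords sit in <orth> elements in the usual TEI namespace, and that the sample document embedded here (which I made up for illustration) resembles the real data. The function name headwords_with_parens is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Assumption: the file uses the standard TEI P5 namespace.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Invented two-entry sample, only to make the sketch runnable.
SAMPLE = b"""<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<entry><form><orth>Stelle</orth></form></entry>
<entry><form><orth>(Buch) Stelle</orth></form></entry>
</body></text></TEI>"""


def headwords_with_parens(source):
    """Yield the text of each <orth> containing '(' or ')'.

    `source` may be a file name or a binary file object.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == TEI_NS + "orth":
            text = "".join(elem.itertext())
            if "(" in text or ")" in text:
                yield text
        elif elem.tag == TEI_NS + "entry":
            # Drop the finished entry's children to keep memory use low.
            elem.clear()


for orth in headwords_with_parens(io.BytesIO(SAMPLE)):
    print(orth)
```

Passing the path of an unpacked TEI file instead of the BytesIO object should work the same way, since iterparse reads the input incrementally rather than loading the whole tree at once.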

* It would make sense to add pronunciations to the translations too, for
those cases where I start in my native language.

I would implement it, but have no time to check whether the stylesheets allow
that, whether this breaks slob export, whether our DTD allows it and where to
place it (within cit or next to the quote?). If you do the research, please
open an issue documenting your findings. Thanks :).

The XML might allow that (<form> inside a <cit>), though I would fear mismatches between the suggested pronunciation and the actual equivalent ('object vs. ob'ject). Then again, if the grammatical information of the equivalent is properly exposed, I may well be wrong to be pessimistic.
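The worry about 'object vs. ob'ject can be illustrated with a small Python sketch: a lookup keyed on spelling alone cannot choose between the noun and the verb, while one keyed on spelling plus part of speech can. The table PRON, the function pron_for and the part-of-speech labels "n" and "v" are all hypothetical, and the two transcriptions are given only as illustrative British English values.

```python
# Hypothetical pronunciation table keyed by (spelling, part of speech).
PRON = {
    ("object", "n"): "ˈɒbdʒɪkt",
    ("object", "v"): "əbˈdʒɛkt",
}


def pron_for(orth, pos=None):
    """Return a transcription, or None if unknown or ambiguous."""
    candidates = {p: ipa for (o, p), ipa in PRON.items() if o == orth}
    if pos is not None:
        return candidates.get(pos)
    if len(candidates) == 1:
        # Only one reading exists, so the spelling alone is enough.
        return next(iter(candidates.values()))
    # Zero or several readings: refuse to guess.
    return None


print(pron_for("object"))       # None
print(pron_for("object", "n"))  # ˈɒbdʒɪkt
print(pron_for("object", "v"))  # əbˈdʒɛkt
```

Returning None in the ambiguous case is a deliberate choice in this sketch: a missing pronunciation seems less harmful than a wrong one.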




