Hi Einhard
Sorry for the late reply. I hope that your bachelor thesis has worked out well
in the meantime! I am merging a few of your e-mails.
If some of the comments below have become irrelevant because they were already
discussed in the wiki, just ignore them. I'll get to the wiki page later.
Einhard Leichtfuß wrote on 14.09.2020, 15:06 +0200:
Will do! But not today :).
C.12) Grouping of homographs
* In brief: Is superEntry ok?
Piotr was already quite sceptical about superEntry. At least our stylesheets
do not handle superEntry, and they also ignore hom.
Could you also give your opinion on whether I should ever use several
<sense> elements?
Note that I cannot reliably differentiate homographs with little to no
semantic relation (e.g., character: Zeichen / Rolle) from semantically
"similar" homographs (e.g., character: Buchstabe / Zeichen).
To look at the extremes, there are currently
* 48 entries for "offen",
* 26 entries for "turning".
What was your problem with linking that you mentioned in your other e-mail?
You can attach an id attribute to each orth or to a complete entry and use
it to refer to that element with ref. Or did you mean something else?
Note that this is not really important, but:
When I parse ~tilde references, I translate them to xr/ref tags. The
content of xr is the plain-text word, but I cannot add a @target, since
there are likely several <entry> elements that it could/does refer to.
I am currently maintaining a sed script that fixes a lot of syntax
errors. This is supposed to go upstream.
A.5) Quantified (or similar) usage annotations
* Ex.: "mainly Am."
* Ex.: "bes. Süddt.", "especially Am."
? How to represent the determiner?
What is the determiner here? I thought determiners were for compound
phrases such as lemon tree.
"mainly", "bes.", "especially". I thought these were determiners.
Sorry, I missed the point. I was unsure about determiners and read the
Wikipedia article, but apparently the wrong one. There is no encoding for
this ATM, I think. What does Lex-0 suggest? :) Isn't this part of the
usage anyway? I probably would have picked `<usg type="hint">mainly Am.</usg>`,
but maybe that's too vague.
TEI Lex-0 suggests using an attribute, but does not say which one (there is
a TODO in the docs). None of the existing <usg> attributes really fit IMO;
maybe @subtype?
Can't the usg types be freely chosen?
The types, yes. Not the attributes though.
What I would ideally like to have is something like:
<usg type="geo" freq="mainly">Am.</usg>
Note, though, that this is not a high priority currently.
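To make the wished-for encoding concrete, here is a minimal Haskell sketch that splits a leading quantifier off an annotation such as "mainly Am.". The quantifier list contains only the prefixes quoted above and is an assumption; `splitQuantifier` and `renderUsg` are hypothetical names, and `freq` is not a TEI Lex-0 attribute, so the output only illustrates the target shape.

```haskell
-- Hypothetical list of quantifying prefixes; only the ones seen in the
-- Ding examples above are included.
quantifiers :: [String]
quantifiers = ["mainly", "especially", "bes."]

-- | Split a leading quantifier off a usage annotation, e.g.
-- "mainly Am." becomes (Just "mainly", "Am."). A lone quantifier with
-- nothing after it is left untouched.
splitQuantifier :: String -> (Maybe String, String)
splitQuantifier s = case words s of
    (w : rest@(_ : _)) | w `elem` quantifiers -> (Just w, unwords rest)
    _ -> (Nothing, s)

-- | Render the wished-for shape. NOTE: @freq is NOT a TEI attribute;
-- it only illustrates the target discussed above. No XML escaping.
renderUsg :: String -> String -> String
renderUsg ty s = case splitQuantifier s of
    (Just q, rest) -> open ("type=\"" ++ ty ++ "\" freq=\"" ++ q ++ "\"") ++ rest ++ "</usg>"
    (Nothing, rest) -> open ("type=\"" ++ ty ++ "\"") ++ rest ++ "</usg>"
  where
    open attrs = "<usg " ++ attrs ++ ">"
```

With this, `renderUsg "geo" "mainly Am."` yields `<usg type="geo" freq="mainly">Am.</usg>`; swapping `freq` for @subtype would only touch `renderUsg`.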
[unimportant/short]
There is spa-deu, which was imported with the previous version of the Ding
importer, but it is much, much simpler. It would be good if your importer
could also handle it to some extent; I'm not asking for an extremely
generic solution.
Not a priority right now, possibly later. Also, would we want to use
spa-deu in FreeDict?
Unfortunately, some dictionary-specific processing cannot be (easily)
avoided: the source contains a notable number of syntax errors (related
to slashes) that are hard or impossible to recover from automatically.
[possibly important]
C.1) Inline annotations. In the Ding, some annotations are inside a
larger expression / phrase (as delimited by <;>, <|>, <::>) and
only refer to part of it.
? Can/should such annotations be represented in TEI?
* I am primarily concerned with grammar and usage annotations
here (potentially also inflected forms; I am unsure whether those
occur as inline annotations).
* Note that suffixing annotations do not necessarily apply to the
whole expression, but I guess I should assume that they do.
* Alternatively, if I were to distinguish simple entries (e.g.,
single words) from example expressions/phrases, I could assume
/ require that simple expressions only have suffixing
annotations applying to them as a whole, while phrase annotations
might be entirely dropped.
* Example phrases need a headword they belong to. According to
the above definition, this is not always the case:
* Ex.: "Alternativkontur {f} der Schultern (Reifen) :: bead \
I feel that this paragraph contains too many questions at once. It would be
good if you could bring examples and questions closer together.
I now recognize examples.
They retain all ()-annotations literally, but no [] or {} annotations.
The definition of examples is likely to change, but for now an expression
is an example if
* (>= 3 words OR contains punctuation) AND
* there is a non-example (preceding it in the line) that is an infix of the
example.
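The current example rule, written as a small Haskell sketch. It assumes the expressions of one line are already split into a list in source order; `looksLikeExample` and `classifyLine` are hypothetical names, and "infix" is taken as a plain substring test, ignoring word boundaries.

```haskell
import Data.Char (isPunctuation)
import Data.List (isInfixOf)

-- | First half of the rule: at least three words, or any punctuation.
looksLikeExample :: String -> Bool
looksLikeExample e = length (words e) >= 3 || any isPunctuation e

-- | Classify the expressions of one line in order. An expression counts
-- as an example if it looks like one AND some preceding non-example
-- occurs in it as a substring. Only non-examples are remembered.
classifyLine :: [String] -> [(String, Bool)]
classifyLine = go []
  where
    go _ [] = []
    go seen (e : es)
      | looksLikeExample e && any (`isInfixOf` e) seen = (e, True) : go seen es
      | otherwise = (e, False) : go (seen ++ [e]) es
```

Since `isPunctuation` also matches the dot in "sth.", short phrases like "to file sth." already count as candidates under this rule.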
In non-examples, all infix []- and {}-annotations are dropped. Infix
()-annotations are retained literally.
Ex. "Brot {n} (des Jahres) vom Bäcker (Bäckerei) [cook.]"
-> "Brot (des Jahres) vom Bäcker", note:"Bäckerei", usg:"cook."
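A sketch of this handling that reproduces the "Brot" example above. It assumes brackets are never nested and treats the trailing run of annotations as the suffixing ones; mapping trailing {}-annotations to a grammar field is an assumption, and `tokenize`, `Parsed` and `parseNonExample` are hypothetical names, not the importer's actual code.

```haskell
import Data.Char (isSpace)

-- | A piece of a Ding expression: plain text or a bracketed annotation,
-- tagged with its opening bracket. Nested brackets are NOT handled.
data Token = Text String | Ann Char String
  deriving (Eq, Show)

brackets :: [(Char, Char)]
brackets = [('(', ')'), ('[', ']'), ('{', '}')]

tokenize :: String -> [Token]
tokenize [] = []
tokenize (c : cs)
  | Just close <- lookup c brackets =
      let (inner, rest) = break (== close) cs
      in Ann c inner : tokenize (drop 1 rest)
  | otherwise =
      let (txt, rest) = break (`elem` map fst brackets) (c : cs)
      in Text txt : tokenize rest

data Parsed = Parsed
  { headText :: String   -- expression without dropped annotations
  , notes    :: [String] -- trailing (...) annotations
  , usages   :: [String] -- trailing [...] annotations
  , grams    :: [String] -- trailing {...} annotations
  } deriving (Eq, Show)

-- | Split off the trailing run of annotations, keep infix (...) literally
-- and drop infix [...] and {...}.
parseNonExample :: String -> Parsed
parseNonExample s = Parsed
  { headText = unwords (words (concatMap render body))
  , notes    = [x | Ann '(' x <- suffix]
  , usages   = [x | Ann '[' x <- suffix]
  , grams    = [x | Ann '{' x <- suffix]
  }
  where
    toks = tokenize s
    trailing t = case t of
      Ann _ _ -> True
      Text x  -> all isSpace x
    suffix = reverse (takeWhile trailing (reverse toks))
    body   = take (length toks - length suffix) toks
    render (Text x)    = x
    render (Ann '(' x) = "(" ++ x ++ ")"
    render _           = ""
```

Collapsing whitespace with `unwords . words` removes the double space left where "{n}" was dropped.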
C.2) "<>". This separator indicates that the two surrounding entities
(usually single words) may be swapped.
* Potential representations
a) Leave as is, as part of the string.
b) Duplicate the element.
c) Explicit representation (possible?).
* Note: "<>" does not only occur in phrases.
* Ex.: "to file away <> sth."
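Reading "the two surrounding entities may be swapped" literally, option b) can be sketched in Haskell for the simple case: exactly one "<>" with single whitespace-separated words on both sides. `expandSwap` is a hypothetical name; multi-word entities would need a real parser.

```haskell
-- | Expand a single "<>" by swapping the words directly to its left and
-- right, returning both readings (option b). Anything else, e.g. a "<>"
-- at the start or end, is returned unchanged (option a).
expandSwap :: String -> [String]
expandSwap s = case break (== "<>") ws of
    (before@(_ : _), "<>" : r : rest) ->
      let pre = init before
          l   = last before
      in [unwords (pre ++ [l, r] ++ rest), unwords (pre ++ [r, l] ++ rest)]
    _ -> [s]
  where
    ws = words s
```

For "to file away <> sth." this yields "to file away sth." and "to file sth. away".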
I didn't get the description of C.2). Does this allow "to file away" and "to
file away sth."?
"to file away sth." and "to file sth. away"
[…]
C.5) Usage [literary].
* One occurrence: "sword; blade [literary]"
* Denotes a <usg type="textType"> (see TEI Lex-0 spec).
* Not equivalent to [lit.], which denotes a usage domain.
* Similar to [poet.]
C.7) Grouped annotations.
* Ex: "[formal/Am.]", "{
* Grouped []-annotations seem to always represent a disjunction
(regardless of the separator: </>, <;>, <,>)
? Treat differently to separate annotations?
* Ex.: "[formal] [Am.]".
Yes, it's just a simplification for typing. Just parse them individually and
the same goes for encoding.
[slightly important]
// See 18) in the Wiki-List [0]
I am unsure whether we speak of the same thing.
I wanted to ask whether to differentiate
"[formal/Am.]" and "[formal] [Am.]"
(and how, if applicable).
Of course, I could just literally keep the </>:
<usg>formal/Am.</usg>
I am not a friend of this, though. Also, in this case it does not allow
setting a specific @type.
The best would be something like
<choice>
<usg>formal</usg>
<usg>Am.</usg>
</choice>
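A Haskell sketch of keeping the two cases apart: each pair of brackets becomes one value, and a grouped one becomes a disjunction rendered with the <choice> wrapper proposed above. The names are hypothetical, whether <choice> validates at this position in TEI Lex-0 has not been checked, and no XML escaping is done.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- | A usage annotation from one pair of brackets: either a single value
-- or a group that (per C.7) is read as a disjunction.
data UsageAnn = Single String | AnyOf [String]
  deriving (Eq, Show)

-- | Split bracket content on the group separators '/', ';' and ','.
splitGroup :: String -> [String]
splitGroup = filter (not . null) . map trim . go
  where
    go str = case break (`elem` "/;,") str of
      (tok, _ : rest) -> tok : go rest
      (tok, [])       -> [tok]
    trim = dropWhileEnd isSpace . dropWhile isSpace

fromBracket :: String -> UsageAnn
fromBracket content = case splitGroup content of
  [x] -> Single x
  xs  -> AnyOf xs

-- | Render with the proposed <choice> wrapper.
toTei :: UsageAnn -> String
toTei (Single x) = usg x
toTei (AnyOf xs) = "<choice>" ++ concatMap usg xs ++ "</choice>"

usg :: String -> String
usg x = "<usg>" ++ x ++ "</usg>"
```

"[formal] [Am.]" then gives two `Single` values, while "[formal/Am.]" gives one `AnyOf ["formal", "Am."]`, so a later @type can still be set per alternative.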
C.8) Multiple genders
* Ex.: "Anwesende {m,f}"
* In fact, I would not consider "Anwesende" a base masculine form
(only when used with a definite article).
* Ex.: "Avis {m,n}"
? Simply two gender annotations in a single gramGrp?
Yes, I would say so. BTW, it's not your job to fix wrong grammar info in the
parser ;).
This would not distinguish a disjunction (as here) from a conjunction.
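To illustrate the concern, a Haskell sketch that parses "{m,f}" into a single value holding both alternatives and then emits one gender element per alternative inside a <gramGrp>. The `<gram type="gender">` form follows TEI Lex-0's general use of <gram> with @type, but the exact values, the restriction to m/f/n and all names here are assumptions. The point is that the disjunction survives in the Haskell value but not in the output.

```haskell
data Gender = Masc | Fem | Neut
  deriving (Eq, Show)

-- | Only the three Ding gender codes are recognised here (assumption).
parseGender :: String -> Maybe Gender
parseGender "m" = Just Masc
parseGender "f" = Just Fem
parseGender "n" = Just Neut
parseGender _   = Nothing

-- | "{m,f}" means "m or f": the alternatives stay together in one value.
newtype GenderSpec = OneOf [Gender]
  deriving (Eq, Show)

-- | Parse the content of a {...} annotation such as "m,f"; any unknown
-- code makes the whole annotation fail.
parseGenderAnn :: String -> Maybe GenderSpec
parseGenderAnn s = OneOf <$> traverse parseGender (words (map comma s))
  where
    comma c = if c == ',' then ' ' else c

-- | One <gram type="gender"> per alternative, as discussed above; this
-- output looks the same whether the genders were alternatives or not.
toGramGrp :: GenderSpec -> String
toGramGrp (OneOf gs) = "<gramGrp>" ++ concatMap gram gs ++ "</gramGrp>"
  where
    gram g = "<gram type=\"gender\">" ++ code g ++ "</gram>"
    code Masc = "m"
    code Fem  = "f"
    code Neut = "n"
```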
C.9) [.]
a) usage annotation
* Ex.: "Sagenit {m} [min.]"
* identifiable by contained keywords
* If relying on that, one might miss new keywords in later
versions of the Ding.
What about an explicit configuration? That could miss new keywords as well.
I lost track of what can be contained in square brackets, so I cannot
recommend one way or the other. A definitive allow list makes sense, as long
as it is not baked into the source code but shipped as a separate configuration.
[somewhat important]
Currently, there is a Haskell file that (essentially) only contains
lists of the form:
regionalUsages = ["Am.", "Br.", ...]
Sure, I could move things into separate plain text files, but I am
unsure of the advantages.
At least, I should probably move the mentioned file under Config/.
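One possible advantage of plain-text files is that keywords can be added without recompiling. A minimal Haskell sketch, assuming a hypothetical format of one keyword per line with '#' comments; `parseKeywordList`, `classifyUsage` and `UsageKind` are illustrative names. Unknown keywords come back as `Nothing`, so new ones in later Ding versions can be reported instead of silently misclassified.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

data UsageKind = Regional | Domain
  deriving (Eq, Show)

-- | Parse a plain-text keyword list: one keyword per line; blank lines
-- and lines starting with '#' are skipped. The format is an assumption.
parseKeywordList :: String -> [String]
parseKeywordList = filter keep . map trim . lines
  where
    trim = dropWhileEnd isSpace . dropWhile isSpace
    keep l = not (null l) && take 1 l /= "#"

-- | Look a [...] keyword up in the loaded tables; the first matching
-- table wins, unknown keywords yield Nothing.
classifyUsage :: [(UsageKind, [String])] -> String -> Maybe UsageKind
classifyUsage tables kw =
  case [kind | (kind, kws) <- tables, kw `elem` kws] of
    (kind : _) -> Just kind
    []         -> Nothing
```

Loading would then be something like `parseKeywordList <$> readFile "Config/regionalUsages.txt"`, where the path is hypothetical.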