Hi Einhard
Einhard Leichtfuß wrote on 30.08.2020, 0:56 +0200:
I am allowed to publish under any license I wish; though the university
may also use it under some other license.
Note that I currently target version 1.8.1 exclusively.
1.8.1 of what?
Of the Ding.
A) TEI
A.1) TEI Lex-0. Have I understood correctly that it is a good idea to
follow this standard [0]? E.g.
* a) <gram type="gender"/> instead of <gen/>;
* b) <usg> with @type (and possibly @norm)
I'm not sure about this. Michael, Piotr, do you have any comments? If nothing
comes in during the next 1-2 weeks, I would say stick to the current
version that is in our schemas. It is easy to transform and better if it is
consistent with the other dictionaries.
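To illustrate why such a transformation is cheap, here is a minimal Python sketch that renames TEI Lex-0 style <gram type="gender"> elements to <gen>. The function name and the sample entry are made up for illustration; only the two element names come from the discussion above, and nothing here is an agreed FreeDict tool.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


def lex0_gender_to_gen(root):
    """Rename every <gram type="gender"> below root to <gen>, in place.

    Returns the number of elements that were converted.
    """
    count = 0
    # Materialise the iterator first, since tags are changed while looping.
    for gram in list(root.iter(f"{{{TEI_NS}}}gram")):
        if gram.get("type") == "gender":
            gram.tag = f"{{{TEI_NS}}}gen"
            del gram.attrib["type"]
            count += 1
    return count


# Hypothetical sample entry, not taken from any real dictionary file.
SAMPLE = f"""<entry xmlns="{TEI_NS}">
  <form><orth>Haus</orth></form>
  <gramGrp><gram type="pos">n</gram><gram type="gender">neut</gram></gramGrp>
</entry>"""

root = ET.fromstring(SAMPLE)
converted = lex0_gender_to_gen(root)
print(converted)
print(ET.tostring(root, encoding="unicode"))
```

The reverse direction would work the same way, so choosing one form now should not lock the importer in.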
By "our schemas", you mean the files
fd-dictionaries : shared/freedict-P5.* ?
I have to admit that these files are hard to grasp for me (no prior
experience with XML). Are these meant to serve as human-readable
documentation? Is it worth the effort?
Otherwise, I will continue to rely on the Wiki, the (example) TEI files
and the TEI docs (and your answers).
I actually like the TEI Lex-0 standard, in particular:[…]
i) b) from above: a fixed list of good @type values (see the
comparison table at [10]). How would I represent
@type="textType" (e.g. bibl., poet., admin., journalese) or
@type="attitude" (e.g. derog., euph.), which do not have an
equivalent in the TEI suggested @type's?
? Should I just use these as suggested in TEI Lex-0, thereby
creating a mixture between TEI and TEI Lex-0?
A.5) Quantified (or similar) usage annotations
* Ex.: "mainly Am."
* Ex.: "bes. Süddt.", "especially Am."
? How to represent the determiner?
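Whatever encoding is chosen, the quantifying word can be separated from the label mechanically first. A minimal Python sketch, assuming the set of such words is just the three that appear in the examples of this thread (the real Ding data may well contain more); the function name is hypothetical, and how to encode the result in TEI is left open here.

```python
# Quantifying words seen in the examples above; an assumption, not a full list.
QUALIFIERS = ("mainly", "especially", "bes.")


def split_qualifier(label):
    """Split 'mainly Am.' into ('mainly', 'Am.').

    Returns (None, label) if the label does not start with a known qualifier.
    """
    head, sep, rest = label.partition(" ")
    if sep and head in QUALIFIERS:
        return head, rest.strip()
    return None, label


for raw in ("mainly Am.", "bes. Süddt.", "especially Am.", "Am."):
    print(raw, "->", split_qualifier(raw))
```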
What is the determiner here? I thought determiners are for compound phrases
such as lemon tree.
"mainly", "bes.", "especially". I thought these were determiners.
[…]A.6) Dialect / language annotations.
a) Ex.: "[Br.]", "[Am.]", "[Ös.]", "[Sächs.]"
b) Ex.: "[South Africa]", "[Hessen]", "[Berlin]", "[Wien]"
d) Ex.: "[French]", "[Lat.]"
? Represent as <usg type="geographic">?
* According to TEI Lex-0: "marker which identifies the place or
region where a lexical unit is mainly used"
* Matches c) only.
? Separate d)? And represent how?
In any case, I see subtle differences and would suggest either to
be sloppy and group all these as a sort of geographic identifier (only
French/Lat. don't fit)
What to do with French/Lat. then?
A.7) Abbreviations.
a) Headwords, which are annotations.
* rare
b) Annotated on headwords.
? How to represent in TEI?
* The TEI documentation contains an example [7] with both
<form type="abbrev"> and <form type="full">, in the same
<entry>.
* I remember though that within the Freedict project multiple
<form> tags inside <entry> are frowned upon.
Just do:
<entry>
  <form>
    <orth>headword</orth>
    <form type="abbrev">
      <orth>h.w.</orth>
    </form>
  </form>
  …
And "Headwords, which are *abbreviations*" (wrong word above) I would
represent as
entry/form[@type="abbrev"]/orth ?
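To make the two cases concrete, here is a small Python sketch that builds both structures and looks them up again with the paths used above. It omits the TEI namespace for brevity, and the helper name make_entry and its parameters are made up for illustration.

```python
import xml.etree.ElementTree as ET


def make_entry(headword, abbrev=None, headword_is_abbrev=False):
    """Build a bare <entry> with one <form>.

    abbrev: optional abbreviation, nested as <form type="abbrev"> inside
            the headword's <form>.
    headword_is_abbrev: mark the headword's own <form> as type="abbrev".
    """
    entry = ET.Element("entry")
    form = ET.SubElement(entry, "form")
    if headword_is_abbrev:
        form.set("type", "abbrev")
    ET.SubElement(form, "orth").text = headword
    if abbrev is not None:
        inner = ET.SubElement(form, "form", {"type": "abbrev"})
        ET.SubElement(inner, "orth").text = abbrev
    return entry


# Case b): an abbreviation annotated on a headword.
annotated = make_entry("headword", abbrev="h.w.")
# Case a): a headword that is itself an abbreviation.
abbrev_headword = make_entry("h.w.", headword_is_abbrev=True)

print(annotated.find("form/form[@type='abbrev']/orth").text)
print(abbrev_headword.find("form[@type='abbrev']/orth").text)
```

Both variants keep a single <form> directly under <entry>, so neither needs multiple sibling <form> elements.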
A.8) entry/sense/gramGrp - OK?
? Or may these be combined?
A.9) Header
A.9.1) fileDesc/publicationStmt/license
In Freedict, I see only <availability> used (for licensing
information). Why not <license>?
A.9.2) Date
* The Ding is annotated with both a version and a date.
? How/whether to represent the date?
A.9.3) fileDesc/publicationStmt/pubPlace
* HowTo [11]: "http://freedict.org/"
* (example) TEI:
"<ref target="http://freedict.org/">http://freedict.org/</ref>"
A.9.4) notesStmt/note[@type="status"]
* HowTo [11]: "documents the size of the database"
* existing eng-deu.tei: "old upstream version"
* Elsewhere: "Big enough to be useful", "stable"
? One of these two? Which?
A.9.5) revisionDesc/change[…]
A.9.6) fileDesc/editionStmt/edition (Version)
? Add minor version to account for changes caused by the
importer?
A.9.7) fileDesc/titleStmt/editor
* Should I consider myself an editor?
* TEI doc: "[...] acting as editor, compiler, translator, etc."
C.11) Reference types.
a) Ding: unit ~word
* often synonyms, not always
? xr[@type="see"] (seems correct to me)
b) Ding: unit1; unit2
* Units that translate to the same group of units.
* Considered bidirectional synonyms.
* Frequently differently gendered forms of the same word
* I think it's OK to consider these synonymous.
* Otherwise, identification of plural forms using a
heuristic should be doable.
? xr[@type="syn"] (seems correct to me)
? Should <ref/> have content?
? Or only @target?
c) Ding: unit1 | unit2 // group1 | group2
* units that are somewhat related
? xr[@type="see"]
* no longer discernible from a).
? Some other/new @type value (for a) or c))?
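For comparison, here is a minimal Python sketch of how the importer might emit the two reference kinds proposed above, with the <ref> content made optional so that both variants of the open question can be tried. The helper name make_xr and the target ids are hypothetical, and the TEI namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET


def make_xr(xr_type, target_id, text=None):
    """Build <xr type=...><ref target="#id">text</ref></xr>.

    If text is None, <ref> carries only the @target attribute.
    """
    xr = ET.Element("xr", {"type": xr_type})
    ref = ET.SubElement(xr, "ref", {"target": "#" + target_id})
    if text is not None:
        ref.text = text
    return xr


# b) "unit1; unit2": treated as synonyms, here with content in <ref>.
syn = make_xr("syn", "unit2", "unit2")
# a) "unit ~word": a plain pointer, here with @target only.
see = make_xr("see", "word")

print(ET.tostring(syn, encoding="unicode"))
print(ET.tostring(see, encoding="unicode"))
```

With this shape, a) and c) would both come out as type="see", which is exactly the loss of distinction noted above.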