[freedict] Re: TEI file with multiple orth and gramGrp

  • From: Piotr Bański <bansp@xxxxx>
  • To: freedict@xxxxxxxxxxxxx, Jochen Peters <jochen.peters@xxxxxxxxxxxxxxxxxx>
  • Date: Sun, 9 Dec 2018 21:35:11 +0100

Dear Jochen,

This looks like a fascinating use case. I won't be trying to reply with anything concrete until at least the 13th, but in the meantime, may I ask what sort of project this is coming from? Is this a digitization project or are you putting something new together out of electronic sources, or from your expertise?

I tried to follow the metadata you are citing in the header, but the web address doesn't seem to have any link to a project description (or I missed it).

The reason I'm asking is that if this is a digitization issue then I might have an avenue that could benefit both Freedict and a meta-digitization initiative called TEI Lex0.

Otherwise, it can still serve as a valuable use case for TEI description, and also for some Freedict-internal decisions on how to visualize the data.

Ok, so I want to stay silent until the 13th, but let me quickly confirm first: the fundamental issue is that you have a single form (with cool variants, but let's leave that for now) with alternative grammatical descriptions (it is syncretic between 3 sets of grammatical features), and a fairly simple set of equivalents, with, crucially, all the senses applying to all the listed grammatical variants. The primary source of the problem seems to be the move from a morphologically rich language to a morphologically impoverished one.

My general reaction is to congratulate you on the instinct -- this looks like a very sensible encoding. A small but important challenge for Freedict lies in rendering this into DICT or other protocols, I feel. And it feels doable and useful for the project, because this sort of challenge is going to crop up again.

So say (oh, I can't shut up...), have you tried, or are you willing, to give this a go vis-a-vis the Freedict XSL stylesheets and potentially suggest some backwards-compatible enrichment to them? I guess that in DICT, which is the most primitive of our display protocols, we are looking at displaying

{ forms } { gram.descs } { senses }

in a single stream, and I think much of that is in already. There could be some glitches based on the attribute values, but that doesn't sound like a big challenge. You probably don't want to display the numbers for each set of grammatical features, just to group them between semicolons or so. Right?

Thanks for turning to us with this!



On 12/9/18 7:58 PM, Jochen Peters wrote:

Hi freedict team,

I have run into trouble: I have a CSV file like this:

wordA written in transliterated type 1,
wordA written in transliterated type 2,
wordA without transliteration,
"other language meaning I, meaning II; complete different other meaning"

Another line may have the same wordA, but with different transliterations and
grammatical usages:

wordA written in transliterated type 1,
wordA written in transliterated type 2 but a bit different,
wordA without transliteration but a bit different,
"other language meaning I, meaning II; complete different other meaning"

I am trying to add the different grammatical usages to the TEI file this way,
but I am not sure whether this is the correct way:

<gramGrp n="1">..
<gramGrp n="2">..
<gramGrp n="3">..

I will add the different "original" words this way in each <entry>:

<orth type="standard" xml:lang="yi">HEBREW CHARS</orth>
<orth type="transliterated" xml:lang="yih-Latn">avanturistish</orth>
<orth type="transliterated" xml:lang="ydd-Latn">avanturistishn</orth>

... but I am not sure whether this is the correct way, either.
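
For illustration only, the CSV-to-TEI mapping described above could be sketched in Python roughly as follows. Several things here are assumptions rather than facts from the data: that each record has exactly four fields in the order shown, that the first transliteration is the `yih-Latn` one, that `;` separates senses while `,` separates equivalents inside a sense, and that equivalents are wrapped in `<cit type="trans"><quote>`. The helper name `row_to_entry` is hypothetical, and the `<gramGrp>` elements are left out because their content is not shown.

```python
import csv
import io
import xml.etree.ElementTree as ET

# ElementTree's spelling of the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def row_to_entry(row):
    """Build one <entry> from a four-field CSV record (assumed layout)."""
    translit1, translit2, native, meanings = [cell.strip() for cell in row]
    entry = ET.Element("entry")
    form = ET.SubElement(entry, "form")
    for text, orth_type, lang in (
        (native, "standard", "yi"),
        (translit1, "transliterated", "yih-Latn"),
        (translit2, "transliterated", "ydd-Latn"),
    ):
        orth = ET.SubElement(form, "orth", {"type": orth_type, XML_LANG: lang})
        orth.text = text
    # Assumed convention: ';' splits senses, ',' splits equivalents.
    for sense_text in meanings.split(";"):
        sense = ET.SubElement(entry, "sense")
        for equivalent in sense_text.split(","):
            cit = ET.SubElement(sense, "cit", {"type": "trans"})
            ET.SubElement(cit, "quote").text = equivalent.strip()
    return entry


sample = (
    "avanturistish,avanturistishn,HEBREW CHARS,"
    '"meaning I, meaning II; complete different other meaning"\n'
)
row = next(csv.reader(io.StringIO(sample)))
print(ET.tostring(row_to_entry(row), encoding="unicode"))
```

Keeping all three `<orth>` variants inside one `<form>` means the senses are stated once per entry rather than once per spelling.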

Is this "mapping" OK? Has anyone had a similar problem and
solved it successfully (or in a different way)?

I will be happy to get some ideas or answers. I am adding a VERY small TEI example
as an attachment.

kind regards,

