[freedict] Re: Case inflections, verb and adjective forms

  • From: Piotr Bański <bansp@xxxxx>
  • To: freedict@xxxxxxxxxxxxx
  • Date: Tue, 8 Sep 2020 15:10:28 +0200

Hi Karl,

The advantages of what you suggest are fairly obvious. A minor disadvantage is that Lex0 was created to handle "retrodigitized dictionaries": it is a kind of pivot format that the various OCR tools and human encoders can aim for, and from which further processing is straightforward. That is why Lex0 forbids e.g. <pos> in favour of <gram type="partOfSpeech">, to keep everything as generic as possible.

What I think would be ideal for FreeDict is a customization of another (but related) standard, namely ISO LMF-4. However, documentation for that standard is not yet publicly available, and I am not allowed to distribute the (paid) version published by ISO. This standard is a product of the ISO-TEI liaison, and as such it should in time be fully reflected in the relevant chapter of the TEI Guidelines, but so far it isn't, and I am not sure when that is going to happen. For now, only a skeletal example document has been published:

https://github.com/DARIAH-ERIC/lexicalresources/blob/master/Schemas/LMFinTEI%20Specification/examplesLMFinTEI.xml

So, for practical purposes, LMF is not the most straightforward path to take.

-------

Now, concerning the possible practical solution of least effort: what we could do is accept the various general descriptive solutions offered by Lex0, while still treating it as a pivot/baseline for the FreeDict format, that is, without accepting all of the verbose genericity offered there.

In case Lex0-specific tools appear that we would like to use, we could have a script to translate the existing FreeDict format to Lex0 (so, for example, turning <pos> and <gen> and friends into their generic typed <gram> equivalents, and manipulating the <form>s and <sense>s a bit).
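
A minimal sketch of what such a translation script might look like, using only Python's standard library. It covers just the <pos>/<gen> renaming; the <form> and <sense> adjustments are left out. The "partOfSpeech" value is taken from the example above, while "gender" and the sample entry are assumptions for illustration and should be checked against the Lex0 specification:

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)

# Assumed mapping from FreeDict shorthand elements to typed <gram> values.
# "partOfSpeech" follows the example in this message; "gender" is a guess.
GRAM_TYPES = {"pos": "partOfSpeech", "gen": "gender"}


def to_generic_gram(root):
    """Rename shorthand elements to <gram type="..."> in place.

    Returns the number of elements that were renamed.
    """
    count = 0
    for elem in root.iter():
        for short, gram_type in GRAM_TYPES.items():
            if elem.tag == f"{{{TEI_NS}}}{short}":
                elem.tag = f"{{{TEI_NS}}}gram"
                elem.set("type", gram_type)
                count += 1
                break
    return count


# Hypothetical FreeDict-style entry, for illustration only.
SAMPLE = """<entry xmlns="http://www.tei-c.org/ns/1.0">
  <form><orth>Haus</orth></form>
  <gramGrp><pos>n</pos><gen>n</gen></gramGrp>
</entry>"""

entry = ET.fromstring(SAMPLE)
converted = to_generic_gram(entry)
print(converted)
print(ET.tostring(entry, encoding="unicode"))
```

Keeping the mapping in one table means further shorthand elements can be added without touching the conversion logic.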

If some of you guys are thinking of preparing tools that would handle more than FreeDict, then Lex0 could be the target, and that might entail the necessity of having a mapping script, potentially both ways. Or you could have a settings file that would handle the FreeDict/Lex0 differences locally to the tool (which is actually what other projects might then appreciate, too).
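
As a rough illustration of the settings-file idea, a tool could keep one lookup path per format and never convert the data at all. Everything below is hypothetical: the SETTINGS layout, the format names, the paths and the sample entries are assumptions, and a real tool would probably load the settings from a file rather than hard-code them:

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical per-format settings: where to find the part of speech.
SETTINGS = {
    "freedict": {"pos": ".//tei:gramGrp/tei:pos"},
    "lex0": {"pos": ".//tei:gramGrp/tei:gram[@type='partOfSpeech']"},
}


def part_of_speech(entry, fmt):
    """Return the part-of-speech text of an entry, or None if absent."""
    node = entry.find(SETTINGS[fmt]["pos"], NS)
    return node.text if node is not None else None


# Two made-up encodings of the same entry, for illustration only.
FREEDICT_ENTRY = ET.fromstring(
    '<entry xmlns="http://www.tei-c.org/ns/1.0">'
    "<gramGrp><pos>n</pos></gramGrp></entry>"
)
LEX0_ENTRY = ET.fromstring(
    '<entry xmlns="http://www.tei-c.org/ns/1.0">'
    '<gramGrp><gram type="partOfSpeech">n</gram></gramGrp></entry>'
)

print(part_of_speech(FREEDICT_ENTRY, "freedict"))
print(part_of_speech(LEX0_ENTRY, "lex0"))
```

Another project with its own TEI flavour would then only need to add a further entry to the settings.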

The important thing to bear in mind is that Lex0 is not meant to be the final project format; it is rather meant to define a baseline for various TEI-based project formats.

Best,

  Piotr


On 08/09/2020 09:33, Karl Bartel wrote:

    https://dariah-eric.github.io/lexicalresources/pages/TEILex0/TEILex0.html


That's a great document! I must have missed it earlier, so thanks for reposting!

Should we treat this as an official FreeDict recommendation for new dictionaries and link it from the documentation page?

Karl

--
FreeDict - Free And Open Dictionaries
Manage your subscription at https://www.freelists.org/list/freedict
Wiki: https://github.com/freedict/fd-dictionaries/wiki
Web: http://freedict.org
