[freedict] Re: Self Introduction

From: Sebastian Humenda <shumenda@xxxxxx>
To: freedict@xxxxxxxxxxxxx
Date: Tue, 29 Jun 2021 22:27:11 +0200

Hi Tyler

Tyler Nickerson wrote on 20.06.2021, 18:56 -0700:

I definitely think I could help take on the Chinese dictionaries (at the very
least, CEDICT), seeing I’ve written both a tool that converts from FreeDict to
ODICT and CEDICT to ODICT. I can’t imagine adjusting the latter to port to TEI
instead would be too hard. In fact, if I could be added to the FreeDict org I
could fork the repo and open source the conversion script.

Can you please send me here or off-list your GitHub name? Then you don't need
to fork but can directly contribute.

As for the work surrounding schemas, I’m actually pretty unfamiliar with the
TEI standards, so I think more context around the goals trying to be achieved
here would be needed.

Sure, it's not a good starting issue to work on.

As for point #3, I think there is an opportunity for ODict to be advantageous
here. Similar to TEI, it’s layout-agnostic and semantic, but can be compiled
down to a binary for easier storage and access. Plus, the CLI that it comes
with can perform instant ad-hoc entry lookups right in the terminal. That
said, I realize how well adopted and robust the TEI format is, and I hope to
eventually extend ODict to offer the same level of granularity TEI has. Right
now ODict is very dictionary-centric, whereas I feel TEI is designed for
multiple kinds of lexical data (dictionaries, glossaries, books, etc.).

We only use its dictionary subset (chapter 9), though. I don't think ODict is
a competitor here. TEI really should be the pivot format, especially given how
many different ways it offers to encode the same thing.
If I had time and resources, I would parse TEI into a pivot representation and
derive any output format from there. PyGlossary does pretty much this, but on
a much more primitive level and without any semantic notation. For those who
know Pandoc: something like Pandoc's intermediate AST would be a great thing
to have.
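
To make the idea a bit more concrete, here is a rough Python sketch, not
anything that exists in FreeDict today. It assumes entries shaped like
entry/form/orth plus sense/cit[@type="trans"]/quote; the Entry and Sense
classes and all function names are made up for illustration, and real TEI
files vary far more than this handles.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

# Standard TEI P5 namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


@dataclass
class Sense:
    """Hypothetical pivot node: one sense with its translations."""
    translations: List[str] = field(default_factory=list)


@dataclass
class Entry:
    """Hypothetical pivot node: one headword with its senses."""
    headword: str
    senses: List[Sense] = field(default_factory=list)


def parse_tei(xml_text: str) -> List[Entry]:
    """Parse a (simplified) TEI dictionary into the pivot representation."""
    root = ET.fromstring(xml_text)
    entries = []
    for node in root.iter("{http://www.tei-c.org/ns/1.0}entry"):
        orth = node.find("tei:form/tei:orth", TEI_NS)
        if orth is None or not (orth.text or "").strip():
            continue  # skip entries without a usable headword
        senses = []
        for sense in node.findall("tei:sense", TEI_NS):
            quotes = sense.findall("tei:cit[@type='trans']/tei:quote", TEI_NS)
            senses.append(Sense([q.text.strip() for q in quotes if q.text]))
        entries.append(Entry(orth.text.strip(), senses))
    return entries


def to_plain_text(entries: List[Entry]) -> str:
    """One possible output: headword followed by numbered senses."""
    lines = []
    for entry in entries:
        lines.append(entry.headword)
        for number, sense in enumerate(entry.senses, 1):
            lines.append(f"  {number}. {', '.join(sense.translations)}")
    return "\n".join(lines)


def to_tsv(entries: List[Entry]) -> str:
    """Another output from the same pivot: headword<TAB>all glosses."""
    rows = []
    for entry in entries:
        glosses = [t for s in entry.senses for t in s.translations]
        rows.append(f"{entry.headword}\t{'; '.join(glosses)}")
    return "\n".join(rows)


SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<entry><form><orth>Haus</orth></form>
<sense><cit type="trans"><quote>house</quote></cit>
<cit type="trans"><quote>home</quote></cit></sense>
</entry>
</body></text></TEI>"""

if __name__ == "__main__":
    pivot = parse_tei(SAMPLE)
    print(to_plain_text(pivot))
    print(to_tsv(pivot))
```

The point is only that every exporter reads the same in-memory structure, so
the variability of the TEI encoding has to be handled once, in the parser.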

Unrelated, but something I think would make for a cool addition to the
FreeDict site, and was a project I was hoping to eventually put some time into
as an offshoot of ODict, is a way to visually look up and add terms to FreeDict
dictionaries. Something similar to Wiktionary, with all data being stored as
structured, semantic markup (unlike Wiktionary’s actual dumps) and fully
downloadable. It could help to automatically increase the robustness of
FreeDict’s dictionaries and could become a definitive lexical resource for
language learners and educators. It’s part of the reason that ODict has a
“merge” utility in its CLI (I figured people could enter new data, and have it
automatically be merged with the existing dictionary binary and become
available for download).

Yeah, there have been several attempts to do exactly this. One is actually
within our organisation:
https://github.com/freedict/LCOD

Since I suppose you will start with the Chinese dictionaries, we can discuss
how to proceed from there.

Cheers
Sebastian
--
FreeDict - Free And Open Dictionaries
Manage your subscription at https://www.freelists.org/list/freedict
Wiki: https://github.com/freedict/fd-dictionaries/wiki
Web: http://freedict.org
