Hi Tyler
Tyler Nickerson wrote on 20.06.2021, 18:56 -0700:
I definitely think I could help take on the Chinese dictionaries (at the very
least, CEDICT), seeing as I’ve written a tool that converts both from FreeDict
to ODict and from CEDICT to ODict. I can’t imagine that adapting the latter to
output TEI instead would be too hard. In fact, if I could be added to the
FreeDict org, I could fork the repo and open-source the conversion script.
As for the work surrounding schemas, I’m actually pretty unfamiliar with the
TEI standards, so I think I would need more context on the goals being pursued
here.
As for point #3, I think there is an opportunity for ODict to be advantageous
here. Similar to TEI, it’s layout-agnostic and semantic, but can be compiled
down to a binary for easier storage and access. Plus, the CLI that it comes
with can perform instant ad-hoc entry lookups right in the terminal. That
said, I realize how well adopted and robust the TEI format is, and I hope to
eventually extend ODict to offer the same level of granularity TEI has. Right
now ODict is very dictionary-centric, whereas I feel TEI is designed for
multiple kinds of lexical data (dictionaries, glossaries, books, etc.).
Unrelated, but something I think would make for a cool addition to the
FreeDict site, and a project I was hoping to eventually put some time into as
an offshoot of ODict, is a way to visually look up and add terms to FreeDict
dictionaries. Something similar to Wiktionary, with all data being stored as
dictionaries. Something similar to Wiktionary, with all data being stored as
structured, semantic markup (unlike Wiktionary’s actual dumps) and fully
downloadable. It could help to automatically increase the robustness of
FreeDict’s dictionaries and could become a definitive lexical resource for
language learners and educators. It’s part of the reason that ODict has a
“merge” utility in its CLI (I figured people could enter new data, and have it
automatically be merged with the existing dictionary binary and become
available for download).