[freedict] Re: Self Introduction
- From: Sebastian Humenda <shumenda@xxxxxx>
- To: freedict@xxxxxxxxxxxxx
- Date: Sun, 20 Jun 2021 16:38:27 +0200
Tyler Nickerson wrote on 25.05.2021, 15:44 -0700:
> I’ve been a lurker on this mailing list for a while now, so I thought I might
> go ahead and introduce myself (as well as offer to help with the project). I’m
> Tyler, a designer, developer, and language enthusiast currently based in the [...]
>
> My software has actually started relying on FreeDict pretty heavily recently,
> as it uses your dictionaries to help language learners better comprehend
> vocabulary words during live conversation.
That's pretty cool. That's exactly the kind of application beyond plain
dictionary lookup that our dictionaries should make possible. I just had a
look: French is not supported yet :).
> As a result, I felt compelled to reach out and see if I could help out in any
> way. I have an extensive academic background in computer science and a good
> deal of UI/web design experience. I’m also fairly proficient in Mandarin as [...]
Oh, certainly. We need help in many areas, and I hope my late response did
not scare you off.
I compiled a quick list of things that I'd like to do if time permitted:
1. On the dictionary side, we have a long list of dictionaries to look at. In
   particular, the Chinese dictionaries seem like low-hanging fruit to me:
   - add chinese-hungarian dictionary #27
   - add chinese-english dictionary #26
   - add chinese-german dictionary #25
2. Another important issue is refreshing the schemas. We still rely on a
   somewhat old dialect of TEI P5. Piotr opened an issue, but I imagine a
   helping hand would be appreciated:
   - refreshing the schemas: freeze the p5subset, add it to our vc, update the
     syntax in the ODD schema #62
3. Discuss and implement a new conversion strategy.
   In short, we have XSL style sheets that convert our dictionaries into the
   plain-text format used by the DICT server. That target format is outdated,
   and the style sheets are so slow that they are becoming nearly useless.
   Our Slob export is done by tei2slob, a tool that understands a different
   subset of what the schemas define. A rewrite should work on a more uniform
   level. One option would be to bring PyGlossary up to date with our version
   of FreeDict TEI P5. As Karl pointed out, this could be difficult because
   PyGlossary is not built for semantic markup and hence does not seem to have
   an intermediate representation powerful enough for our needs. So before
   starting this task, a good deal of research on requirements and existing
   code would be needed.
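To illustrate what "working on a more uniform level" could mean, here is a minimal Python sketch. It is written in Python only because PyGlossary is a Python tool. The TEI source is parsed once into a small, format-neutral structure, and each export target gets its own small renderer. The `Entry` class, the function names and the plain-text rendering are all hypothetical and are not part of the existing FreeDict tooling. The sketch also assumes the usual FreeDict entry layout (`form/orth`, `gramGrp/pos`, `sense/cit[@type='trans']/quote`) and ignores everything else in the schema.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Optional

# TEI P5 namespace, used as the default namespace in TEI documents.
TEI = "{http://www.tei-c.org/ns/1.0}"


@dataclass
class Entry:
    """Hypothetical format-neutral representation of one dictionary entry."""
    headword: str
    pos: Optional[str] = None
    translations: list = field(default_factory=list)


def parse_entries(tei_xml: str) -> list:
    """Collect headword, part of speech and translations from every <entry>.

    Assumes the FreeDict-style layout described above; other markup is skipped.
    """
    root = ET.fromstring(tei_xml)
    entries = []
    for e in root.iter(f"{TEI}entry"):
        orth = e.find(f"{TEI}form/{TEI}orth")
        pos = e.find(f"{TEI}gramGrp/{TEI}pos")
        quotes = [
            q.text
            for q in e.iterfind(f"{TEI}sense/{TEI}cit[@type='trans']/{TEI}quote")
            if q.text
        ]
        entries.append(Entry(
            headword=orth.text if orth is not None and orth.text else "",
            pos=pos.text if pos is not None else None,
            translations=quotes,
        ))
    return entries


def to_plain_text(entry: Entry) -> str:
    """One example exporter: a simple plain-text rendering (not the DICT format)."""
    pos = f" <{entry.pos}>" if entry.pos else ""
    return f"{entry.headword}{pos}\n  " + "; ".join(entry.translations)


SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<entry><form><orth>cat</orth></form><gramGrp><pos>n</pos></gramGrp>
<sense><cit type="trans"><quote>Katze</quote></cit></sense></entry>
</body></text></TEI>"""

if __name__ == "__main__":
    for entry in parse_entries(SAMPLE):
        print(to_plain_text(entry))
```

In this design, only the exporters know about target formats, and the TEI parsing lives in one place. A Slob, DICT or .odict writer would be another function over `Entry`, so it would no longer need its own reading of the schemas.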
I'm not sure which of these tasks might interest you. In terms of priority,
you can read this list backwards.
> I was actually curious: I know CEDICT and ECDICT are two very popular
> Chinese <> English dictionaries, and was wondering if their licensing would
> allow FreeDict to offer Chinese dictionaries based on them.
What is the actual licence? There is CC-CEDICT under CC-BY-SA-4.0, which
would be a fit for FreeDict. How does this licence compare to those of the
dictionaries you mentioned?
> And one final note: I’ve also developed a fully open-source dictionary file
> format with an open spec. Unlike a lot of others, it isn’t based on
> underlying HTML; it compiles to binary from XML markup and has
> case-insensitive entry lookup baked into its API. I’d love to help FreeDict
> officially offer dictionaries in this format, as I’ve already written a repo
> that converts the TEI source files into .odict binaries.
That would be a great fit for point 3 of my list. We particularly like TEI
because it is well recognised in the linguistics community and because of its
semantic markup. It is a good pivot format: it is much easier to convert from
semantic markup to non-semantic markup than vice versa. Your dictionary format
is therefore very interesting indeed.
> Anyway, just wanted to drop in and say hi to you all! Let me know if I can be
> of assistance :)
Thanks, please let us resume the discussion!