CALL FOR PAPERS – GLOBALEX 2020 – Linked Lexicography
Full-day workshop at LREC2020 | Marseille, France | May 12, 2020
Submission deadline: February 14, 2020 (see also Important dates below)
Workshop website: https://globalex.link/events/workshops/globalex-workshop-2020/
WORKSHOP DESCRIPTION
The GLOBALEX 2020 Workshop @ LREC will follow up on the successful GLOBALEX
workshops at LREC 2016 (https://globalex2016.globalex.link/) and LREC 2018
(https://globalex2018.globalex.link/). It is organized by Globalex – Global
Alliance for Lexicography
(https://globalex.link/), with support
from:
● ELEXIS (EU's H2020-funded project European Lexicography Infrastructure,
https://elex.is/)
● TIAD (Translation Inference Across Dictionaries shared tasks and
workshops, https://tiad2020.unizar.es/,
https://tiad2019.unizar.es/, https://tiad2017.wordpress.com/)
This third iteration of GLOBALEX workshops at LREC will focus on linking data
from lexicographic resources and will highlight aspects related to the
automated linking of content among different dictionaries and other lexical
sources, with the aim of enhancing linguistic data generation, enrichment and
reinforcement.
Linking lexicographic data sets to each other and to other lexical resources,
and in particular the interoperability of lexicography with Linked Data (LD)
methodologies, have been gaining substantial attention in recent years,
becoming a subject of various projects for research by and collaboration
between academia and industry, including support of the public sector. Most
notably, the W3C community group on Ontology-Lexica [1] was established
following the release of the lemon model, which constituted the first de-facto
standard for representing ontology-lexica, with the mission to “develop models
for the representation of lexica (and machine readable dictionaries) relative
to ontologies” [2]. The ensuing OntoLex-lemon model [3], [4] has served since
2016 as the leading option for conversion of lexicographic data into LD, and
has recently been updated with the lexicog module [5] released on 17 September
2019 [6]. This trend has been complemented since 2015 by relevant literature
(e.g. [7], [8], [9]), conference papers (e.g. [10], [11], [12], [13]) and
EU-funded projects ([14] and [15], [16]).
Besides a section including general research papers, the workshop will include
two shared task tracks – one on linking monolingual data and the other on
linking bilingual and multilingual data, as follows:
(1) Monolingual Word Sense Alignment – in conjunction with a shared task
conducted by ELEXIS.
Task 1 will be evaluated on novel dictionary linking data developed by the
ELEXIS project [15], which will cover linking for the following languages:
Danish, Dutch, English, Estonian, German, Hungarian, Irish, Italian, Serbian,
Slovene and Russian.
(2) Linking Bilingual and Multilingual Lexicographic Resources – in
conjunction with the 3rd TIAD shared task.
Task 2 will host the 3rd edition of the Translation Inference Across
Dictionaries (TIAD) shared task, of which previous editions were co-located at
Language, Data and Knowledge conferences [17], [18]. The aim is to explore
methods and techniques for automatically generating new bilingual (and
multilingual) dictionaries from existing ones in the context of a coherent
experiment framework that enables reliable validation of results and solid
comparison of the processes used. In particular, the participating systems will
be asked to generate new translations automatically among three languages –
English, French, Portuguese – based on known translations contained in the
Apertium RDF graph [19]. The inclusion of other language pairs will also be
possible for this edition.
MAIN TOPICS
We welcome any topic related to the main theme of linking lexicographic
resources, including but not limited to:
● Linking monolingual dictionaries and lexicographic resources
● Linking bilingual dictionaries and lexicographic resources
● Linking multilingual dictionaries and lexicographic resources
● Linking lexicographic data with other lexical data resources
● Applications and developments of the OntoLex-lemon model and its
lexicography module
● RDF serializations of lexicographic data
● Non-RDF data formats for linked lexicographic resources
● RDF and XML standards for linked lexicography
● Converting lexicographic data for linking purposes
● Linked Data-native lexicographic resources
● Automated generation of lexicographic resources based on Linked Data
technologies
● Lexicography, terminology and Linguistic Linked (Open) Data
● Linked lexicography and the Semantic Web
● Linked lexicography and the Multilingual Digital Single Market
● Linked lexicography and Knowledge Systems
● Linked lexicography and Artificial/Augmented Intelligence
● Linked lexicography, deep learning and neural networks
AUDIENCE
● Lexicographers and dictionary makers
● Computational and corpus linguists
● NLP researchers and engineers
● Terminologists
● Big data analysts
● Reference scientists and knowledge system managers
SUBMISSION INFORMATION
There are two types of submissions:
● Abstract (500-1,000 words) OR
● Full paper (6-10 pages)
For formatting guidelines for full papers, please use the LREC submission
format (http://lrec2020.lrec-conf.org/en/submission/authors-kit/). Both
abstracts and full papers will address any of the topics included in this CfP,
but full papers have the advantage of presenting the authors’ work and ideas at
a greater level of detail. All submissions must be received by the deadline
below and will be reviewed by experts in the field. Authors of accepted
proposals will be invited (but not required) to submit a full paper for
publication in the
workshop proceedings.
Further details on the submission procedure will be provided on the workshop
website later on.
IMPORTANT DATES
Submission deadline: February 14, 2020
Notification of acceptance: March 13, 2020
Camera-ready papers: April 15, 2020
GLOBALEX Workshop: May 12, 2020
ORGANIZERS
● Ilan Kernerman, K Dictionaries
● Simon Krek, Globalex, Jožef Stefan Institute
TRACK 1 ORGANIZERS
● John McCrae, National University of Ireland – Galway
● Sina Ahmadi, National University of Ireland – Galway
TRACK 2 ORGANIZERS
● Jorge Gracia, University of Zaragoza
● Besim Kabashi, Friedrich-Alexander University of Erlangen-Nuremberg and
Ludwig-Maximilian University of Munich
SCIENTIFIC COMMITTEE (to be announced)
CONTACT (to be announced)
REFERENCES
[1] https://www.w3.org/community/ontolex/.
[2] McCrae, J., G. Aguado-de Cea, P. Buitelaar, P. Cimiano, T. Declerck, A.
Gomez-Perez, J. Gracia, L. Hollink, E. Montiel-Ponsoda, D. Spohr, and T.
Wunner. 2012. Interchanging lexical resources on the Semantic Web. Language
Resources and Evaluation, 46, pp. 701–719.
[3] https://www.w3.org/2016/05/ontolex/.
[4] McCrae, J., J. Bosque-Gil, J. Gracia, P. Buitelaar, and P. Cimiano. 2017.
The OntoLex-Lemon Model: Development and Applications. In Kosem et al. (eds.)
Electronic lexicography in the 21st century. Proceedings of eLex 2017
conference, in Leiden, Netherlands. Lexical Computing CZ s.r.o., pp. 587–597.
https://elex.link/elex2017/wp-content/uploads/2017/09/paper36.pdf/.
[5] https://www.w3.org/ns/lemon/lexicog#/.
[6] https://www.w3.org/2019/09/lexicog/.
[7] Gracia, J. 2015. Multilingual dictionaries and the Web of Data. Kernerman
Dictionary News, 23, pp. 1-4. https://www.kdictionaries.com/kdn/kdn23_2015.pdf/.
[8] Klimek, B., and M. Brummer. 2015. Enhancing lexicography with semantic
language databases. Kernerman Dictionary News, 23, pp. 5–10.
https://www.kdictionaries.com/kdn/kdn23_2015.pdf/.
[9] Bosque-Gil, J., J. Gracia, and A. Gomez-Perez. 2016. Linked data in
lexicography. Kernerman Dictionary News, 24, pp. 19–24.
https://www.kdictionaries.com/kdn/kdn24_2016.pdf/.
[10] Declerck, T., E. Wandl-Vogt, and K. Morth. 2015. Towards a Pan European
Lexicography by Means of Linked (Open) Data. In Kosem et al. (eds.) Proceedings
of eLex 2015. Biennial Conference on Electronic Lexicography (eLex2015),
electronic lexicography in the 21st century: Linking lexical data in the
digital age. Ljubljana/Brighton: Trojina, Institute for Applied Slovene
Studies, Ljubljana, pp. 342-355.
https://elex.link/elex2015/proceedings/eLex_2015_22_Declerck+etal.pdf/.
[11] Abromeit, F., C. Chiarcos, C. Fath, and M. Ionov. 2016. Linking the Tower
of Babel: Modelling a Massive Set of Etymological Dictionaries as RDF. In
Proceedings of the 5th Workshop on Linked Data in Linguistics: Managing,
Building and Using Linked Language Resources (LDL-2016). pp. 11–19.
[12] Bosque-Gil, J., J. Gracia, and E. Montiel-Ponsoda. 2017. Towards a Module
for Lexicography in OntoLex. In Proceedings of the LDK workshops: OntoLex, TIAD
and Challenges for Wordnets at 1st Language Data and Knowledge conference (LDK
2017), Galway, Ireland, volume 1899. Galway (Ireland): CEUR-WS, pp. 74–84.
http://ceur-ws.org/Vol-1899/OntoLex_2017_paper_5.pdf/.
[13] Gracia, J., I. Kernerman, and J. Bosque-Gil. 2017. Toward linked
data-native dictionaries. In I. Kosem et al. (eds.) Electronic lexicography in
the 21st century. Proceedings of eLex 2017 conference, in Leiden, Netherlands.
Lexical Computing CZ s.r.o., pp. 550–559.
https://elex.link/elex2017/wp-content/uploads/2017/09/paper33.pdf/.
[14] LDL4HELTA – Linked Data Lexicography for High-End Language Technology
Application. EUREKA Austria-Israel Bilateral R&D Programme No. 9898.
https://www.eurekanetwork.org/project/id/9898/.
[15] ELEXIS – European Lexicographic Infrastructure. European Union’s Horizon
2020 Research and Innovation Programme No. 731015. https://elex.is/.
[16] Prêt-à-LLOD. European Union’s Horizon 2020 Research and Innovation
Programme No. 825182. https://www.pret-a-llod.eu/.
[17] TIAD 2017 – Translation Inference across Dictionaries workshop and shared
task.
https://www.ldk2017.org/index-php/tiad-2017-shared-task-translation-inference-across-dictionaries/.
[18] TIAD 2019 – Translation Inference across Dictionaries workshop and shared
task. http://2019.ldk-conf.org/tiad-2019/.
[19] Apertium RDF - http://linguistic.linkeddata.es/apertium/.
TRACK 1 – 1st “Monolingual Word Sense Alignment” Shared Task
Call for Participation
The ELEXIS project is organizing a shared task on monolingual word sense
alignment across dictionaries as part of the GLOBALEX 2020 – Linked
Lexicography workshop at the 12th Language Resources and Evaluation Conference
(LREC 2020) taking place on Tuesday, May 12 2020 in Marseille (France).
Monolingual word sense alignment is the challenging task of finding matching
senses between two dictionary entries and will play a crucial role in the
development of new lexical resources. In addition, this task presents a
challenging combination of NLP, semantic textual similarity and reasoning in
order to find the best alignment across a group of senses.
Description of Task
The task of monolingual word sense alignment is presented as a task of
predicting the relationship between two senses in one of five categories:
“exact”, “broader”, “narrower”, “related” or “none”. For each sense pair the
following information will be provided:
- The lemma shared between the two entries
- The part of speech of the entries*
- The sense text (including definition) of the sense of the first entry
- The sense text (including definition) of the sense of the second
entry
- (Training Data) The label of the relation (“exact”, “broader”,
“narrower”, “related” or “none”)
For each pair of entries, all mappings between senses will be provided; we
therefore expect the best systems to treat the mapping of an entry as a block.
Training data will be available for monolingual dictionaries in the following
languages:
- Danish
- Dutch
- English
- Estonian
- German
- Hungarian*
- Irish
- Italian
- Serbian
- Slovenian
- Russian
*For Hungarian part-of-speech information is not provided
Participants may take part in any or all of the above languages. The test
data will consist of a group of entries with the label of the relation missing.
Participants should submit their results in the same format as the training
data, that is, the test data with the predicted label filled in.
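The task description above can be sketched in Python as follows. This is only
an illustrative baseline under stated assumptions, not the organisers'
evaluation code or an official data format: the SensePair class, the
token-overlap (Jaccard) score and the two thresholds are all hypothetical. The
baseline also scores each pair independently and never predicts "broader" or
"narrower", so a competitive system would need to go well beyond it.

```python
from dataclasses import dataclass
from typing import Optional

# The five relation categories named in the task description.
LABELS = ("exact", "broader", "narrower", "related", "none")


@dataclass
class SensePair:
    """One sense pair; the field names are hypothetical, not an official schema."""
    lemma: str
    pos: Optional[str]            # None where part of speech is not provided
    sense_a: str                  # sense text of the sense of the first entry
    sense_b: str                  # sense text of the sense of the second entry
    label: Optional[str] = None   # given in training data, missing in test data


def tokens(text: str) -> set:
    """Lower-cased word set with surrounding punctuation stripped."""
    return {t.strip(".,;:()").lower() for t in text.split()} - {""}


def jaccard(a: str, b: str) -> float:
    """Token overlap between two sense texts, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def predict(pair: SensePair, exact_at: float = 0.6, related_at: float = 0.2) -> str:
    """Map the overlap score to a label; both thresholds are arbitrary assumptions."""
    score = jaccard(pair.sense_a, pair.sense_b)
    if score >= exact_at:
        return "exact"
    if score >= related_at:
        return "related"
    return "none"


# Invented example pairs, for illustration only.
same = SensePair("bank", "noun",
                 "a financial institution that accepts deposits",
                 "a financial institution that accepts deposits")
different = SensePair("bank", "noun",
                      "sloping land beside a river",
                      "a financial institution")
```

Filling in the missing label of a test pair then amounts to setting
pair.label = predict(pair) before writing the pair back out in the format of
the training data.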
Publication of Results
Participants will submit a system paper that should include a description of
the system, the way the data has been processed, the applied algorithms, the
obtained results, as well as the conclusions and ideas for future improvements.
The papers will be peer reviewed prior to publication to confirm that all
aspects are well covered.
The workshop will also accept regular papers from authors who are not
participating in the shared task but have worked on the topic of
word sense alignment and want to publish novel results or ideas, possibly with
datasets and an experimental basis different from those proposed in this shared
task. Such papers will be peer reviewed on the basis of their scientific
quality.
All the accepted papers will be published as part of the Globalex workshop
proceedings and presented during the workshop.
Important Dates
17/12/2019 – Technical description of the evaluation process and data provided
by organisers
01/02/2020 – Release of extended Training Data
13/03/2020 – Submission of results by participants / submission of regular
papers
03/04/2020 – Evaluation results communicated by organisers / notification of
regular papers
14/04/2020 – Submission of system description papers
12/05/2020 – Workshop day
Organizers
John P. McCrae – Data Science Institute, National University of Ireland Galway
Sina Ahmadi – Data Science Institute, National University of Ireland Galway
TRACK 2 – 3rd "Translation Inference Across Dictionaries" (TIAD 2020) shared
task
CALL FOR PARTICIPATION
We are pleased to invite you to participate in the third shared task for
Translation Inference Across Dictionaries (TIAD 2020), which will be held in
conjunction with the GLOBALEX 2020 – Linked Lexicography workshop at the 12th
Language Resources and Evaluation Conference (LREC 2020) on Tuesday, May 12
2020 in Marseille (France).
This initiative is aimed at exploring the best methods and techniques for
automatically generating new bilingual (and multilingual) dictionaries from
existing ones, in the context of a coherent experiment that enables reliable
validation of results and solid comparison of methods and techniques used for
the automatic generation of translations across languages. This initiative also
aims to stimulate and enhance further research on the topic.
TASK DEFINITION
The objective of the task is to explore and compare methods and techniques to
infer indirect translations between language pairs, based on existing bilingual
resources. Such techniques would help in auto-generating new bilingual and
multilingual dictionaries based on existing ones.
In particular, the participating systems will be asked to indirectly generate
translations among three languages, namely Portuguese, French and English,
based on already known translations contained in the Apertium RDF graph
(http://linguistic.linkeddata.es/apertium/). The three chosen languages (EN,
FR, PT) are not directly connected in the Apertium RDF graph
(https://tinyurl.com/apertiumRDF-lang), therefore no direct translations can be
obtained among them in Apertium RDF.
Based on the available RDF data, the participants will have to apply their
methods and techniques to discover indirect translations (mediated by any other
language in the graph) between the pairs: (EN, FR), (FR, PT), and (PT, EN).
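A pivot-based inference step of this kind can be sketched in Python as
follows. This is a minimal illustration under stated assumptions, not the
TIAD baseline or the Apertium RDF data model: the toy translation list, the
function names and the "number of supporting pivots" score are all
hypothetical, and a real system would read the translations from the RDF
graph instead.

```python
from collections import defaultdict

# Hypothetical toy data, NOT taken from Apertium RDF:
# (source language, source word, target language, target word).
# As in the task, there are no direct EN-FR translations.
TRANSLATIONS = [
    ("en", "house", "es", "casa"),
    ("es", "casa", "fr", "maison"),
    ("en", "house", "ca", "casa"),
    ("ca", "casa", "fr", "maison"),
    ("en", "bank", "es", "banco"),
    ("es", "banco", "fr", "banque"),
    ("es", "banco", "fr", "banc"),
]


def build_graph(translations):
    """Undirected graph whose nodes are (language, word) pairs."""
    graph = defaultdict(set)
    for src_lang, src_word, tgt_lang, tgt_word in translations:
        graph[(src_lang, src_word)].add((tgt_lang, tgt_word))
        graph[(tgt_lang, tgt_word)].add((src_lang, src_word))
    return graph


def infer(graph, word, source, target):
    """Return {candidate: number of pivot nodes that connect it to `word`}."""
    support = defaultdict(int)
    for pivot in graph.get((source, word), ()):
        if pivot[0] in (source, target):
            continue  # only translations mediated by another language count
        for lang, candidate in graph.get(pivot, ()):
            if lang == target:
                support[candidate] += 1
    return dict(support)


graph = build_graph(TRANSLATIONS)
house_fr = infer(graph, "house", "en", "fr")
bank_fr = infer(graph, "bank", "en", "fr")
```

In this toy graph "maison" is supported by two pivots (Spanish and Catalan),
whereas "banque" and "banc" each have a single supporting pivot. The second
case shows why plain pivoting is not enough: a polysemous pivot word yields
candidates that a real system has to rank or filter.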
In addition, participants are welcome to make use of other freely available
sources of background knowledge (e.g., lexical linked open data and parallel
corpora) to improve performance, as long as direct translations among the
considered language pairs from such extra resources are NOT used.
The inclusion of other language pairs in the evaluation could be considered, in
which case this will be duly announced to participants.
Evaluation of the results will be carried out by the organisers against
manually compiled pairs of K Dictionaries from the Global Series
(https://lexicala.com/resources#dictionaries) and other resources.
PUBLICATION OF RESULTS
Participants will submit a system paper that should include a description of
the system, the way the data have been processed, the applied algorithms, the
obtained results, as well as the conclusions and ideas for future improvements.
The papers will be peer reviewed prior to publication to confirm that all
aspects are well covered.
The workshop will also accept regular papers from authors who are not
participating in the shared task but have worked on the topic of
translation inference and want to publish novel results or ideas, possibly with
datasets and an experimental basis different from those proposed in this shared
task. Such papers will be peer reviewed on the basis of their scientific
quality.
All the accepted papers will be published as part of the Globalex workshop
proceedings and presented during the workshop.
IMPORTANT DATES
17/12/2019 – Technical description of the evaluation process and data provided
by organisers
14/02/2020 – Submission of regular papers (not participating systems)
13/03/2020 – Submission of results by participating systems / notification of
regular papers
02/04/2020 – Evaluation results communicated by organisers / camera-ready of
regular papers
14/04/2020 – Submission of system description papers
12/05/2020 – Workshop day
ORGANISERS
● Jorge Gracia, University of Zaragoza, Spain
● Besim Kabashi, Friedrich-Alexander-University of Erlangen-Nuremberg,
and Ludwig-Maximilian-University of Munich, Germany
REVIEW COMMITTEE
To be announced
WEBSITE
A full description of TIAD-2020 will be available at https://tiad2020.unizar.es/
Identify, Describe and Share your LRs!
Describing your LRs in the LRE Map is now a normal practice in the submission
procedure of LREC (introduced in 2010 and adopted by other conferences). To
continue the efforts initiated at LREC 2014 about “Sharing LRs” (data, tools,
web-services, etc.), authors will have the possibility, when submitting a
paper, to upload LRs in a special LREC repository. This effort of sharing LRs,
linked to the LRE Map for their description, may become a new “regular” feature
for conferences in our field, thus contributing to creating a common repository
where everyone can deposit and share data.
As scientific work requires accurate citations of referenced work so as to
allow the community to understand the whole context and also replicate the
experiments conducted by other researchers, LREC 2016 endorses the need to
uniquely Identify LRs through the use of the International Standard Language
Resource Number (ISLRN, www.islrn.org), a Persistent
Unique Identifier to be assigned to each Language Resource. The assignment of
ISLRNs to LRs cited in LREC papers will be offered at submission time.