BUCC, 15th Workshop on Building and Using Comparable Corpora
TOPICS
We solicit contributions on all topics related to comparable (and parallel) corpora, including but not limited to the following:
- Building Comparable Corpora:
  - Automatic and semi-automatic methods
  - Methods to mine parallel and non-parallel corpora from the web
  - Tools and criteria to evaluate the comparability of corpora
  - Parallel vs non-parallel corpora, monolingual corpora
  - Rare and minority languages, across language families
  - Multi-media/multi-modal comparable corpora
- Applications of Comparable Corpora:
  - Human translation
  - Language learning
  - Cross-language information retrieval & document categorization
  - Bilingual and multilingual projections
  - (Unsupervised) machine translation
  - Writing assistance
  - Machine learning techniques using comparable corpora
- Mining from Comparable Corpora:
  - Cross-language distributional semantics and pre-trained multilingual transformer models
  - Creation of bilingual and multilingual embeddings from comparable corpora
  - Methods to derive parallel from non-parallel corpora (e.g. to provide for low-resource languages in neural machine translation)
  - Extraction of bilingual and multilingual translations of single words, multi-word expressions, proper names, named entities, sentences, paraphrases, etc. from comparable corpora
  - Induction of morphological, grammatical, and translation rules from comparable corpora
  - Induction of multilingual word classes from comparable corpora
IMPORTANT DATES
- April 20, 2022: Paper submission deadline (extended from April 10)
- May 3, 2022: Notification to authors
- May 23, 2022: Camera-ready final papers
- June 25, 2022: Workshop date
PRACTICAL INFORMATION
Registration will be handled via the main conference website of LREC 2022.
SUBMISSION GUIDELINES
Please follow the style sheet and templates provided for the main conference at https://lrec2022.lrec-conf.org/en/submission2022/authors-kit/. Papers should be submitted as a PDF file using the START conference manager at https://www.softconf.com/lrec2022/BUCC/. Submissions must describe original and unpublished work and range from 4 to 8 pages plus unlimited references.
It is the authors' choice whether or not to reveal their identities in their manuscripts submitted for review. Accepted papers will be published in the workshop proceedings.
Double submission policy: Parallel submission to other meetings or publications is possible, but the workshop organizers must be notified immediately by e-mail.
In case of questions, please contact Reinhard Rapp: reinhardrapp (at) gmx (dot) de
Information from the LREC organizers
Describing your language resources (LRs) in the LRE Map is now a normal practice in the submission procedure of LREC (introduced in 2010 and adopted by other conferences). To continue the efforts initiated at LREC 2014 about “Sharing LRs” (data, tools, web-services, etc.), authors will have the possibility, when submitting a paper, to upload LRs in a special LREC repository. This effort of sharing LRs, linked to the LRE Map for their description, may become a new “regular” feature for conferences in our field, thus contributing to creating a common repository where everyone can deposit and share data.
As scientific work requires accurate citations of referenced work so as to allow the community to understand the whole context and also replicate the experiments conducted by other researchers, LREC 2022 endorses the need to uniquely identify LRs through the use of the International Standard Language Resource Number (ISLRN, www.islrn.org), a Persistent Unique Identifier to be assigned to each Language Resource. The assignment of ISLRNs to LRs cited in LREC papers will be offered at submission time.