BUCC, 14th Workshop on Building and Using Comparable Corpora
Special Topic: Neural Networks in Comparable Corpora Research
Co-located with RANLP 2021, online, 6 or 7 September 2021
Website: https://comparable.limsi.fr/bucc2021/

MOTIVATION

Research on comparable corpora is active but used to be scattered among many workshops and conferences. This workshop series was created to bring that research together and give it a better platform.

In the language engineering and linguistics communities, research on comparable corpora has been driven by two main motivations. In language engineering, on the one hand, it is mainly motivated by the need to use comparable corpora as training data for statistical natural language processing applications such as neural machine translation or cross-lingual retrieval. In linguistics, on the other hand, comparable corpora are of interest in themselves, as they make inter-linguistic discoveries and comparisons possible. It is generally accepted in both communities that comparable corpora are documents in one or several languages that are comparable in content and form to various degrees and along various dimensions. Comparable corpora have been used in a range of applications, including Information Retrieval, Machine Translation, Cross-lingual Text Classification, etc.

We believe that the linguistic definitions and observations related to comparable corpora can improve methods to mine such corpora for applications of statistical NLP, for example to extract parallel corpora from comparable corpora for neural MT. It is therefore of great interest to bring together builders and users of such corpora.
TOPICS

We solicit contributions on all topics related to comparable corpora, including but not limited to the following:

Building Comparable Corpora:
• Human translations
• Automatic and semi-automatic methods
• Methods to mine parallel and non-parallel corpora from the Web
• Tools and criteria to evaluate the comparability of corpora
• Parallel vs non-parallel corpora, monolingual corpora
• Rare and minority languages, across language families
• Multi-media/multi-modal comparable corpora

Applications of comparable corpora:
• Human translations
• Language learning
• Cross-language information retrieval & document categorization
• Bilingual and multilingual projections
• Machine translation
• Writing assistance

Mining from Comparable Corpora:
• Cross-language distributional semantics, word embeddings and pre-trained multilingual transformer models
• Extraction of parallel segments or paraphrases from comparable corpora
• Methods to extract parallel from non-parallel corpora (e.g. to provide for low-resource languages in neural machine translation)
• Extraction of bilingual and multilingual translations of single words and multi-word expressions; proper names, named entities, etc.

IMPORTANT DATES

5 July 2021: Paper submission deadline
31 July 2021: Notification to authors
31 July 2021: Early bird registration (reduced rates)
31 Aug 2021: Camera-ready final papers
7 or 8 Sep 2021: Workshop date

SUBMISSION INFORMATION

Please follow the style sheet and templates provided for the main conference at http://ranlp.org/ranlp2021/submissions. Papers should be submitted as a PDF file at [to be specified].

Submissions must describe original and unpublished work and range from four (4) to eight (8) pages plus unlimited references. Reviewing will be double blind, so the papers should not reveal the authors’ identity. Accepted papers will be published in the workshop proceedings.
Double submission policy: Parallel submission to other meetings or publications is possible, but the workshop organizers must be notified immediately.

In case of questions, please contact Reinhard Rapp:

ORGANISERS

Reinhard Rapp (Athena R.C., Greece; Magdeburg-Stendal University of Applied Sciences and University of Mainz, Germany), Chair and contact person:
Serge Sharoff (University of Leeds, United Kingdom)
Pierre Zweigenbaum (Université Paris-Saclay, CNRS, LISN, Orsay, France)

SCIENTIFIC COMMITTEE

[to be specified]

Last modified: 10 May 2021