Subject: 1st CFP: ACL 10th Workshop and Shared Task on Building and Using Comparable Corpora

10th Workshop on Building and Using Comparable Corpora
Shared task: detection of parallel sentences in comparable corpora
Co-located with ACL 2017, Vancouver, Canada, 3 August, 2017

Important dates

Workshop submission deadline: 21 April, 2017
Workshop notification: 19 May, 2017
Workshop camera ready: 26 May, 2017

Website: http://comparable.limsi.fr/bucc2017/

Shared task: identifying parallel segments in comparable corpora

We announce a new shared task for 2017. As is well known, a bottleneck in statistical machine translation is the scarcity of parallel resources for many language pairs and domains. Previous research has shown that this bottleneck can be reduced by utilizing parallel portions found within comparable corpora. Such parallel portions are useful for many purposes, including automatic terminology extraction and the training of statistical MT systems. The aim of the shared task is to quantitatively evaluate competing methods for extracting parallel segments, so as to give an overview of the state of the art and to identify the best-performing approaches. All short papers describing the systems will be accepted at the workshop.

Shared task sample set release: 30 January, 2017
Shared task training set release: 13 February, 2017
Shared task test set release: 26 April, 2017
Shared task test submission deadline: 28 April, 2017
Shared task camera ready papers: 26 May, 2017

Motivation

In the language engineering and linguistics communities, research on comparable corpora has had two main motivations. In language engineering, it is chiefly motivated by the need to use comparable corpora as training data for statistical NLP applications such as statistical machine translation or cross-lingual retrieval. In linguistics, on the other hand, comparable corpora are of interest in themselves because they make possible intra-linguistic discoveries and comparisons.
It is generally accepted in both communities that comparable corpora are documents in one or several languages that are comparable in content and form to various degrees and along various dimensions. We believe that the linguistic definitions and observations related to comparable corpora can improve methods for mining such corpora for applications of statistical NLP. As such, it is of great interest to bring together builders and users of such corpora.

TOPICS

We solicit contributions including but not limited to the following topics.

Building Comparable Corpora:
• Human translations
• Automatic and semi-automatic methods
• Methods to mine parallel and non-parallel corpora from the Web
• Tools and criteria to evaluate the comparability of corpora
• Parallel vs non-parallel corpora, monolingual corpora
• Rare and minority languages, across language families
• Multi-media/multi-modal comparable corpora

Applications of comparable corpora:
• Human translations
• Language learning
• Cross-language information retrieval & document categorization
• Bilingual projections
• Machine translation
• Writing assistance
• Machine learning techniques using comparable corpora

Mining from Comparable Corpora:
• Induction of morphological, grammatical, and translation rules from comparable corpora
• Extraction of parallel segments or paraphrases from comparable corpora
• Extraction of bilingual and multilingual translations of single words and multi-word expressions, proper names, and named entities from comparable corpora
• Induction of multilingual word classes from comparable corpora
• Cross-language distributional semantics

Submission Information

See BUCC 2017 website: http://comparable.limsi.fr/bucc2017/

Workshop organisers:
Serge Sharoff (University of Leeds, UK), Chair
Pierre Zweigenbaum (LIMSI-CNRS, Orsay, France), Shared task organiser
Reinhard Rapp (University of Mainz, Germany)