Subject: 1st CFP: LREC 9th Workshop and Shared Task on Building and Using Comparable Corpora

============================================================
Call for Papers

9th WORKSHOP ON BUILDING AND USING COMPARABLE CORPORA

Special Topic: Continuous Vector Space Models and Comparable Corpora
Shared Task: Identifying Parallel Segments in Comparable Corpora

https://comparable.limsi.fr/bucc2016/

Monday, May 23, 2016
Co-located with LREC 2016, Portorož, Slovenia

DEADLINE FOR PAPERS: February 10, 2016
============================================================

MOTIVATION

In the language engineering and the linguistics communities, research on comparable corpora has been motivated by two main reasons. In language engineering, on the one hand, it is chiefly motivated by the need to use comparable corpora as training data for statistical Natural Language Processing applications such as statistical machine translation or cross-lingual retrieval. In linguistics, on the other hand, comparable corpora are of interest in themselves because they make inter-linguistic discoveries and comparisons possible.

It is generally accepted in both communities that comparable corpora are documents in one or several languages that are comparable in content and form to various degrees and along various dimensions. We believe that the linguistic definitions and observations related to comparable corpora can improve methods to mine such corpora for applications of statistical NLP. As such, it is of great interest to bring together builders and users of such corpora.

SHARED TASK

There will be a shared task on "Identifying Parallel Segments in Comparable Corpora". Its details will be described on the workshop website (see URL above).

TOPICS

Beyond this year's special topic "Continuous Vector Space Models and Comparable Corpora" and the shared task on "Identifying Parallel Segments in Comparable Corpora", we solicit contributions including but not limited to the following topics:

Building comparable corpora:
* Human translations
* Automatic and semi-automatic methods
* Methods to mine parallel and non-parallel corpora from the Web
* Tools and criteria to evaluate the comparability of corpora
* Parallel vs non-parallel corpora, monolingual corpora
* Rare and minority languages, across language families
* Multi-media/multi-modal comparable corpora

Applications of comparable corpora:
* Human translations
* Language learning
* Cross-language information retrieval & document categorization
* Bilingual projections
* Machine translation
* Writing assistance

Mining from comparable corpora:
* Cross-language distributional semantics
* Extraction of parallel segments or paraphrases from comparable corpora
* Extraction of translations of single words and multi-word expressions, proper names, named entities, etc.

IMPORTANT DATES

February 10, 2016    Deadline for submission of full papers
March 10, 2016       Notification of acceptance
March 25, 2016       Camera-ready papers due
May 23, 2016         Workshop date

SUBMISSION INFORMATION

Papers should follow the LREC main conference formatting details (to be announced on the conference website http://lrec2016.lrec-conf.org/en/ ) and should be submitted as a PDF file via the START workshop manager at https://www.softconf.com/lrec2016/BUCC2016/

Contributions can be short or long papers. Short paper submissions must describe original and unpublished work without exceeding six (6) pages. Characteristics of short papers include: a small, focused contribution; work in progress; a negative result; an opinion piece; an interesting application nugget. Long paper submissions must describe substantial, original, completed and unpublished work without exceeding ten (10) pages.

Reviewing will be double blind, so the papers should not reveal the authors' identity. Accepted papers will be published in the workshop proceedings.

Double submission policy: Parallel submission to other meetings or publications is possible, but the workshop organizers must be notified immediately.

Please also observe the following two paragraphs, which apply to all LREC workshops as well as to the main conference:

Describing your LRs in the LRE Map is now a normal practice in the submission procedure of LREC (introduced in 2010 and adopted by other conferences). To continue the efforts initiated at LREC 2014 about “Sharing LRs” (data, tools, web-services, etc.), authors will have the possibility, when submitting a paper, to upload LRs in a special LREC repository. This effort of sharing LRs, linked to the LRE Map for their description, may become a new “regular” feature for conferences in our field, thus contributing to creating a common repository where everyone can deposit and share data.

As scientific work requires accurate citations of referenced work so as to allow the community to understand the whole context and also replicate the experiments conducted by other researchers, LREC 2016 endorses the need to uniquely identify LRs through the use of the International Standard Language Resource Number (ISLRN, www.islrn.org), a Persistent Unique Identifier to be assigned to each Language Resource. The assignment of ISLRNs to LRs cited in LREC papers will be offered at submission time.

ORGANISERS

Reinhard Rapp, University of Mainz (Germany)
Pierre Zweigenbaum, LIMSI, CNRS, Orsay (France)
Serge Sharoff, University of Leeds (UK)

FURTHER INFORMATION

Reinhard Rapp: reinhardrapp (at) gmx (dot) de

SCIENTIFIC COMMITTEE

* Ahmet Aker (University of Sheffield, UK)
* Hervé Déjean (Xerox Research Centre Europe, Grenoble, France)
* Éric Gaussier (Université Joseph Fourier, Grenoble, France)
* Gregory Grefenstette (INRIA, Saclay, France)
* Silvia Hansen-Schirra (University of Mainz, Germany)
* Hitoshi Isahara (Toyohashi University of Technology)
* Kyo Kageura (University of Tokyo, Japan)
* Philippe Langlais (Université de Montréal, Canada)
* Michael Mohler (Language Computer Corp., US)
* Emmanuel Morin (Université de Nantes, France)
* Lene Offersgaard (University of Copenhagen, Denmark)
* Dragos Stefan Munteanu (Language Weaver, Inc., US)
* Ted Pedersen (University of Minnesota, Duluth, US)
* Reinhard Rapp (University of Mainz, Germany)
* Serge Sharoff (University of Leeds, UK)
* Michel Simard (National Research Council Canada)
* Pierre Zweigenbaum (LIMSI-CNRS, Orsay, France)