10th Workshop on Building and Using Comparable Corpora

Workshop Program

Thursday, August 3, 2017, Cypress 1

9:00-9:05 Opening
9:05-10:00 Invited presentation
 Users and Data: The Two Neglected Children of Bilingual Natural Language Processing Research
Philippe Langlais
10:00-10:30 Session 1: Plagiarism detection
 Deep Investigation of Cross-Language Plagiarism Detection Methods
Jérémy Ferrero, Laurent Besacier, Didier Schwab and Frédéric Agnès
10:30-11:00 Coffee break
11:00-12:00 Session 2: Sentence alignment and lexicon acquisition
 Sentence Alignment using Unfolding Recursive Autoencoders
Jeenu Grover and Pabitra Mitra
 Acquisition of Translation Lexicons for Historically Unwritten Languages via Bridging Loanwords
Michael Bloodgood and Benjamin Strauss
12:00-14:00 Lunch
14:00-15:30 Session 3: Building comparable corpora
 Toward a Comparable Corpus of Latvian, Russian and English Tweets
Dmitrijs Milajevs
 Automatic Extraction of Parallel Speech Corpora from Dubbed Movies
Alp Öktem, Mireia Farrús and Leo Wanner
 A parallel collection of clinical trials in Portuguese and English
Mariana Neves
15:30-16:00 Coffee break
16:00-17:40 Session 4: Shared task session
 Overview of the Second BUCC Shared Task: Spotting Parallel Sentences in Comparable Corpora
Pierre Zweigenbaum, Serge Sharoff and Reinhard Rapp
 Weighted Set-Theoretic Alignment of Comparable Sentences
Andoni Azpeitia, Thierry Etchegoyhen and Eva Martínez Garcia
 BUCC 2017 Shared Task: a First Attempt Toward a Deep Learning Framework for Identifying Parallel Sentences in Comparable Corpora
Francis Grégoire and Philippe Langlais
 zNLP: Identifying Parallel Sentences in Chinese-English Comparable Corpora
Zheng Zhang and Pierre Zweigenbaum
 BUCC2017: A Hybrid Approach for Identifying Parallel Sentences in Comparable Corpora
Sainik Mahata, Dipankar Das and Sivaji Bandyopadhyay
17:40-17:50 Closing