15th Workshop on Building and Using Comparable Corpora

Programme

9:00 Opening
Session 1: Invited Presentation
9:10 Bilingual Induction and Pseudo Parallel Corpora
Alexander Fraser
Session 2: Comparative dependency parsing
10:00 Multilingual Comparative Analysis of Deep-Learning Dependency Parsing Results Using Parallel Corpora
Diego Alves, Marko Tadić and Božo Bekavac
10:30 Coffee Break
Session 3: Building corpora and lexicon induction
11:00 Building Domain-specific Corpora from the Web: the Case of European Digital Service Infrastructures
Rik van Noord, Cristian García-Romero, Miquel Esplà-Gomis, Leopoldo Pla Sempere and Antonio Toral
11:30 Challenges of Building Domain-Specific Parallel Corpora from Public Administration Documents
Filip Klubička, Lorena Kasunić, Danijel Blazsetin and Petra Bago
12:00 Setting Up Bilingual Comparable Corpora with Non-Contemporary Languages
Helena Bermudez Sabel, Francesca Dell’Oro, Cyrielle Montrichard and Corinne Rossari
12:30 About Evaluating Bilingual Lexicon Induction
Martin Laville, Emmanuel Morin and Philippe Langlais
13:00 Lunch Break
Session 4: Word embeddings
14:00 Evaluating Monolingual and Crosslingual Embeddings on Datasets of Word Association Norms
Trina Kwong, Emmanuele Chersoni and Rong Xiang
14:30 Don’t Forget Cheap Training Signals Before Building Unsupervised Bilingual Word Embeddings
Silvia Severini, Viktor Hangya, Masoud Jalili Sabet, Alexander Fraser and Hinrich Schütze
Session 5: Shared task on bilingual term alignment
15:00 Overview of the 2022 BUCC Shared Task: Bilingual Alignment in Comparable Specialized Corpora
Omar Adjali, Emmanuel Morin, Serge Sharoff, Reinhard Rapp and Pierre Zweigenbaum
15:30 Fusion of linguistic, neural and sentence-transformer features for improved term alignment
Andraž Repar, Boshko Koloski, Matej Ulčar and Senja Pollak
16:00 Coffee Break
16:30 CUNI Submission to the BUCC 2022 Shared Task on Bilingual Term Alignment
Borek Požár, Klára Tauchmanová, Kristýna Neumannová, Ivana Kvapilíková and Ondřej Bojar
17:00-17:10 Closing