17th Workshop on Building and Using Comparable Corpora
Program: Monday, 20 May, 2024
All timings are in Italian time (CEST = Central European Summer Time: UTC+2)
09:00–10:30 | Session 1 |
09:00–09:30 | On a Novel Application of Wasserstein-Procrustes for Unsupervised Cross-Lingual Alignment of Embeddings
Guillem Ramírez, Rumen Dangovski, Preslav Nakov and Marin Soljacic |
09:30–10:00 | Modeling Diachronic Change in English Scientific Writing over 300+ Years with Transformer-based Language Model Surprisal
Julius Steuer, Marie-Pauline Krielke, Stefan Fischer, Stefania Degaetano-Ortlieb, Marius Mosbach and Dietrich Klakow |
10:00–10:30 | PORTULAN ExtraGLUE Datasets and Models: Kick-starting a Benchmark for the Neural Processing of Portuguese
Tomás Freitas Osório, Bernardo Leite, Henrique Lopes Cardoso, Luís Gomes, João Rodrigues, Rodrigo Santos and António Branco |
10:30–11:00 | Coffee break |
11:00–12:00 | Invited Talk |
11:00–12:00 | The Way Towards Massively Multilingual Language Models
François Yvon |
12:00–13:00 | Session 2 |
12:00–12:30 | Quality and Quantity of Machine Translation References for Automatic Metrics
Vilém Zouhar and Ondřej Bojar |
12:30–13:00 | Exploring the Necessity of Visual Modality in Multimodal Machine Translation using Authentic Datasets
Zi Long, ZhenHao Tang, Xianghua Fu, Jian Chen, Shilong Hou and Jinze Lyu |
13:00–14:00 | Lunch break |
14:00–16:00 | Session 3 |
14:00–14:30 | Exploring the Potential of Large Language Models in Adaptive Machine Translation for Generic Text and Subtitles
Abdelhadi Soudi, Mohamed Hannani, Kristof Van Laerhoven and Eleftherios Avramidis |
14:30–15:00 | INCLURE: a Dataset and Toolkit for Inclusive French Translation
Paul Lerner and Cyril Grouin |
15:00–15:30 | BnPC: A Gold Standard Corpus for Paraphrase Detection in Bangla, and its Evaluation
Sourav Saha, Zeshan Ahmed Nobin, Mufassir Ahmad Chowdhury, Md. Shakirul Hasan Khan Mobin, Mohammad Ruhul Amin and Sudipta Kar |
15:30–16:00 | Booster presentations by poster authors |
16:00–16:30 | Coffee break |
16:30–18:00 | Poster session |
| Creating Clustered Comparable Corpora from Wikipedia with Different Fuzziness Levels and Language Representativity
Anna Laskina, Eric Gaussier and Gaelle Calvary |
| EuReCo: Not Building and Yet Using Federated Comparable Corpora for Cross-Linguistic Research
Marc Kupietz, Piotr Banski, Nils Diewald, Beata Trawinski and Andreas Witt |
| Building Annotated Parallel Corpora Using the ATIS Dataset: Two UD-style treebanks in English and Turkish
Neslihan Cesur, Aslı Kuzgun, Mehmet Kose and Olcay Taner Yıldız |
| Bootstrapping the Annotation of UD Learner Treebanks
Arianna Masciolini |
| SweDiagnostics: A Diagnostics Natural Language Inference Dataset for Swedish
Felix Morger |
| Multiple Discourse Relations in English TED Talks and Their Translation into Lithuanian, Portuguese and Turkish (no physical poster)
Deniz Zeyrek, Giedrė Valūnaitė Oleškevičienė and Amalia Mendes |
| mini-CIEP+ : A Shareable Parallel Corpus of Prose
Annemarie Verkerk and Luigi Talamo |
Last modified: 21 May 2024