18th Workshop on Building and Using Comparable Corpora

Program: Monday, 20 Jan 2025

9:15–9:30 Opening and introduction
9:30–10:30 Multilingual corpus development
Bilingual resources for Moroccan Sign Language Generation and Standard Arabic Skills Improvement of Deaf Children
Abdelhadi Soudi1, Corinne Vinopol2, Kristof Van Laerhoven3
1École Nationale Supérieure des Mines de Rabat, Morocco, 2Institute for Disabilities Research and Training, USA, 3University of Siegen, Germany
Harmonizing Annotation of Turkic Postverbial Constructions: A Comparative Study of UD Treebanks
Arofat Akhundjanova
Saarland University, Germany
10:30–11:00 Coffee break, morning
11:00–13:00 Multilinguality of Large Language Models
KEYNOTE: Towards Truly Open, Language-Specific, Safe, Factual, and Specialized Large Language Models
Preslav Nakov
Mohamed bin Zayed University of Artificial Intelligence, UAE
Make Satire Boring Again: Reducing Stylistic Bias of Satirical Corpus by Utilizing Generative LLMs
Asli Umay Ozturk, Recep Firat Cekinel, Pinar Karagoz
Middle East Technical University (METU), Turkey
BEIR-NL: Zero-shot Information Retrieval Benchmark for the Dutch Language
Ehsan Lotfi, Nikolay Banar, Walter Daelemans
University of Antwerp, Belgium
13:00–14:00 Lunch
14:00–15:30 Machine Translation and Cross-lingual Processing
Refining Dimensions for Improving Clustering-based Cross-lingual Topic Models
Chia-Hsuan Chang1, Tien Yuan Huang2, Yi-Hang Tsai2, Chia-Ming Chang2, San-Yih Hwang2
1Yale University, USA, 2National Sun Yat-sen University, Taiwan
The Role of Handling Attributive Nouns in Improving Chinese-To-English Machine Translation
Haohao (Lisa) Wang1, Adam Meyers2, John E. Ortega3, Rodolfo Zevallos4
1Carnegie Mellon University, USA, 2New York University, USA, 3Northeastern University, USA, 4Barcelona Supercomputing Center, Spain
Can a Neural Model Guide Fieldwork? A Case Study on Morphological Data Collection
Aso Mahmudi1, Borja Herce2, Demian Inostroza Améstica1, Andreas Scherbakov1, Eduard H Hovy1, Ekaterina Vylomova1
1The University of Melbourne, Australia, 2University of Zurich, Switzerland
15:30–16:00 Coffee break, afternoon
16:00–17:30 Diversity of language resources
KEYNOTE: Comparable Corpora: Opportunities for New Research Directions
Kenneth Ward Church
Northeastern University, USA
SELEXINI – a large and diverse automatically parsed corpus of French
Manon Scholivet1, Agata Savary1, Louis Estève1, Marie Candito2, Carlos Ramisch3
1Université Paris-Saclay, France, 2Université Paris Cité, France, 3Aix Marseille University, France
17:30–17:45 Closing remarks
Last modified: 5 Jan 2025