18th Workshop on Building and Using Comparable Corpora
Program: Monday, 20 Jan, 2025
9:15–9:30 Opening and introduction
9:30–10:30 Multilingual corpus development
Bilingual resources for Moroccan Sign Language Generation and Standard Arabic Skills Improvement of Deaf Children
Abdelhadi Soudi¹, Corinne Vinopol², Kristof Van Laerhoven³
¹École Nationale Supérieure des Mines de Rabat, Morocco, ²Institute for Disabilities Research and Training, USA, ³University of Siegen, Germany
Harmonizing Annotation of Turkic Postverbial Constructions: A Comparative Study of UD Treebanks
Arofat Akhundjanova
Saarland University, Germany
10:30–11:00 Coffee break, morning
11:00–13:00 Multilinguality of Large Language Models
KEYNOTE:
Towards Truly Open, Language-Specific, Safe, Factual, and Specialized Large Language Models
Preslav Nakov
Mohamed bin Zayed University of Artificial Intelligence, UAE
Make Satire Boring Again: Reducing Stylistic Bias of Satirical Corpus by Utilizing Generative LLMs
Asli Umay Ozturk, Recep Firat Cekinel, Pinar Karagoz
Middle East Technical University (METU), Turkey
BEIR-NL: Zero-shot Information Retrieval Benchmark for the Dutch Language
Ehsan Lotfi, Nikolay Banar, Walter Daelemans
University of Antwerp, Belgium
13:00–14:00 Lunch
14:00–15:30 Machine Translation and Cross-lingual Processing
Refining Dimensions for Improving Clustering-based Cross-lingual Topic Models
Chia-Hsuan Chang¹, Tien Yuan Huang², Yi-Hang Tsai², Chia-Ming Chang², San-Yih Hwang²
¹Yale University, USA, ²National Sun Yat-sen University, Taiwan
The Role of Handling Attributive Nouns in Improving Chinese-To-English Machine Translation
Haohao (Lisa) Wang¹, Adam Meyers², John E. Ortega³, Rodolfo Zevallos⁴
¹Carnegie Mellon University, USA, ²New York University, USA, ³Northeastern University, USA, ⁴Barcelona Supercomputing Center, Spain
Can a Neural Model Guide Fieldwork? A Case Study on Morphological Data Collection
Aso Mahmudi¹, Borja Herce², Demian Inostroza Améstica¹, Andreas Scherbakov¹, Eduard H Hovy¹, Ekaterina Vylomova¹
¹The University of Melbourne, Australia, ²University of Zurich, Switzerland
15:30–16:00 Coffee break, afternoon
16:00–17:30 Diversity of language resources
KEYNOTE:
Comparable Corpora: Opportunities for New Research Directions
Kenneth Ward Church
Northeastern University, USA
SELEXINI – a large and diverse automatically parsed corpus of French
Manon Scholivet¹, Agata Savary¹, Louis Estève¹, Marie Candito², Carlos Ramisch³
¹Université Paris-Saclay, France, ²Université Paris Cité, France, ³Aix Marseille University, France
17:30–17:45 Closing remarks
Last modified: 5 Jan 2025