14th Workshop on Building and Using Comparable Corpora

DETAILS FOR ATTENDANCE

The workshop will take place online through Zoom. Details for participation are provided on the main conference Web site. This is the Zoom link to connect to the workshop. In the unlikely case of unforeseen problems with the Zoom session, a new link will be provided here.
The workshop proceedings (full PDF; full list of BibTeX entries) are available on the ACL Anthology. The programme below also gives direct links to each paper's ACL Anthology page, PDF and BibTeX entry.
Last modified: 21 Jan 2022

 

 

Programme

All times are in UTC+0. For a time zone converter and time difference calculator, see e.g. https://www.timeanddate.com/worldclock/converter.html.

Here are a few examples of time conversions for the workshop's starting and closing times:

Time zone (example cities): starting time / closing time
UTC-7 (San Francisco): 1:00 (am) / 9:00 (am)
UTC-4 (Baltimore): 4:00 (am) / 12:00 (noon)
UTC+0 (Reykjavik): 8:00 / 16:00
UTC+1 (Dartmouth, Dublin, Leeds): 9:00 / 17:00
UTC+2 (Antwerp, Barcelona, Göttingen, Mainz, Munich, Paris, Prague): 10:00 / 18:00
UTC+3 (Varna): 11:00 / 19:00
UTC+5:30 (Kochi, Mumbai): 13:30 / 21:30
UTC+9 (Fukuoka): 17:00 / 1:00 (am)

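
For attendees in other time zones, the conversions above can be reproduced with a short script. This is only a minimal sketch: the helper name to_local is made up for illustration, the offsets are the fixed UTC offsets listed above (not named time zones, so daylight saving rules are not applied), and the date is a placeholder because the workshop date is not stated in this section.

```python
from datetime import datetime, timedelta, timezone

# Placeholder date (assumption): the workshop date is not given in this section.
PLACEHOLDER_DATE = (2000, 1, 1)

# Fixed UTC offsets in hours, copied from the conversion examples above.
OFFSETS = {
    "UTC-7": -7, "UTC-4": -4, "UTC+0": 0, "UTC+1": 1,
    "UTC+2": 2, "UTC+3": 3, "UTC+5:30": 5.5, "UTC+9": 9,
}


def to_local(hour, minute, offset_hours):
    """Convert a UTC+0 programme time to a zone with a fixed UTC offset."""
    utc_time = datetime(*PLACEHOLDER_DATE, hour, minute, tzinfo=timezone.utc)
    return utc_time.astimezone(timezone(timedelta(hours=offset_hours)))


# Workshop start (8:00) and close (16:00), both given in UTC+0.
for label, offset in OFFSETS.items():
    start = to_local(8, 0, offset)
    close = to_local(16, 0, offset)
    print(f"{label}: {start:%H:%M} to {close:%H:%M}")
```

The same helper can be applied to any slot in the programme, for example to_local(12, 10, 5.5) for the second invited presentation as seen from UTC+5:30.
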
8:00-8:05 Welcome
8:05-9:00 Invited presentation: Machine Translation in Low Resource Setting [PDF] [BIB]
Pushpak Bhattacharyya
9:00-9:25 EM Corpus: a comparable corpus for a less-resourced language pair Manipuri-English [PDF] [BIB]
Rudali Huidrom, Yves Lepage and Khogendra Khomdram
9:25-9:40 Coffee break
9:40-10:05 Mining Bilingual Word Pairs from Comparable Corpus using Apache Spark Framework [PDF] [BIB]
Sanjanasri JP, Vijay Krishna Menon, Soman KP and Krzysztof Wolk
10:05-10:30 Employing Wikipedia as a resource for Named Entity Recognition in Morphologically complex under-resourced languages [PDF] [BIB]
Aravind Krishnan, Stefan Ziehe, Franziska Pannach and Caroline Sporleder
10:30-10:55 Semi-Automated Labeling of Requirement Datasets for Relation Extraction [PDF] [BIB]
Jeremias Bohn, Jannik Fischbach, Martin Schmitt, Hinrich Schütze and Andreas Vogelsang
10:55-11:20 A Dutch Dataset for Cross-lingual Multilabel Toxicity Detection [PDF] [BIB]
Ben Burtenshaw and Mike Kestemont
11:20-12:10 Lunch break
12:10-13:05 Invited presentation: Language modeling and AI
Tomas Mikolov
13:05-13:30 Syntax-aware Transformers for Neural Machine Translation: The Case of Text to Sign Gloss Translation [PDF] [BIB]
Santiago Egea Gómez, Euan McGill and Horacio Saggion
13:30-13:55 Effective Bitext Extraction from Comparable Corpora Using a Combination of Three Different Approaches [PDF] [BIB]
Steinþór Steingrímsson, Pintu Lohar, Hrafn Loftsson and Andy Way
13:55-14:10 Coffee break
14:10-14:35 Majority Voting with Bidirectional Pre-translation For Bitext Retrieval [PDF] [BIB]
Alexander G. Jones and Derry Tanti Wijaya
14:35-15:00 On Pronunciations in Wiktionary: Extraction and Experiments on Multilingual Syllabification and Stress Prediction [PDF] [BIB]
Winston Wu and David Yarowsky
15:00-15:55 Invited presentation: Large-scale Deep Learning for Low-Resource AI
Sujith Ravi
15:55-16:00 Closing