18th Workshop on Building and Using Comparable Corpora (BUCC)
TOPICS
We solicit contributions on all topics related to comparable (and parallel) corpora, including but not limited to the following:
- Building Comparable Corpora:
- Automatic and semi-automatic methods
- Methods to mine parallel and non-parallel corpora from the web
- Tools and criteria to evaluate the comparability of corpora
- Parallel vs non-parallel corpora, monolingual corpora
- Rare and minority languages, across language families
- Multi-media/multi-modal comparable corpora
- Applications of comparable corpora:
- Human translation
- Language learning
- Cross-language information retrieval & document categorization
- Bilingual and multilingual projections
- (Unsupervised) Machine translation
- Writing assistance
- Machine learning techniques using comparable corpora
- Mining from Comparable Corpora:
- Cross-language distributional semantics, word embeddings and pre-trained multilingual transformer models
- Extraction of parallel segments or paraphrases from comparable corpora
- Methods to derive parallel from non-parallel corpora (e.g. to provide for low-resource languages in neural machine translation)
- Extraction of bilingual and multilingual translations of single words, multi-word expressions, proper names, named entities, sentences, and paraphrases from comparable corpora, etc.
- Induction of morphological, grammatical, and translation rules from comparable corpora
- Induction of multilingual word classes from comparable corpora
- Comparable Corpora in the Humanities:
- Comparing linguistic phenomena across languages in contrastive linguistics
- Analyzing properties of translated language in translation studies
- Studying language change over time in diachronic linguistics
- Assigning texts to authors via authors’ corpora in forensic linguistics
- Comparing rhetorical features in discourse analysis
- Studying cultural differences in sociolinguistics
- Analyzing language universals in typological research
This year we will run a shared task on detecting translations of terms in comparable corpora for several language pairs. Training and test data, as well as the protocols for evaluating submissions, have already been prepared. The topic is timely, as evidenced by the best student paper award at LREC-COLING-2024.
IMPORTANT DATES
Deadlines are “anywhere on Earth.”
- 30 Nov 2024: Paper submission deadline
- 8 Dec 2024: Notification of acceptance
- 12 Dec 2024: Camera-ready final papers
- 20 Jan 2025: Workshop date
For updates, please follow the present Web page.
PRACTICAL INFORMATION
The workshop is an in-person event. Workshop registration is via the main conference registration site.
The workshop proceedings will be published in the ACL Anthology.
SUBMISSION GUIDELINES
Please follow the style sheet and templates (for LaTeX, Overleaf, and MS-Word) provided for the main conference.
Papers should be submitted as a PDF file using the START conference manager.
Submissions must describe original, unpublished work and be 4 to 8 pages long, plus unlimited references.
Reviewing will be double blind, so the papers should not reveal the authors’ identity. Accepted papers will be published in the workshop proceedings, which will be included in the ACL Anthology.
Double submission policy: Parallel submission to other meetings or publications is possible, but the workshop organizers must be notified by e-mail immediately (i.e. as soon as the authors know of it).
For further information and updates see the present Web page.
PDF CFP: bucc2025-cfp.pdf
Last modified: 13 Dec 2025