BUCC, 16th Workshop on Building and Using Comparable Corpora
TOPICS
We solicit contributions on all topics related to comparable (and parallel) corpora, including but not limited to the following:
- Building Comparable Corpora:
  - Automatic and semi-automatic methods
  - Methods to mine parallel and non-parallel corpora from the web
  - Tools and criteria to evaluate the comparability of corpora
  - Parallel vs non-parallel corpora, monolingual corpora
  - Rare and minority languages, across language families
  - Multi-media/multi-modal comparable corpora
- Applications of comparable corpora:
  - Human translation
  - Language learning
  - Cross-language information retrieval & document categorization
  - Bilingual and multilingual projections
  - (Unsupervised) Machine translation
  - Writing assistance
  - Machine learning techniques using comparable corpora
- Mining from Comparable Corpora:
  - Cross-language distributional semantics, word embeddings and pre-trained multilingual transformer models
  - Extraction of parallel segments or paraphrases from comparable corpora
  - Methods to derive parallel from non-parallel corpora (e.g. to provide for low-resource languages in neural machine translation)
  - Extraction of bilingual and multilingual translations of single words, multi-word expressions, proper names, named entities, sentences, and paraphrases from comparable corpora, etc.
  - Induction of morphological, grammatical, and translation rules from comparable corpora
  - Induction of multilingual word classes from comparable corpora
- Comparable Corpora in the Humanities:
  - Comparing linguistic phenomena across languages in contrastive linguistics
  - Analyzing properties of translated language in translation studies
  - Studying language change over time in diachronic linguistics
  - Assigning texts to authors via authors' corpora in forensic linguistics
  - Comparing rhetorical features in discourse analysis
  - Studying cultural differences in sociolinguistics
  - Analyzing language universals in typological research
IMPORTANT DATES
July 31, 2023: Paper submission deadline (extended from July 18)
August 22, 2023: Notification of acceptance
August 25, 2023: Camera-ready final papers
September 7, 2023: Workshop date
For updates follow the present Web page.
PRACTICAL INFORMATION
Workshop registration is via the main conference registration site.
The workshop proceedings will be published in the ACL Anthology.
SUBMISSION GUIDELINES
Please follow the style sheet and templates (for LaTeX, Overleaf and MS-Word) provided for the main conference.
Papers should be submitted as a PDF file using the START conference manager.
Submissions must describe original and unpublished work and range from 4 to 8 pages plus unlimited references.
Reviewing will be double-blind, so the papers should not reveal the authors' identity. Accepted papers will be published in the workshop proceedings, which will be included in the ACL Anthology.
Double submission policy: Parallel submission to other meetings or publications is possible, but the workshop organizers must be notified by e-mail immediately (i.e. as soon as the authors know of it).
For further information and updates see the present Web page.