BUCC, 7th Workshop on Building and Using Comparable Corpora
Building Resources for Machine Translation Research
We solicit contributions including but not limited to the following topics.
Topics related to the special theme:
- Methods and tools for collecting and processing MT data, including crowdsourcing
- Methods and tools for quality control
- Tools for efficient annotation
- Bilingual term and named entity collections
- Multilingual treebanks, wordnets, propbanks, etc.
- Comparable corpora with parallel units annotated
- Comparable corpora for under-resourced languages and specific domains
- Multilingual corpora with rich annotations: POS tags, NEs, dependencies, semantic roles, etc.
- Data for special applications: patent translation, movie subtitles, MOOCs, meetings, chat-rooms, social media, etc.
- Legal issues with collecting and redistributing data and generating derivatives
- Building Comparable Corpora:
  - Human translations
  - Automatic and semi-automatic methods
  - Methods to mine parallel and non-parallel corpora from the Web
  - Tools and criteria to evaluate the comparability of corpora
  - Parallel vs non-parallel corpora, monolingual corpora
  - Rare and minority languages, across language families
  - Multi-media/multi-modal comparable corpora
- Applications of comparable corpora:
  - Human translations
  - Language learning
  - Cross-language information retrieval & document categorization
  - Bilingual projections
  - Machine translation
  - Writing assistance
- Mining from Comparable Corpora:
  - Extraction of parallel segments or paraphrases from comparable corpora
  - Extraction of bilingual and multilingual translations of single words and multi-word expressions; proper names, named entities, etc.
Note that an edited book “Building and Using Comparable Corpora” has just been published by Springer.
Chapter 1, an introduction and state of the art on the topic, is now freely available on Springer’s Web site: “Overviewing Important Aspects of the Last 20 Years of Research in Comparable Corpora”.
| Date | Event |
|---|---|
| 23 February 2014 | Deadline for submission of full papers |
| 10 March 2014 | Notification of acceptance |
| 27 March 2014 | Camera-ready papers due |
| 27 May 2014 | Workshop date |
Papers should follow the LREC main conference formatting details at http://lrec2014.lrec-conf.org/en/submission/authors-kit/ and should be submitted as a PDF file via the START workshop manager at https://www.softconf.com/lrec2014/BUCC2014/.
Contributions can be short or long papers. Short paper submissions must describe original and unpublished work without exceeding six (6) pages. Characteristics of short papers include: a small, focused contribution; work in progress; a negative result; an opinion piece; an interesting application nugget. Long paper submissions must describe substantial, original, completed and unpublished work without exceeding ten (10) pages.
Reviewing will be double blind, so the papers should not reveal the authors’ identity. Accepted papers will be published in the workshop proceedings.
Double submission policy: Parallel submission to other meetings or publications is possible, but the workshop organizers must be notified immediately.
When submitting a paper from the START page, authors will be asked to provide essential information about resources (in a broad sense, i.e. also technologies, standards, evaluation kits, etc.) that have been used for the work described in the paper or that are a new result of their research. Moreover, ELRA encourages all LREC authors to share the described LRs (data, tools, services, etc.) to enable their reuse and the replicability of experiments, including evaluation experiments.
For further information, please contact Pierre Zweigenbaum at pz(erase_at)limsi(erase_dot)fr.
Authors of selected papers will be encouraged to submit substantially extended versions of their manuscripts to an upcoming special issue on “Machine Translation Using Comparable Corpora” of the Journal of Natural Language Engineering.