Comparability of Corpora in Human and Machine Translation

Ekaterina Lapshinova-Koltunski1 and Santanu Pal2
1Universität des Saarlandes, 2Saarland University


Abstract

In this study, we report a negative result from work on comparable corpora, which forces us to address the problem of comparability in both human and machine translation. We argue that comparability is not always defined in the same way across these fields, and that corpora used in contrastive linguistics or human translation analysis cannot always be applied to statistical machine translation (SMT). We therefore revise the definition of comparability and show that some notions from translatology, i.e. registerial features, should also be considered in machine translation (MT).