Twitter as a Comparable Corpus to build Multilingual Affective Lexicons

Amel Fraisse and Patrick Paroubek
LIMSI-CNRS


Abstract

The main issue for any lexicon-based sentiment analysis system is the lack of affective lexicons. Such lexicons contain lists of words annotated with their affective classes. A number of such resources exist, but only for a few languages, and their affective classes are generally reduced to two (positive and negative). In this paper, we propose to use Twitter as a comparable corpus to generate fine-grained, multilingual affective lexicons. Our approach is based on the co-occurrence between English and target-language affective words in the same emotional corpus, and it can be applied to any target language. We apply it to generate affective lexicons for seven languages (en, fr, de, it, es, pt, ru).
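
To make the co-occurrence idea concrete, the following is a minimal sketch rather than the procedure used in the paper. It assumes that tweets are already tokenized, that a small English seed lexicon maps words to emotion classes, and that a target-language word is assigned to the class of the English seed words it most often shares a tweet with. The function name `build_affective_lexicon`, the `min_count` threshold, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def build_affective_lexicon(tweets, english_seeds, min_count=2):
    """Assign each non-seed word the emotion class it co-occurs with most.

    tweets: iterable of token lists (assumed already tokenized).
    english_seeds: dict mapping English affective words to emotion classes.
    min_count: hypothetical threshold on the winning co-occurrence count.
    """
    cooc = defaultdict(Counter)  # word -> Counter of emotion classes
    for tokens in tweets:
        tokens = [t.lower() for t in tokens]
        # Emotion classes signalled by English seed words in this tweet.
        classes = {english_seeds[t] for t in tokens if t in english_seeds}
        for word in set(tokens):
            if word in english_seeds:
                continue  # seed words are the anchors, not candidates
            for cls in classes:
                cooc[word][cls] += 1

    lexicon = {}
    for word, counts in cooc.items():
        cls, n = counts.most_common(1)[0]
        if n >= min_count:  # drop words with too little evidence
            lexicon[word] = cls
    return lexicon


# Toy data (invented for illustration): French tweets containing English seeds.
seeds = {"joy": "joy", "happy": "joy", "sad": "sadness"}
tweets = [
    ["quelle", "belle", "journée", "joy"],
    ["je", "suis", "heureux", "happy"],
    ["trop", "heureux", "joy"],
    ["je", "suis", "triste", "sad"],
    ["tellement", "triste", "sad"],
]
lexicon = build_affective_lexicon(tweets, seeds)
print(lexicon)
```

In this toy run, only "heureux" and "triste" reach the threshold. Function words such as "je" and "suis" appear with both classes but never often enough with either one. A real system would likely need language identification and a stronger association measure than raw counts.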