An Annotated Corpus for Sentiment Analysis in Political News

  • Gabriel Domingos de Arruda USP
  • Norton Trevisan Roman USP
  • Ana Maria Monteiro FACCAMP

Abstract


This article describes a corpus of news texts in Brazilian Portuguese. News articles were collected from four major newswire outlets, segmented into paragraphs, and marked up by a group of four annotators, who classified each paragraph along two dimensions: the target entity (that is, the person who is the main subject of the news reported in the paragraph) and the paragraph's polarity with respect to that entity. The corpus comprises 131 news articles, segmented into 1,447 paragraphs, with 65,675 words in total. Along with the corpus, we have also built a gold standard, in which paragraphs are classified according to the opinion of the majority of annotators. Both the gold standard and the annotated corpus are available to the community under a Creative Commons licence.
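As a rough illustration of how a majority-based gold standard can be derived from several annotators' labels, the following minimal sketch may help. It is not the authors' procedure: the function name `majority_label`, the polarity label strings, and the choice to return `None` when no label has a strict majority are all assumptions made for this example.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_label(labels: Sequence[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of annotators, else None.

    Assumption: "majority" is read as more than half of the annotators.
    The abstract does not say how ties or split votes were handled.
    """
    if not labels:
        return None
    # most_common(1) yields the single most frequent label and its count.
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


# Hypothetical polarity votes from four annotators for one paragraph.
votes = ["negative", "negative", "negative", "neutral"]
print(majority_label(votes))  # negative
```

Under this reading, a paragraph on which the four annotators split two against two gets no gold label. Other tie-breaking policies are equally possible.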

Published
04/11/2015
ARRUDA, Gabriel Domingos de; ROMAN, Norton Trevisan; MONTEIRO, Ana Maria. An Annotated Corpus for Sentiment Analysis in Political News. In: SIMPÓSIO BRASILEIRO DE TECNOLOGIA DA INFORMAÇÃO E DA LINGUAGEM HUMANA (STIL), 1., 2015, Natal/RN. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2015. p. 101-110.