Sentiment Analysis of Content Shared in Brazilian Reddit Communities: Evaluation of a Human-Labeled Dataset

  • Giovana Piorino UFMG
  • Vitor Moreira UFMG
  • Luiz Henrique Quevedo Lima UFMG
  • Adriana Silvina Pagano UFMG
  • Ana Paula Couto da Silva UFMG


The soaring use of social media and its impact on society have raised ethical concerns about the content these platforms disseminate, particularly from the perspective of responsible AI, given the need to mitigate the propagation of bias and the spread of toxic language. Sentiment Analysis of the language used in these communities poses significant challenges, since it requires high-quality datasets for the supervised training of models. The social network Reddit comprises smaller sub-communities centered on specific topics, called Subreddits. By manually annotating posts from Subreddits related to Brazilian content and communities, we developed a dataset for Sentiment Analysis in Brazilian Portuguese. We report the results of our annotation process and characterize the language of the posts. Our dataset is intended to support Sentiment Analysis of social media language in Brazilian Portuguese.

Keywords: Sentiment Analysis, Reddit Communities, Annotation Task, Brazilian Portuguese


PIORINO, Giovana; MOREIRA, Vitor; LIMA, Luiz Henrique Quevedo; PAGANO, Adriana Silvina; SILVA, Ana Paula Couto da. Análise de sentimentos de conteúdo compartilhado em comunidades brasileiras do Reddit: Avaliação de um conjunto de dados rotulados por humanos. In: BRAZILIAN SYMPOSIUM ON MULTIMEDIA AND THE WEB (WEBMEDIA), 30., 2024, Juiz de Fora/MG.

