Social bots detection in Brazilian presidential elections using natural language processing

  • Gabriel Estavaringo Ferreira USP
  • Bianca Lima Santos USP
  • Marcelo Torres do Ó USP
  • Rafael Rodrigues Braz USP
  • Luciano Antonio Digiampietri USP


In recent years, the number of users participating in social networks has grown significantly. Social networks have proven quite effective at spreading opinions and influencing people, since messages can be shared with thousands of people in a few minutes. However, this ability has also been exploited to manipulate opinions and spread misinformation and fake news. A common way of doing this is through bots, computer algorithms that mimic human behavior by disseminating topics and news, expressing support for or rejection of public figures, and interacting with other users, which can affect even democratic discussions. For this reason, the present work presents and compares approaches for detecting social bots using data from Twitter users' posts collected during the 2018 Brazilian presidential election period. Using a dataset of Twitter users labeled as bots or humans, this research applies five natural language processing (NLP) techniques to extract features from the content of users' messages on the social network. To analyze the impact of NLP-extracted features on the bot detection task, five different classifiers were tested, together with pre-processing and feature selection techniques. The best results were achieved by combining all the extracted features and using the Random Forest classifier, which reached an accuracy of 0.91 for the bot class and an AUC of 0.83.
Keywords: Bot detection, social networks, Twitter, elections, machine learning


