Comparison of natural language processing techniques in social bot detection on Twitter during Brazilian presidential elections




Bot detection, Social networks, Twitter, Elections, Natural language processing, Machine learning


Thousands of social bots currently operate across online social networks, and identifying them automatically remains a computational challenge.
This work applies several natural language processing methods to extract features from tweets collected during the 2018 Brazilian presidential election period, with the aim of making bot detection more precise. The solution relies on artificial intelligence techniques, combining feature selection with classification algorithms.
The best results came from combining all extracted features and classifying with Random Forest, which achieved a precision of 0.86 for the bot class and an AUC of 0.86.
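
To make the two reported metrics concrete, the following minimal sketch computes precision for the bot class and the AUC from a set of labels and classifier scores. The data are invented toy values, the function names are hypothetical, and the 0.5 decision threshold is an assumption; this is not the authors' evaluation code.

```python
from typing import Sequence


def precision_for_class(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Fraction of accounts predicted as `positive` that truly are `positive`."""
    predicted_positive = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not predicted_positive:
        return 0.0  # convention chosen here: no positive predictions gives precision 0
    true_positive = sum(1 for t in predicted_positive if t == positive)
    return true_positive / len(predicted_positive)


def roc_auc(y_true: Sequence[int], scores: Sequence[float], positive: int = 1) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = bot, 0 = human; scores are made-up classifier outputs.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

bot_precision = precision_for_class(y_true, y_pred, positive=1)
auc = roc_auc(y_true, scores, positive=1)
print(f"bot precision = {bot_precision:.2f}, AUC = {auc:.4f}")
```

On this toy data, three of the four accounts flagged as bots are truly bots (precision 3/4), and 15 of the 16 bot/human score pairs are ranked correctly (AUC 15/16). Precision depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two are reported side by side.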








How to Cite

Lima Santos, B., Estavaringo Ferreira, G., Torres do Ó, M., Rodrigues Braz, R., & Antonio Digiampietri, L. (2022). Comparison of natural language processing techniques in social bot detection on Twitter during Brazilian presidential elections. ISys - Brazilian Journal of Information Systems, 15(1), 12:1–12:22.



Extended versions of selected articles