Comparative Analysis of Text Classification Algorithms
Abstract
This study compares the text classification performance of Transformer-based models (BERT/BERTimbau) with that of traditional machine learning algorithms (Decision Tree, XGBoost, Naive Bayes, SVM, MLP) applied to two textual representations: dense embeddings and TF-IDF. The evaluation covered five datasets (three binary and two multiclass) with texts in Portuguese and English. Although the Transformers consistently delivered the best overall performance, TF-IDF proved highly competitive: it outperformed embeddings and, in specific cases, matched or surpassed BERT.
References
Bishop, C. M. (1995). Neural Networks for Pattern Recognition. Oxford University Press.
Braz Junior, O. and Fileto, R. (2021). Investigando coerência em postagens de um fórum de dúvidas em ambiente virtual de aprendizagem com o BERT. In Anais do XXXII Simpósio Brasileiro de Informática na Educação, pages 749–759, Porto Alegre, RS, Brasil. SBC.
Chen, T. and Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794.
Cortes, C. and Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3):273–297.
de Magalhães, L. H., Matos, F. F., and Souza, R. R. (2019). Comparação entre algoritmos de classificação aplicados na predição de notícias de jornais on-line. In XX Encontro Nacional de Pesquisa em Ciência da Informação, Florianópolis. ENANCIB.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. NAACL-HLT.
Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine, 17(3):37.
Forman, G. (2003). An extensive empirical study of feature selection metrics for text classification. Journal of Machine Learning Research, 3(Mar):1289–1305.
Hassan, S. U., Ahamed, J., and Ahmad, K. (2022). Analytics of machine learning-based algorithms for text classification. Sustainable Operations and Computers, 3:238–248.
McCallum, A. and Nigam, K. (1998). A comparison of event models for naive Bayes text classification. In AAAI-98 Workshop on Learning for Text Categorization, volume 752, pages 41–48.
Plath, H. O., Paiva, M. E. O., Pinto, D. L., and Costa, P. D. P. (2022). Detecção de discurso de ódio contra mulheres em textos em português brasileiro: Construção da base MINA-BR e modelo de classificação. Revista Eletrônica de Iniciação Científica em Computação, 20(3).
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1):81–106.
Souto Moreira, L., Machado Lunardi, G., de Oliveira Ribeiro, M., Silva, W., and Paulo Basso, F. (2023). A study of algorithm-based detection of fake news in Brazilian election: Is BERT the best? IEEE Latin America Transactions, 21(8):897–903.
Souza, F., Nogueira, R., and Lotufo, R. (2020). BERTimbau: Pretrained BERT models for Brazilian Portuguese. In Brazilian Conference on Intelligent Systems (BRACIS).
Souza, F. C. d. (2020). BERTimbau: Pretrained BERT models for Brazilian Portuguese. Master’s thesis, Universidade Estadual de Campinas.
Sun, C., Huang, L., and Qiu, X. (2019). Utilizing BERT for aspect-based sentiment analysis via constructing auxiliary sentence. In Burstein, J., Doran, C., and Solorio, T., editors, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 380–385, Minneapolis, Minnesota. Association for Computational Linguistics.
Weiss, S. M., Indurkhya, N., and Zhang, T. (2015). Fundamentals of Predictive Text Mining. Springer International Publishing, London, second edition.
Published
29/09/2025
How to Cite
BORGES, Beatriz Ribeiro; FARIA, Elaine Ribeiro; GABRIEL, Paulo H. R. Comparative Analysis of Text Classification Algorithms. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 22., 2025, Fortaleza/CE. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 189-200. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2025.12276.
