Comparison of Machine Learning Algorithms for Classifying News about Politec in Mato Grosso
Abstract
This work applies five machine learning algorithms to classify news headlines about Politec in Mato Grosso and evaluates their performance. For each algorithm, we compare two feature extraction methods (BoW and TF-IDF) and three class balancing methods (Random Oversampling, SMOTE, and SMOTE + Tomek Links). The results show that the class balancing methods are effective. Among the five algorithms, Multinomial Naive Bayes stands out: it achieved the best headline classification accuracy on a set of news items that the models had not seen before.
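To make one of the compared configurations concrete, the following is a minimal sketch of a BoW + Random Oversampling + Multinomial Naive Bayes pipeline, written in plain Python. It is not the authors' implementation: the headlines, the "positive"/"negative" labels, and all function and class names are hypothetical and chosen only for illustration, Laplace smoothing (alpha = 1) is an assumption, and TF-IDF, SMOTE, and Tomek Links are left out for brevity.

```python
import math
import random
from collections import Counter, defaultdict


def bag_of_words(text):
    """Minimal BoW: lowercase, split on whitespace, count terms."""
    return Counter(text.lower().split())


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until
    every class has as many samples as the largest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


class MultinomialNB:
    """Multinomial Naive Bayes over term counts (hypothetical helper,
    assuming Laplace smoothing with parameter alpha)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.term_counts[label].update(bag_of_words(text))
        self.vocab = {t for c in self.term_counts.values() for t in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for term, freq in bag_of_words(text).items():
                if term in self.vocab:  # ignore terms unseen in training
                    score += freq * math.log((counts[term] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, imbalanced toy data (3 vs. 1), invented for this sketch.
train_texts = [
    "politec opens new forensic lab",
    "politec report helps solve case",
    "politec experts praised for fast work",
    "politec delays autopsy report",
]
train_labels = ["positive", "positive", "positive", "negative"]

# Balance only the training data, then fit the classifier.
balanced_texts, balanced_labels = random_oversample(train_texts, train_labels)
model = MultinomialNB().fit(balanced_texts, balanced_labels)

print(Counter(balanced_labels))
print(model.predict("politec delays report"))
```

In this sketch the balancing step is applied to the training data only, so that headlines held out for evaluation stay unseen by the model, in line with the evaluation on previously unseen news described above.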