Predictive Modeling for Student Retention: Evaluation of Machine Learning Algorithms with Temporal Validation
Abstract
This study develops and evaluates predictive models to identify students at risk of dropping out at a campus of the Instituto Federal da Paraíba (IFPB), using administrative data from 2017 to 2023 drawn from the Plataforma Nilo Peçanha. Several supervised machine learning algorithms were applied, including interpretable models such as Logistic Regression, Decision Tree, and K-Nearest Neighbors (KNN), as well as more complex models such as Support Vector Machine (SVM), Random Forest, and XGBoost. The models were evaluated using stratified cross-validation and a prospective test on previously unseen data from 2023. The Random Forest model achieved the best overall performance, standing out in AUC-ROC and Recall and offering a suitable balance between generalization and sensitivity. The results demonstrate the feasibility of integrating predictive models into institutional decision-support systems, strengthening student retention strategies. As future work, the author proposes deploying the model in a monitoring system and incorporating concept drift detection techniques to keep the model reliable in dynamic educational environments.
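The evaluation protocol described in the abstract (stratified cross-validation on earlier cohorts, then a prospective test on the 2023 cohort, scored with Recall and AUC-ROC) can be illustrated with a minimal standard-library sketch. Everything concrete here is an assumption made for illustration and does not come from the paper: the toy records, the field names `year`, `score`, and `dropout`, the 0.5 decision threshold, and the choice of three folds. No classifier is trained; the `score` values stand in for the risk scores a fitted model such as the Random Forest would produce.

```python
import random
from collections import defaultdict

THRESHOLD = 0.5  # hypothetical decision threshold, not taken from the paper


def temporal_split(records, test_year):
    """Train on cohorts before test_year; hold out test_year as the prospective test."""
    train = [r for r in records if r["year"] < test_year]
    test = [r for r in records if r["year"] == test_year]
    return train, test


def stratified_folds(labels, k, seed=0):
    """Return k lists of indices, dealing each class round-robin to keep class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    return folds


def recall(y_true, y_pred):
    """Share of actual dropouts (label 1) that were flagged as at risk."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def auc_roc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): six students per cohort, 2017 through 2023.
records = [
    {"year": year, "score": score, "dropout": label}
    for year in range(2017, 2024)
    for score, label in [(0.9, 1), (0.7, 1), (0.4, 0), (0.2, 0), (0.6, 0), (0.3, 1)]
]

train, test = temporal_split(records, test_year=2023)

# Stratified folds over the pre-2023 cohorts; each fold is scored as a validation set.
folds = stratified_folds([r["dropout"] for r in train], k=3)
cv_auc = [
    auc_roc([train[i]["dropout"] for i in fold], [train[i]["score"] for i in fold])
    for fold in folds
]

# Prospective test: the 2023 cohort is scored once, after model selection.
y_test = [r["dropout"] for r in test]
test_scores = [r["score"] for r in test]
test_recall = recall(y_test, [1 if s >= THRESHOLD else 0 for s in test_scores])
test_auc = auc_roc(y_test, test_scores)

print("CV AUC-ROC per fold:", cv_auc)
print("2023 Recall:", test_recall, "2023 AUC-ROC:", test_auc)
```

In a real pipeline, the classifier would be refit on the remaining folds before each validation fold is scored. The 2023 cohort would be touched only once, so that the prospective result is not influenced by model selection.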
Published
24/11/2025
How to Cite
CABRAL, José Thiago Holanda de Alcântara. Predictive Modeling for Student Retention: Evaluation of Machine Learning Algorithms with Temporal Validation. In: SIMPÓSIO BRASILEIRO DE INFORMÁTICA NA EDUCAÇÃO (SBIE), 36., 2025, Curitiba/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 99-112. DOI: https://doi.org/10.5753/sbie.2025.11981.
