Predicting Student Dropout Rates at Higher Degree Using Machine Learning and Dimensionality Reduction
Abstract
This study proposes four indices for predicting student dropout in a computing course at a higher education institution, using machine learning (ML) and dimensionality reduction. Ten classification algorithms were applied to three datasets, including versions with and without the proposed indices. The best performance was achieved by Quadratic Discriminant Analysis (QDA). SHAP (SHapley Additive exPlanations) analysis highlighted persistence and number of enrollments as the most relevant predictors.
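The abstract does not specify the dimensionality-reduction technique, the features, or the validation protocol. The sketch below illustrates the general kind of pipeline it summarizes, under stated assumptions. Features are standardized, optionally projected with PCA, and classified with QDA, and accuracy is measured with stratified k-fold cross-validation. PCA is an assumed stand-in for the unspecified dimensionality-reduction step. The data are synthetic, the function names (`pca_fit`, `qda_fit`, `qda_predict`, `stratified_folds`, `cv_accuracy`) are hypothetical, and the four proposed indices are not modeled. QDA fits one Gaussian with its own full covariance per class and assigns each sample to the class with the highest log-posterior, which lets it exploit class-dependent variance. In practice, scikit-learn's `QuadraticDiscriminantAnalysis` and `PCA` would be the usual choice. This NumPy version only makes the decision rule explicit.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return column means and the top principal axes (as columns) of X, via SVD."""
    mean = X.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T


def qda_fit(X, y, reg=1e-3):
    """Fit one Gaussian (mean, full covariance) and a log prior per class."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        # Small ridge term keeps each class covariance invertible.
        cov = np.cov(Xc, rowvar=False) + reg * np.eye(X.shape[1])
        _, logdet = np.linalg.slogdet(cov)
        params[c] = (Xc.mean(axis=0), np.linalg.inv(cov), logdet, np.log(len(Xc) / len(X)))
    return params


def qda_predict(params, X):
    """Assign each row to the class with the largest quadratic discriminant."""
    classes = list(params)
    scores = []
    for c in classes:
        mean, inv_cov, logdet, log_prior = params[c]
        diff = X - mean
        # Squared Mahalanobis distance of every row to this class mean.
        maha = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
        scores.append(-0.5 * logdet - 0.5 * maha + log_prior)
    return np.array(classes)[np.argmax(np.stack(scores, axis=1), axis=1)]


def stratified_folds(y, k, rng):
    """Split row indices into k folds that preserve the class proportions of y."""
    folds = [[] for _ in range(k)]
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        for i, chunk in enumerate(np.array_split(idx, k)):
            folds[i].extend(chunk.tolist())
    return [np.array(f) for f in folds]


def cv_accuracy(X, y, n_components=None, k=10, seed=0):
    """Mean k-fold accuracy of: standardize -> optional PCA (assumed) -> QDA."""
    rng = np.random.default_rng(seed)
    accs = []
    for test_idx in stratified_folds(y, k, rng):
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        Xtr, Xte = X[train], X[test_idx]
        # Fit scaling (and PCA) on the training fold only, to avoid leakage.
        mu, sd = Xtr.mean(axis=0), Xtr.std(axis=0) + 1e-12
        Xtr, Xte = (Xtr - mu) / sd, (Xte - mu) / sd
        if n_components is not None:
            mean, W = pca_fit(Xtr, n_components)
            Xtr, Xte = (Xtr - mean) @ W, (Xte - mean) @ W
        model = qda_fit(Xtr, y[train])
        accs.append(np.mean(qda_predict(model, Xte) == y[test_idx]))
    return float(np.mean(accs))


# Hypothetical toy data: 8 features, roughly 30% positive ("dropout") labels.
rng = np.random.default_rng(42)
y = (rng.random(400) < 0.3).astype(int)
X = rng.normal(size=(400, 8))
X[:, 0] += 1.5 * y  # class-dependent mean
X[:, 1] *= 1 + y    # class-dependent variance, which QDA can exploit
print("QDA, all features:", cv_accuracy(X, y))
print("QDA, 3 PCA components:", cv_accuracy(X, y, n_components=3))
```

Because PCA is unsupervised, on toy data like this it may discard the informative directions. For that reason, the number of retained components would normally be tuned inside the cross-validation loop rather than fixed in advance.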
Published: 2025-09-29
How to Cite
BEZERRA, Wanessa S.; VALE, Karliane M. O.; GORGÔNIO, Flavius L.; GUERRA, Fabrício V. A.; GORGÔNIO, Arthur C.; CANUTO, Anne M. P. Predicting Student Dropout Rates at Higher Degree Using Machine Learning and Dimensionality Reduction. In: NATIONAL MEETING ON ARTIFICIAL AND COMPUTATIONAL INTELLIGENCE (ENIAC), 22., 2025, Fortaleza/CE. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 1854-1865. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2025.14119.
