Student dropout prediction: a comparative analysis of different representations of training in the learning of generic models

Abstract


This work evaluates different ways of representing dropout behavior when building generic models that predict the dropout risk of face-to-face undergraduate students across different semesters and courses. After careful pre-processing, distinct representations of the training data were created, and machine learning models were built on each to determine which representation contributes most to predictive performance. The results show that representing each student's behavior in every semester attended, in a cumulative and progressive way, benefited the learning of the predictive model, yielding an accuracy of 80.1%.
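
The abstract does not specify the features or the exact construction used, so the following is only a minimal sketch of what a cumulative, progressive representation could look like: one training example per semester attended, with counts accumulated from the first semester onward. The record layout, the field names (`cum_passed`, `cum_failed`, `cum_pass_rate`), and the sample values are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical per-semester records:
# (student_id, semester, credits_passed, credits_failed).
RECORDS = [
    (1, 1, 20, 0),
    (1, 2, 16, 4),
    (1, 3, 12, 8),
    (2, 1, 8, 12),
    (2, 2, 4, 16),
]
# Hypothetical final outcome per student: 1 = dropped out, 0 = did not.
OUTCOME = {1: 0, 2: 1}


def cumulative_examples(records, outcome):
    """Build one training example per (student, semester attended),
    with counts accumulated from the first semester up to that one."""
    running = defaultdict(lambda: [0, 0])  # student_id -> [passed, failed] so far
    examples = []
    # Sorting the tuples orders them by student_id, then by semester.
    for student_id, semester, passed, failed in sorted(records):
        totals = running[student_id]
        totals[0] += passed
        totals[1] += failed
        attempted = totals[0] + totals[1]
        examples.append({
            "student_id": student_id,
            "semester": semester,
            "cum_passed": totals[0],
            "cum_failed": totals[1],
            "cum_pass_rate": totals[0] / attempted if attempted else 0.0,
            "dropped_out": outcome[student_id],  # label repeated on every row
        })
    return examples


examples = cumulative_examples(RECORDS, OUTCOME)
for row in examples:
    print(row)
```

Under this reading, a student who attended three semesters contributes three examples rather than one, so a single model sees behavior at several points in a student's trajectory. Whether this matches the representation evaluated in the paper cannot be confirmed from the abstract alone.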

Keywords: student dropout, educational data mining, predictive models

Published
2021-11-22
COLPO, Miriam Pizzatto; PRIMO, Tiago Thompsen; AGUIAR, Marilton Sanchotene de. Student dropout prediction: a comparative analysis of different representations of training in the learning of generic models. In: BRAZILIAN SYMPOSIUM ON COMPUTERS IN EDUCATION (SBIE), 32., 2021, Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 873-884. DOI: https://doi.org/10.5753/sbie.2021.218517.