Analysis of the main factors influencing dropout in higher education using educational data mining techniques

  • Ronaldo Celso Messias Correia UNESP
  • Harrison Buziquia de Mendonça UNESP
  • Camila Tolin Santos Da Silva UNESP
  • Douglas Francisquini Toledo UNESP

Abstract


Dropout remains a persistent problem in Brazilian higher education despite rising enrollment. This study proposes a methodology for detecting students at risk of dropping out at UNIVESP using educational data mining and analysis. The approach combines data pre-processing, feature selection, and machine learning to identify dropout patterns. By anticipating these cases, institutions can apply preventive strategies and personalized support that promote student success. The methodology also identifies the main factors behind dropout, offering insights for educational policy. Preliminary results show 92% accuracy in identifying at-risk students while using fewer than 20% of the available data features, indicating the approach's effectiveness.
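The abstract names three stages (pre-processing, feature selection, and machine-learning classification) without detailing them. The sketch below is a minimal, stdlib-only Python illustration under stated assumptions, not the authors' implementation: it assumes z-score standardization as the pre-processing step, ranks features by absolute Pearson correlation with the dropout label and keeps the top 20%, and trains a logistic-regression classifier by gradient descent. The actual selection method and classifier used on the UNIVESP data may differ. All function names and the synthetic data generator (`make_synthetic`, in which only columns 0 and 1 influence the label) are hypothetical, and the accuracy it prints says nothing about the reported results.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / math.sqrt(vx * vy)


def standardize(train, test):
    """Pre-processing (assumed): z-score each column using training statistics only."""
    stats = []
    for col in zip(*train):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        stats.append((mean, sd))

    def scale(rows):
        return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]

    return scale(train), scale(test)


def select_features(rows, labels, fraction=0.2):
    """Feature selection (assumed): keep the top `fraction` of columns by |Pearson r|."""
    n_cols = len(rows[0])
    k = max(1, int(n_cols * fraction))
    scores = [abs(pearson([r[j] for r in rows], labels)) for j in range(n_cols)]
    return sorted(range(n_cols), key=lambda j: scores[j], reverse=True)[:k]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(rows, labels, lr=0.5, epochs=300):
    """Classifier (assumed): logistic regression fit by batch gradient descent on log-loss."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def make_synthetic(n=400, n_features=20, seed=0):
    """Hypothetical data: only columns 0 and 1 influence the binary 'dropout' label."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(n_features)]
        rows.append(x)
        labels.append(1 if 2 * x[0] - 1.5 * x[1] + rng.gauss(0, 0.5) > 0 else 0)
    return rows, labels


def run_pipeline(rows, labels, fraction=0.2, train_size=300):
    """Pre-process, select features, train, and return (kept columns, test accuracy)."""
    X_train, X_test = standardize(rows[:train_size], rows[train_size:])
    y_train, y_test = labels[:train_size], labels[train_size:]
    keep = select_features(X_train, y_train, fraction)

    def pick(data):
        return [[r[j] for j in keep] for r in data]

    w, b = train_logistic(pick(X_train), y_train)
    preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0 for x in pick(X_test)]
    accuracy = sum(p == y for p, y in zip(preds, y_test)) / len(y_test)
    return keep, accuracy


if __name__ == "__main__":
    data, target = make_synthetic()
    kept, acc = run_pipeline(data, target)
    print("kept columns:", kept, "test accuracy:", round(acc, 3))
```

Running the script prints the retained column indices and the held-out accuracy; on this synthetic data the two informative columns should be among the four kept. Real student records would also need handling of missing values and categorical attributes, which this sketch omits.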

Published: 2024-07-21

CORREIA, Ronaldo Celso Messias; MENDONÇA, Harrison Buziquia de; SILVA, Camila Tolin Santos Da; TOLEDO, Douglas Francisquini. Analysis of the main factors influencing dropout in higher education using educational data mining techniques. In: WORKSHOP ON COMPUTING EDUCATION (WEI), 32., 2024, Brasília/DF. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 830-841. ISSN 2595-6175. DOI: https://doi.org/10.5753/wei.2024.3105.