Analyzing College Student Dropout Risk Prediction in Real Data Using Walk-Forward Validation

  • Rodolfo Sanches Santos USP
  • Moacir Antonelli Ponti USP
  • Kamila Rios Rodrigues USP


College dropout is a concern for educational institutions because it directly affects educational management and academic results and is closely tied to social problems. There is therefore a strong incentive for studies that use data to support decisions by predicting dropout risk, so that institutions can attempt to prevent such cases. Although machine learning techniques have shown potential for this task, using real data involves many steps: the data come from scattered systems and present issues such as the need for cleaning and preparation, high dimensionality that requires adequate feature selection, and class imbalance. In this paper, we used data from 32,892 students enrolled between 2008 and 2020 in all courses offered by a public higher-education institution. A protocol for data preparation is proposed and found to be more important than designing complex classifiers. We present guidelines for modelling a college dropout classification task using data from a public university, along with experiments using Walk-Forward Validation that showed predictive capacity for the first years.
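As an illustration of the evaluation idea named above, the following is a minimal sketch of walk-forward splitting over enrollment cohorts. It assumes an expanding training window (every cohort earlier than the test year), which the abstract does not specify; the function name `walk_forward_splits`, the parameter `min_train_years`, and the toy enrollment years are hypothetical and are not taken from the paper.

```python
from typing import Iterator, List, Sequence, Tuple


def walk_forward_splits(
    years: Sequence[int], min_train_years: int = 1
) -> Iterator[Tuple[List[int], List[int], int]]:
    """Yield (train_idx, test_idx, test_year) with an expanding training window.

    Each fold trains on every cohort strictly earlier than the test year,
    so no information from later cohorts leaks into training.
    """
    unique_years = sorted(set(years))
    for k in range(min_train_years, len(unique_years)):
        test_year = unique_years[k]
        train_idx = [i for i, y in enumerate(years) if y < test_year]
        test_idx = [i for i, y in enumerate(years) if y == test_year]
        yield train_idx, test_idx, test_year


# Hypothetical toy data: one enrollment year per student (not the paper's data).
enrollment_year = [2008, 2008, 2009, 2009, 2010, 2010, 2011]
folds = list(walk_forward_splits(enrollment_year, min_train_years=2))
for train_idx, test_idx, test_year in folds:
    print(test_year, train_idx, test_idx)
```

Unlike a shuffled k-fold split, each fold here is evaluated only on a cohort that comes after all of its training cohorts, which mirrors how a dropout model would be used on newly enrolled students.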
SANTOS, Rodolfo Sanches; PONTI, Moacir Antonelli; RODRIGUES, Kamila Rios. Analyzing College Student Dropout Risk Prediction in Real Data Using Walk-Forward Validation. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 12., 2023, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 291-305. ISSN 2643-6264.