Improving the prediction of school dropout with the support of the semi-supervised learning approach

Authors

  • Eduardo Cardoso Melo, Instituto Federal de Minas Gerais (IFMG)
  • Fernanda Sumika Hojo de Souza, Universidade Federal de Ouro Preto (UFOP)

DOI:

https://doi.org/10.5753/isys.2023.2852

Keywords:

School Dropout, Machine Learning, Semi-supervised Learning, Educational Data Mining

Abstract

School dropout is a phenomenon influenced by many variables. This research applied Machine Learning techniques, in particular a semi-supervised learning strategy, to predict the risk of dropout in undergraduate courses at a Brazilian higher education institution. The experiments were conducted in two phases: the first applied Feature Selection techniques, and the second applied a semi-supervised learning strategy to increase the number of student instances labeled as Graduated, which improved the collected performance metrics. The main result is a model that classifies dropout with 90% accuracy and 86% Macro-F1.
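The abstract does not spell out the semi-supervised procedure. One common strategy consistent with its description (growing the set of labeled instances before retraining) is self-training, also called pseudo-labeling. The sketch below is illustrative only: the nearest-centroid base classifier, the distance-margin confidence score, the threshold of 1.0, the label "Dropout", and the helper names `fit_centroids`, `predict_with_confidence`, `self_train`, and `macro_f1` are all hypothetical assumptions, not the authors' method. It also computes Macro-F1, the unweighted mean of per-class F1 scores, which is the second metric the abstract reports.

```python
"""Self-training (pseudo-labeling) sketch with a Macro-F1 helper.

Assumptions, not taken from the paper: a nearest-centroid base
classifier, a distance-margin confidence score, a hypothetical
threshold of 1.0, and the labels "Dropout" / "Graduated".
"""
import math
from collections import defaultdict


def fit_centroids(X, y):
    """Return the mean feature vector of each class."""
    sums = {}
    counts = defaultdict(int)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        sums[label] = [s + v for s, v in zip(acc, x)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_with_confidence(centroids, x):
    """Predict the nearest class; confidence is the margin between the
    distances to the closest and the second-closest centroid."""
    ranked = sorted((math.dist(x, c), label) for label, c in centroids.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else math.inf
    return ranked[0][1], margin


def self_train(X_lab, y_lab, X_unlab, threshold=1.0, max_iter=10):
    """Move confidently pseudo-labeled instances into the labeled set
    and refit, until no unlabeled instance passes the threshold."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_iter):
        centroids = fit_centroids(X_lab, y_lab)
        remaining = []
        added = 0
        for x in pool:
            label, margin = predict_with_confidence(centroids, x)
            if margin >= threshold:
                X_lab.append(x)       # accept the pseudo-label
                y_lab.append(label)
                added += 1
            else:
                remaining.append(x)   # keep for the next round
        pool = remaining
        if added == 0:
            break
    return fit_centroids(X_lab, y_lab), X_lab, y_lab


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data, invented for illustration: two labeled clusters plus unlabeled points.
X_lab = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0]]
y_lab = ["Dropout", "Dropout", "Graduated"]
X_unlab = [[4.8, 5.1], [5.2, 4.9], [0.1, 0.3]]

model, X_all, y_all = self_train(X_lab, y_lab, X_unlab)
print(y_all.count("Graduated"))                       # 3
print(predict_with_confidence(model, [4.9, 5.0])[0])  # Graduated
print(round(macro_f1(["Dropout", "Graduated", "Graduated"],
                     ["Dropout", "Graduated", "Dropout"]), 3))  # 0.667
```

In the toy run, two unlabeled points are pseudo-labeled as Graduated, mirroring the kind of label augmentation the abstract describes. Macro-F1 weights every class equally regardless of its size, so a minority class counts as much as the majority class; this is why it is a useful companion to accuracy. Library versions of both pieces exist in scikit-learn as `SelfTrainingClassifier` and `f1_score(..., average="macro")`.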

Published

2023-07-02

How to Cite

Cardoso Melo, E., & Sumika Hojo de Souza, F. (2023). Improving the prediction of school dropout with the support of the semi-supervised learning approach. ISys - Brazilian Journal of Information Systems, 16(1), 10:1–10:26. https://doi.org/10.5753/isys.2023.2852

Issue

Vol. 16 No. 1 (2023)

Section

Regular articles