A Score approach to identify the risk of students dropout: an experiment with Information Systems Course

Abstract


Context: Student dropout in Higher Education causes considerable social, economic, and academic loss. Students drop out for different reasons, but the main ones are difficulty in learning the content, the structure proposed by the course, and lack of financial resources. Problem: Besides understanding why students abandon their studies completely, the most important problem is identifying which groups of students are at risk of dropping out. However, studies focus essentially on categorical indicators, i.e., binary results that denote whether or not a student is in the risk group. This type of analysis is important, but it does not capture the variation in the student's performance during their academic life. Solution: Creating a score using a machine learning technique (KNN) can provide an instrument to measure how close the student is to the dropout group. Theory: We used Organizational Knowledge Creation to make available and expand knowledge about dropout and to provide inputs for the creation of a knowledge system. Method: The experimental study is quantitative; it was carried out by running KNN and validating the results with statistical analyses. Results: With the developed equation and a KNN accuracy of 87%, it was possible to develop a dropout risk score with values between 0 and 1,000, where the closer the value is to 0, the greater the probability that the student drops out. Contributions and Impact in the IS area: The main contribution of the paper is a new method to assist in the analysis of higher education dropout in Information Systems and other courses.
Keywords: Dropout from Higher Education, Score, Machine Learning, KNN
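
As a rough illustration of how a KNN-based score on a 0 to 1,000 scale could behave, the following minimal sketch maps the share of non-dropout neighbors to a score. The abstract does not give the paper's equation, so the mapping used here, the function name `knn_risk_score`, the two features, and the toy data are all assumptions made for illustration only.

```python
import math

def knn_risk_score(train, labels, student, k=3):
    """Return a 0-1000 score; lower means closer to the dropout group.

    Assumption: the score is 1000 times the share of the k nearest
    neighbours who did NOT drop out. The paper's own equation is not
    given in the abstract and may differ.
    """
    # Euclidean distance from the student to every training record
    distances = [
        (math.dist(features, student), label)
        for features, label in zip(train, labels)
    ]
    # Keep the k closest records
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    dropouts = sum(1 for _, label in nearest if label == 1)
    return round(1000 * (1 - dropouts / k))

# Hypothetical toy data: (grade average, attendance rate); label 1 = dropped out
train = [(2.0, 0.40), (3.0, 0.50), (2.5, 0.45),
         (8.0, 0.90), (9.0, 0.95), (7.5, 0.85)]
labels = [1, 1, 1, 0, 0, 0]

low = knn_risk_score(train, labels, (2.4, 0.42))   # near the dropout group
high = knn_risk_score(train, labels, (8.5, 0.92))  # near the non-dropout group
```

In this sketch a student surrounded only by dropout records scores 0 and one surrounded only by non-dropout records scores 1,000; in practice the features would likely need scaling before distances are computed.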

Published
29/05/2023
CRUZ, Robinson Crusoé Da; JULIANO, Renato Correa; SOUZA, Francisco Carlos Monteiro; SOUZA, Alinne Cristinne Correa. A Score approach to identify the risk of students dropout: an experiment with Information Systems Course. In: SIMPÓSIO BRASILEIRO DE SISTEMAS DE INFORMAÇÃO (SBSI), 19., 2023, Maceió/AL. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023.