Analysis of Machine Learning Models for Predicting Student Performance with a Focus on Detecting Algorithmic Bias

Matias Oliveira; Luciano de Souza Cabral; Rafael Ferreira Mello

doi:10.5753/sbie.2024.241546

Matias Oliveira Instituto Federal de Pernambuco https://orcid.org/0009-0008-3044-1807
Luciano de Souza Cabral Instituto Federal de Pernambuco / Universidade Federal de Alagoas https://orcid.org/0000-0002-4235-5753
Rafael Ferreira Mello Universidade Federal de Alagoas / C.E.S.A.R. Innovation Center / Universidade Federal Rural de Pernambuco https://orcid.org/0000-0003-3548-9670

Abstract

In today's educational landscape, the abundance of available data has become essential. Research shows that factors such as academic history, behavior, and socioeconomic background are directly linked to students' future success. By analyzing these data, educational institutions can make better use of their resources, helping to prevent dropout and to allocate those resources efficiently. Although machine learning (ML) algorithms have proven effective in this context, they raise the challenge of algorithmic bias, which can marginalize underrepresented groups. This study compares algorithms using algorithmic fairness frameworks to quantify and mitigate this bias. The results indicate that the K-Nearest Neighbors algorithm stands out for its ability to predict student performance fairly, combining high overall accuracy with low bias.

Keywords: Machine Learning, Student Performance Prediction, Algorithmic Fairness, Educational Data Analysis, Algorithmic Bias
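The abstract says the algorithms were compared using algorithmic fairness frameworks to quantify bias, but it does not name the metrics involved. The Python below is a minimal sketch of what quantifying bias can look like. It assumes binary labels where 1 is the favourable outcome and a split into one privileged group and one unprivileged group. For a classifier's predictions, it computes overall accuracy alongside three common group-fairness measures: the accuracy gap between groups, statistical parity difference, and equal opportunity difference. The helper names `group_fairness_report` and `_group_rates` and the toy data are hypothetical illustrations. They are not the authors' code or results.

```python
from typing import Dict, List, Sequence, Tuple


def _group_rates(
    y_true: Sequence[int], y_pred: Sequence[int], idx: List[int]
) -> Tuple[float, float, float]:
    """Accuracy, positive-prediction rate and true-positive rate on a subset."""
    if not idx:
        raise ValueError("each group must contain at least one sample")
    n = len(idx)
    accuracy = sum(y_true[i] == y_pred[i] for i in idx) / n
    positive_rate = sum(y_pred[i] == 1 for i in idx) / n
    actual_pos = [i for i in idx if y_true[i] == 1]
    # TPR is undefined when the group has no actual positives.
    tpr = (
        sum(y_pred[i] == 1 for i in actual_pos) / len(actual_pos)
        if actual_pos
        else float("nan")
    )
    return accuracy, positive_rate, tpr


def group_fairness_report(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    group: Sequence[str],
    privileged: str,
) -> Dict[str, float]:
    """Hypothetical helper: compare a binary classifier across two groups.

    Assumes label 1 is the favourable outcome (e.g. "student passes") and that
    every sample outside `privileged` belongs to a single unprivileged group.
    """
    if not (len(y_true) == len(y_pred) == len(group)):
        raise ValueError("y_true, y_pred and group must have the same length")
    priv_idx = [i for i, g in enumerate(group) if g == privileged]
    unpriv_idx = [i for i, g in enumerate(group) if g != privileged]
    acc_p, pos_p, tpr_p = _group_rates(y_true, y_pred, priv_idx)
    acc_u, pos_u, tpr_u = _group_rates(y_true, y_pred, unpriv_idx)
    overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {
        "overall_accuracy": overall,
        # Positive gap: the model is more accurate for the privileged group.
        "accuracy_gap": acc_p - acc_u,
        # P(pred=1 | unprivileged) - P(pred=1 | privileged); 0 means parity.
        "statistical_parity_difference": pos_u - pos_p,
        # TPR(unprivileged) - TPR(privileged); 0 means equal opportunity.
        "equal_opportunity_difference": tpr_u - tpr_p,
    }


if __name__ == "__main__":
    # Toy data invented purely for illustration (not from the study).
    y_true = [1, 0, 1, 1, 0, 1, 0, 1]
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
    group = ["a", "a", "a", "a", "b", "b", "b", "b"]
    print(group_fairness_report(y_true, y_pred, group, privileged="a"))
```

In this toy example, both groups receive positive predictions at the same rate, so statistical parity holds. However, the model is less accurate for group "b" and has a lower true-positive rate for it. This is why a single overall accuracy figure can hide bias. The abstract does not specify which thresholds would count as "low bias." Dedicated toolkits such as AI Fairness 360 implement these and many other metrics.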
