Analysis of Machine Learning Models for Student Performance Prediction with a Focus on Algorithmic Bias Detection

Abstract


In the current educational landscape, the abundance of available data has become a crucial resource. Studies show that variables such as academic history, behavior, and socioeconomic context are correlated with students’ future performance. By analyzing these data, educational institutions can reduce dropout rates and allocate resources more efficiently. Machine learning (ML) algorithms have proven effective for this task, but they face the challenge of algorithmic bias, which can harm underrepresented groups. This study compares several ML algorithms and uses algorithmic fairness frameworks to measure their bias. The results indicate that the K-Nearest Neighbors algorithm predicts student performance fairly, combining high overall accuracy with low bias.
Keywords: Machine Learning, Student Performance Prediction, Algorithmic Fairness, Algorithmic Bias, Educational Data Analysis
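The abstract does not say which fairness metrics or toolkit the study used. The snippet below is a minimal sketch under several stated assumptions. The outcome is a binary pass/fail label, with 1 as the favorable outcome. There is a single sensitive attribute with two groups, "A" (treated as privileged) and "B" (treated as unprivileged). All data are toy values invented for illustration. The sketch shows how a K-Nearest Neighbors classifier's overall accuracy can be reported alongside two common group-fairness metrics: statistical parity difference and equal opportunity difference. These metrics are defined as in toolkits such as AI Fairness 360. All function names are hypothetical, and these are not necessarily the metrics used in the study.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k=3):
    """Predict x's label by majority vote among its k nearest training points (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def positive_rate(y_pred, groups, g):
    """P(prediction = 1 | group = g)."""
    preds = [p for p, gr in zip(y_pred, groups) if gr == g]
    return sum(preds) / len(preds)


def true_positive_rate(y_true, y_pred, groups, g):
    """P(prediction = 1 | true label = 1, group = g)."""
    hits = [p for t, p, gr in zip(y_true, y_pred, groups) if gr == g and t == 1]
    return sum(hits) / len(hits)


def statistical_parity_difference(y_pred, groups, unprivileged, privileged):
    """Difference in favorable-prediction rates; 0 means parity."""
    return positive_rate(y_pred, groups, unprivileged) - positive_rate(y_pred, groups, privileged)


def equal_opportunity_difference(y_true, y_pred, groups, unprivileged, privileged):
    """Difference in true positive rates; 0 means equal opportunity."""
    return (true_positive_rate(y_true, y_pred, groups, unprivileged)
            - true_positive_rate(y_true, y_pred, groups, privileged))


# Hypothetical toy data: features are (prior grade, attendance) scaled to [0, 1].
train_X = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.7), (0.3, 0.4), (0.2, 0.3), (0.4, 0.2)]
train_y = [1, 1, 1, 0, 0, 0]
test_X = [(0.85, 0.75), (0.25, 0.35), (0.6, 0.65), (0.35, 0.3)]
test_y = [1, 0, 1, 1]
groups = ["A", "A", "B", "B"]  # hypothetical sensitive attribute per test student

preds = [knn_predict(train_X, train_y, x, k=3) for x in test_X]

if __name__ == "__main__":
    print("predictions:", preds)
    print("accuracy:", accuracy(test_y, preds))
    print("SPD (B vs A):", statistical_parity_difference(preds, groups, "B", "A"))
    print("EOD (B vs A):", equal_opportunity_difference(test_y, preds, groups, "B", "A"))
```

In this toy example, both groups receive favorable predictions at the same rate, so the statistical parity difference is 0.0. However, the equal opportunity difference is -0.5, because one student in group B who actually passed is predicted to fail. This is why fairness analyses usually report several metrics rather than a single one.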

Published: 2024-11-04
OLIVEIRA, Matias; CABRAL, Luciano de Souza; MELLO, Rafael Ferreira. Analysis of Machine Learning Models for Student Performance Prediction with a Focus on Algorithmic Bias Detection. In: BRAZILIAN SYMPOSIUM ON COMPUTERS IN EDUCATION (SBIE), 35., 2024, Rio de Janeiro/RJ. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 1442-1451. DOI: https://doi.org/10.5753/sbie.2024.241546.