Interpretability and Algorithmic Fairness: Advancing the Transparency of Predictive Models of Student Dropout

Cássio S. Carvalho; Júlio C. B. Mattos; Marilton S. Aguiar

doi:10.5753/sbie.2024.242289

Cássio S. Carvalho Universidade Federal de Pelotas http://orcid.org/0009-0003-0207-9023
Júlio C. B. Mattos Universidade Federal de Pelotas https://orcid.org/0000-0002-0619-9271
Marilton S. Aguiar Universidade Federal de Pelotas https://orcid.org/0000-0002-5247-6022

Abstract

With the ubiquity of Artificial Intelligence (AI), concerns arise about model transparency and the introduction of biases. This study examines the relationship between interpretability and algorithmic fairness in predictive models of early student dropout. It presents an evolution of the LIME explanation clustering method and analyzes the results in terms of fairness with respect to sensitive attributes such as gender, race, admission quota, and school of origin. The findings show that the interpretability metric "agreement" can be related to variation in algorithmic fairness, which makes it possible to identify regions with different levels of performance and fairness. This analysis helps adjust AI models to improve their transparency in educational contexts.
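The abstract does not specify the clustering algorithm or the fairness measures. What follows is only a minimal sketch, under stated assumptions, of how performance and fairness could be compared across regions once explanations are grouped.

The sketch makes these assumptions:

- Each student is represented by a vector of LIME feature weights. Here these are toy two-dimensional vectors with made-up values.
- Plain k-means is used as the clustering step.
- A positive prediction means "flagged as dropout".
- Statistical parity difference and equal opportunity difference serve as example group fairness measures for one binary sensitive attribute, here a hypothetical quota flag.

All names and data are hypothetical. The sketch does not compute the "agreement" metric.

```python
from collections import defaultdict
from math import dist


def kmeans(points, k, iters=20):
    """Plain k-means (Lloyd's algorithm); the first k points are the initial
    centroids so results are reproducible."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rate(values):
    """Mean of a list of 0/1 values; NaN when the list is empty."""
    return sum(values) / len(values) if values else float("nan")


def group_metrics(y_true, y_pred, group):
    """Accuracy plus two group fairness gaps (group 1 minus group 0)."""
    # Share of each group predicted as positive (flagged as dropout).
    pos = {g: rate([p for p, s in zip(y_pred, group) if s == g]) for g in (0, 1)}
    # True positive rate of each group (only students who actually dropped out).
    tpr = {g: rate([p for t, p, s in zip(y_true, y_pred, group) if s == g and t == 1])
           for g in (0, 1)}
    acc = rate([int(t == p) for t, p in zip(y_true, y_pred)])
    return {"accuracy": acc, "spd": pos[1] - pos[0], "eod": tpr[1] - tpr[0]}


def metrics_by_region(explanations, y_true, y_pred, group, k):
    """Cluster explanation vectors, then report metrics inside each cluster."""
    members = defaultdict(list)
    for i, lab in enumerate(kmeans(explanations, k)):
        members[lab].append(i)
    return {
        c: group_metrics([y_true[i] for i in idx],
                         [y_pred[i] for i in idx],
                         [group[i] for i in idx])
        for c, idx in sorted(members.items())
    }


# Hypothetical toy data: one LIME-style weight vector per student (two features),
# true outcome and model prediction (1 = dropout), and a binary sensitive attribute.
weights = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8],
           [0.85, 0.1], [0.1, 0.85], [0.95, 0.0], [0.0, 0.95]]
dropped_out = [1, 1, 0, 1, 1, 0, 0, 1]
predicted = [1, 1, 0, 0, 1, 1, 0, 0]
quota = [1, 1, 1, 0, 0, 1, 0, 0]  # hypothetical: 1 = quota admission, 0 = general

if __name__ == "__main__":
    for cluster, m in metrics_by_region(weights, dropped_out, predicted, quota, k=2).items():
        print(cluster, m)
```

On this toy data, cluster 0 is fully accurate with no parity or opportunity gap between groups. Cluster 1 is both less accurate and strongly unequal between groups. This is the kind of region-level contrast in performance and fairness the abstract refers to.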

Keywords: Educational Data Mining, Interpretability, Explainability, Algorithmic Fairness, Ethics, Transparency, Fairness in AI
