Evaluation of Fairness in Machine Learning Models using the UCI Adult Dataset

Lucas Sena; Javam Machado

doi:10.5753/sbbd.2024.243650

Lucas Sena, Universidade Federal do Ceará (UFC)
Javam Machado, Universidade Federal do Ceará (UFC)

Abstract

This paper analyzes fairness in machine learning models trained on the UCI Adult Dataset. The study focuses on mitigating bias related to sensitive attributes such as race and gender by reducing the dimensionality of the dataset. We evaluate the performance and fairness of three popular machine learning models (Logistic Regression, Random Forest, and Gradient Boosting), each trained both with and without the sensitive features. The results indicate that performance metrics remain stable across the two settings, while the fairness metrics provide additional insight into model behavior, underscoring the need to consider fairness alongside performance in machine learning applications.
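As a rough illustration of what a group fairness evaluation can look like, the sketch below computes two widely used metrics, statistical parity difference and disparate impact, from a model's binary predictions. The abstract does not name the fairness metrics the authors used, so the choice of metrics, the function names, and the toy data here are all assumptions made for illustration, not the paper's method.

```python
from collections import defaultdict


def positive_rates(predictions, groups):
    """Return the fraction of positive (1) predictions for each group value."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}


def statistical_parity_difference(predictions, groups, privileged, unprivileged):
    """P(pred=1 | unprivileged) - P(pred=1 | privileged); 0 means parity."""
    rates = positive_rates(predictions, groups)
    return rates[unprivileged] - rates[privileged]


def disparate_impact(predictions, groups, privileged, unprivileged):
    """P(pred=1 | unprivileged) / P(pred=1 | privileged); 1 means parity.

    Raises ZeroDivisionError if the privileged group has no positive predictions.
    """
    rates = positive_rates(predictions, groups)
    return rates[unprivileged] / rates[privileged]


# Toy, made-up predictions (1 = predicted income above 50K) paired with a
# hypothetical "sex" column; these values are NOT taken from the paper.
toy_predictions = [1, 1, 1, 0, 1, 0, 0, 0]
toy_sex = ["Male", "Male", "Male", "Male",
           "Female", "Female", "Female", "Female"]

spd = statistical_parity_difference(toy_predictions, toy_sex, "Male", "Female")
di = disparate_impact(toy_predictions, toy_sex, "Male", "Female")
print(f"statistical parity difference: {spd:.2f}")
print(f"disparate impact: {di:.2f}")
```

Because both functions need the sensitive attribute only at evaluation time, the same calls can be applied to a model trained with the sensitive features and to one trained without them, which is the kind of comparison the abstract describes.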

Keywords: Fairness in Machine Learning, Bias Mitigation, UCI Adult Dataset, Logistic Regression, Random Forest, Gradient Boosting, Sensitive Attributes, Fairness Metrics, Machine Learning Performance, Ethical AI, Bias in AI Models
