Strategies Selection for a Fair Classification in Logistic Regression: A Comparative Analysis

  • Murilo V. Pinheiro Universidade Federal do Ceará (UFC)
  • Maria de Lourdes M. Silva Universidade Federal do Ceará (UFC)
  • Javam C. Machado Universidade Federal do Ceará (UFC)

Abstract


The increasing use of technology raises a new societal concern: machine learning models applied to personal data can produce biased classifications. The notion of fairness emerged to mitigate and combat discrimination in algorithms. The fairness literature offers several techniques to guarantee fair outputs for different demographic groups, and these can be applied at three stages: before, during, or after model training (pre-processing, in-processing, and post-processing). We explore methods from each stage to build a comparative analysis that evaluates both fairness and utility metrics. Our analysis aims to understand the many ways of achieving fairness with logistic regression on the three most popular datasets in the fairness literature. We report several experiments that compare five fairness techniques and select the best one for each application.
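As a rough illustration of the kind of comparison described above, and not the paper's actual experimental setup, the Python sketch below contrasts a plain logistic regression with a pre-processing intervention in the spirit of reweighing, using synthetic data; it reports accuracy as the utility metric and statistical parity difference as the fairness metric. All data, group encodings, and thresholds here are illustrative assumptions.

# Minimal sketch (illustrative assumptions only): baseline logistic regression vs.
# a reweighing-style pre-processing step, compared on accuracy and statistical
# parity difference.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 4000
s = rng.integers(0, 2, n)                        # protected attribute (0 = unprivileged group)
x = rng.normal(size=(n, 3)) + s[:, None] * 0.5   # features correlated with s
y = (x[:, 0] + 0.8 * s + rng.normal(size=n) > 0.7).astype(int)  # synthetic, biased labels

X = np.column_stack([x, s])
X_tr, X_te, y_tr, y_te, s_tr, s_te = train_test_split(X, y, s, test_size=0.3, random_state=0)

def statistical_parity_difference(y_pred, s_attr):
    # P(yhat = 1 | s = 0) - P(yhat = 1 | s = 1); closer to 0 means fairer outcomes
    return y_pred[s_attr == 0].mean() - y_pred[s_attr == 1].mean()

# Baseline: ordinary logistic regression, no fairness intervention
base = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred_base = base.predict(X_te)

# Pre-processing: reweigh training examples so the protected attribute and the
# label look statistically independent (reweighing-style intervention)
w = np.ones_like(y_tr, dtype=float)
for sv in (0, 1):
    for yv in (0, 1):
        mask = (s_tr == sv) & (y_tr == yv)
        expected = (s_tr == sv).mean() * (y_tr == yv).mean()
        w[mask] = expected / mask.mean()
fair = LogisticRegression(max_iter=1000).fit(X_tr, y_tr, sample_weight=w)
pred_fair = fair.predict(X_te)

print("baseline : acc=%.3f spd=%+.3f" % ((pred_base == y_te).mean(),
                                          statistical_parity_difference(pred_base, s_te)))
print("reweighed: acc=%.3f spd=%+.3f" % ((pred_fair == y_te).mean(),
                                          statistical_parity_difference(pred_fair, s_te)))

In this toy setting the reweighed model typically shows a smaller statistical parity difference at a modest cost in accuracy, which is the kind of fairness-utility trade-off the comparative analysis evaluates across techniques and datasets.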

Keywords: Experiments and analyses, Machine learning, AI, Data management and data systems, Responsible data management and algorithmic fairness

Published
25/09/2023
How to Cite

V. PINHEIRO, Murilo; M. SILVA, Maria de Lourdes; MACHADO, Javam C. Strategies Selection for a Fair Classification in Logistic Regression: A Comparative Analysis. In: WORKSHOP DE TRABALHOS DE ALUNOS DA GRADUAÇÃO (WTAG) - SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 38., 2023, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 15-21. DOI: https://doi.org/10.5753/sbbd_estendido.2023.232722.