Proposal of a Method for Identifying Unfairness in Machine Learning Models based on Counterfactual Explanations

  • Fernanda R. P. Cirino, Pontifícia Universidade Católica de Minas Gerais
  • Carlos D. Maia, Pontifícia Universidade Católica de Minas Gerais
  • Marcelo S. Balbino, Pontifícia Universidade Católica de Minas Gerais
  • Cristiane N. Nobre, Pontifícia Universidade Católica de Minas Gerais


As machine learning models continue to affect diverse areas of society, ensuring fairness in their decisions becomes increasingly vital. Unfair outcomes resulting from biased data can have profound societal implications. This work proposes a method, based on counterfactual explanations, for identifying unfairness and mitigating bias in machine learning models. By analyzing a model's equity implications after training, we provide insight into the proposed method's potential to address equity issues. The findings contribute to the understanding of fairness assessment techniques and emphasize the importance of post-training counterfactual approaches in ensuring fair decision-making in machine learning models.

Keywords: Unfairness, Interpretability, Counterfactual Explanations, Machine Learning


CIRINO, Fernanda R. P.; MAIA, Carlos D.; BALBINO, Marcelo S.; NOBRE, Cristiane N. Proposal of a Method for Identifying Unfairness in Machine Learning Models based on Counterfactual Explanations. In: SYMPOSIUM ON KNOWLEDGE DISCOVERY, MINING AND LEARNING (KDMILE), 11., 2023, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 41-48. ISSN 2763-8944. DOI: