Assessing the Role of Sensitive Attributes in Adversarial Debiasing

  • Diego Minatel (USP)
  • Antonio R. S. Parmezan (USP)
  • Vinicius M. A. Souza (PUCPR)
  • Solange O. Rezende (USP)

Abstract


Fairness in machine learning refers to the development of models that do not systematically disadvantage individuals or groups based on sensitive attributes. One commonly adopted principle is fairness through unawareness, which holds that models should not explicitly incorporate sensitive attributes into the decision-making process. However, in some scenarios, such as healthcare, variables like sex and age are essential for accurate diagnoses and prognoses. Previous studies have assessed the influence of including or excluding such information during model training. Nonetheless, they have not considered Adversarial Debiasing, a classification algorithm specifically designed to promote equitable results. To address this gap, we propose a comprehensive empirical analysis to investigate the role of sensitive attributes in this algorithm. We experimentally evaluated Adversarial Debiasing across 20 settings using 23 datasets from diverse domains, one predictive performance metric, three group fairness metrics, and a non-parametric statistical test. Our findings indicate that classifiers trained without including sensitive information in the input feature set produce more precise and fairer outcomes.
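
The abstract contrasts training Adversarial Debiasing (Zhang et al., 2018) with and without the sensitive attribute in the input feature set. As a rough illustration of the mechanism only, and not of the authors' experimental code, the sketch below trains a predictor on the task label while an adversary tries to recover the sensitive attribute from the predictor's output; the predictor is penalized whenever the adversary succeeds. All identifiers (Predictor, Adversary, train_step, lambda_adv), the network sizes, and the toy data are illustrative assumptions.

import torch
import torch.nn as nn


class Predictor(nn.Module):
    """Task classifier: features -> logit of the task label."""

    def __init__(self, n_features):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x):
        return self.net(x)


class Adversary(nn.Module):
    """Tries to recover the sensitive attribute from the predictor's output."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, y_logit):
        return self.net(y_logit)


def train_step(pred, adv, opt_pred, opt_adv, x, y, s, lambda_adv=1.0):
    bce = nn.BCEWithLogitsLoss()

    # 1) Adversary step: predict the sensitive attribute s from the (detached)
    #    predictor output, so only the adversary's weights are updated.
    adv_loss = bce(adv(pred(x).detach()), s)
    opt_adv.zero_grad()
    adv_loss.backward()
    opt_adv.step()

    # 2) Predictor step: fit the task label y while making the adversary fail
    #    (the subtracted term rewards predictions that hide s).
    y_logit = pred(x)
    pred_loss = bce(y_logit, y) - lambda_adv * bce(adv(y_logit), s)
    opt_pred.zero_grad()
    pred_loss.backward()
    opt_pred.step()
    return pred_loss.item(), adv_loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    # Toy data; in the "unaware" setting the sensitive column is already
    # removed from x, while s remains available as the adversary's target.
    x = torch.randn(256, 10)
    y = torch.randint(0, 2, (256, 1)).float()
    s = torch.randint(0, 2, (256, 1)).float()
    predictor, adversary = Predictor(10), Adversary()
    opt_p = torch.optim.Adam(predictor.parameters(), lr=1e-3)
    opt_a = torch.optim.Adam(adversary.parameters(), lr=1e-3)
    for _ in range(200):
        train_step(predictor, adversary, opt_p, opt_a, x, y, s)

Under this reading, the "aware" and "unaware" settings studied in the paper differ only in whether the sensitive column is kept in x; a ready-made implementation of the algorithm is also available as AdversarialDebiasing in the AIF360 toolkit.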

Published
29/09/2025
MINATEL, Diego; PARMEZAN, Antonio R. S.; SOUZA, Vinicius M. A.; REZENDE, Solange O. Assessing the Role of Sensitive Attributes in Adversarial Debiasing. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 22., 2025, Fortaleza/CE. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 368-378. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2025.12452.
