Analyzing Data Bias through Data Augmentation: A Case Study in Financial Data

  • Julia dos Santos Porphirio UNIFESP
  • Diogo José dos Santos UNIFESP
  • Sérgio Azevedo Serasa Experian
  • Lilian Berton UNIFESP

Abstract

This study examines the impact of introducing group imbalances through oversampling strategies on model fairness and feature importance, focusing on sensitive attributes such as sex, marital status, and education in a financial dataset. We hypothesize that linear models are more vulnerable to the fairness distortions introduced by synthetic oversampling than more complex models, such as gradient-boosted decision trees and support vector machines with an RBF kernel. To test this, we evaluate oversampling approaches such as SMOTE and RandomOverSampler within a consistent framework, comparing linear classifiers against XGBoost and SVM (RBF). Our assessment covers predictive performance, the stability of fairness metrics, and changes in feature-importance rankings before and after oversampling.
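The before/after comparison described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it uses a synthetic toy dataset with a made-up binary sensitive attribute, a hand-rolled class rebalancer standing in for imbalanced-learn's RandomOverSampler, and demographic parity difference as one example fairness metric; the paper additionally evaluates SMOTE, XGBoost, SVM (RBF), and feature-importance rankings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy data: column 0 is a binary sensitive attribute, columns 1-2 are features.
n = 1000
sensitive = rng.integers(0, 2, n)
X = np.column_stack([sensitive, rng.normal(size=n), rng.normal(size=n)])
y = (X[:, 1] + 0.5 * sensitive + rng.normal(scale=0.5, size=n) > 0).astype(int)

def random_oversample(X, y, rng):
    """Duplicate rows of each minority class until all classes match the
    majority count (a hand-rolled stand-in for RandomOverSampler)."""
    classes, counts = np.unique(y, return_counts=True)
    majority = counts.max()
    idx = []
    for c, cnt in zip(classes, counts):
        rows = np.flatnonzero(y == c)
        idx.extend(rows)
        if cnt < majority:
            idx.extend(rng.choice(rows, size=majority - cnt, replace=True))
    idx = np.asarray(idx)
    return X[idx], y[idx]

def demographic_parity_diff(y_pred, sensitive):
    """|P(y_hat=1 | s=1) - P(y_hat=1 | s=0)|: a standard group-fairness gap."""
    return abs(y_pred[sensitive == 1].mean() - y_pred[sensitive == 0].mean())

# Fairness gap of a linear model before oversampling ...
clf = LogisticRegression().fit(X, y)
gap_before = demographic_parity_diff(clf.predict(X), sensitive)

# ... and after retraining on the rebalanced data (evaluated on the original X).
X_os, y_os = random_oversample(X, y, rng)
clf_os = LogisticRegression().fit(X_os, y_os)
gap_after = demographic_parity_diff(clf_os.predict(X), sensitive)

print(f"parity gap before: {gap_before:.3f}, after oversampling: {gap_after:.3f}")
```

The same scaffold extends to the paper's setting by swapping in the real dataset, SMOTE or RandomOverSampler from imbalanced-learn, and additional classifiers and fairness metrics.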

References

Agu, E. E., Abhulimen, A. O., Obiki-Osafiele, A. N., Osundare, O. S., Adeniran, I. A., and Efunniyi, C. P. (2024). Discussing ethical considerations and solutions for ensuring fairness in ai-driven financial services. International Journal of Frontier Research in Science, 3(2):001–009.

Bajracharya, A., Khakurel, U., Harvey, B., and Rawat, D. B. (2022). Recent advances in algorithmic biases and fairness in financial services: a survey. In Proceedings of the Future Technologies Conference, pages 809–822. Springer.

Chawla, N. V., Bowyer, K. W., Hall, L. O., and Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321–357.

Choi, Y., Hong, J., Lee, E., Kim, J., and Kim, S. (2025). Enhancing fairness in financial ai models through constraint-based bias mitigation. Journal of Information Processing Systems, 21(1):89–101.

Christensen, J. (2021). Ai in financial services. In Demystifying AI for the Enterprise, pages 149–192. Productivity Press.

de Castro Vieira, J. R., Barboza, F., Cajueiro, D., and Kimura, H. (2025). Towards fair ai: Mitigating bias in credit decisions—a systematic literature review. Journal of Risk and Financial Management, 18(5):228.

Hurlin, C., Pérignon, C., and Saurin, S. (2024). The fairness of credit scoring models. Management Science.

Kim, S., Lessmann, S., Andreeva, G., and Rovatsos, M. (2023). Fair models in credit: Intersectional discrimination and the amplification of inequity. arXiv preprint arXiv:2308.02680.

Lemaître, G., Nogueira, F., and Aridas, C. K. (2017). Imbalanced-learn: A Python toolbox to tackle the curse of imbalanced datasets in machine learning. Journal of Machine Learning Research, 18(17):1–5.

Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., and Galstyan, A. (2021). A survey on bias and fairness in machine learning. ACM Computing Surveys (CSUR), 54(6):1–35.

Porphirio, J. (2025). Credit risk fairness. [link]. Accessed: August 6, 2025.

Welfert, M., Stromberg, N., and Sankar, L. (2024). Fairness-enhancing data augmentation methods for worst-group accuracy. Proceedings of Machine Learning Research, 279:156–172.

Yeh, I.-C. (2009). Default of Credit Card Clients. UCI Machine Learning Repository.

Zhou, Y., Kantarcioglu, M., and Clifton, C. (2023). On improving fairness of ai models with synthetic minority oversampling techniques. In Proceedings of the 2023 SIAM international conference on data mining (SDM), pages 874–882. SIAM.
Published
29/09/2025
PORPHIRIO, Julia dos Santos; SANTOS, Diogo José dos; AZEVEDO, Sérgio; BERTON, Lilian. Analyzing Data Bias through Data Augmentation: A Case Study in Financial Data. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 22., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 950-961. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2025.14285.
