Analyzing the Trade-off Between Fairness and Model Performance in Supervised Learning: A Case Study in the MIMIC dataset

  • Bruno Pires M. Silva UNIFESP
  • Lilian Berton UNIFESP

Abstract


Fairness has become a key area in machine learning (ML), aiming to ensure equitable outcomes across demographic groups and to mitigate biases. This study examines fairness in healthcare using the MIMIC-III dataset, comparing traditional ML with fair ML approaches applied at the pre-, in-, and post-processing stages. The methods include Correlation Remover and Adversarial Learning from Fairlearn, and Equalized Odds Post-processing from AI Fairness 360. We evaluate predictive performance (accuracy, F1-score) alongside fairness metrics (equal opportunity, equalized odds) for different sensitive attributes. Notably, Equalized Odds Post-processing improved fairness with a smaller loss in predictive performance, highlighting the trade-off between fairness and predictive accuracy in healthcare models.
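To make the evaluated pipeline concrete, the sketch below illustrates one of the compared stages (Fairlearn's CorrelationRemover as pre-processing) together with the two fairness metrics named above, equal opportunity and equalized odds. The synthetic cohort, the feature names, and the logistic-regression baseline are illustrative assumptions only and do not reproduce the authors' MIMIC-III preprocessing or models.

```python
# Minimal sketch (not the authors' exact pipeline): a synthetic cohort stands in
# for MIMIC-III, logistic regression is an assumed baseline, and "sex" is the
# sensitive attribute. It shows Fairlearn's CorrelationRemover (pre-processing)
# plus the equal opportunity and equalized odds metrics mentioned in the abstract.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from fairlearn.preprocessing import CorrelationRemover
from fairlearn.metrics import MetricFrame, true_positive_rate, equalized_odds_difference

rng = np.random.default_rng(0)
n = 2000
sex = rng.integers(0, 2, n)                      # binary sensitive attribute
age = rng.normal(65, 10, n) + 3 * sex            # feature correlated with sex
sofa = rng.normal(5, 2, n)                       # illustrative severity score
y = (0.04 * age + 0.3 * sofa + 0.5 * sex + rng.normal(0, 1, n) > 4.8).astype(int)
X = pd.DataFrame({"sex": sex, "age": age, "sofa": sofa})

X_tr, X_te, y_tr, y_te, s_tr, s_te = train_test_split(
    X, y, sex, test_size=0.3, stratify=y, random_state=42)

# Pre-processing: project out the linear correlation with the sensitive column.
remover = CorrelationRemover(sensitive_feature_ids=["sex"])
X_tr_fair = remover.fit_transform(X_tr)          # returns features without "sex"
X_te_fair = remover.transform(X_te)

clf = LogisticRegression(max_iter=1000).fit(X_tr_fair, y_tr)
y_pred = clf.predict(X_te_fair)

# Equal opportunity: gap in true positive rates between groups.
tpr = MetricFrame(metrics=true_positive_rate, y_true=y_te, y_pred=y_pred,
                  sensitive_features=s_te)
print("TPR per group:", tpr.by_group.to_dict())
print("Equal opportunity gap:", tpr.difference())
# Equalized odds: worst-case gap over true and false positive rates.
print("Equalized odds difference:",
      equalized_odds_difference(y_te, y_pred, sensitive_features=s_te))
```

The in-processing (Adversarial Learning) and post-processing (Equalized Odds Post-processing from AI Fairness 360) stages compared in the study follow the same evaluation pattern, with the mitigation step moved into model training or applied to the trained model's predictions, respectively.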

References

Ay, S., Cardei, M., Meyer, A.-M., Zhang, W., and Topaloglu, U. (2024). Improving equity in deep learning medical applications with the Gerchberg-Saxton algorithm. Journal of Healthcare Informatics Research, 8(2):225–243.

Barocas, S., Hardt, M., and Narayanan, A. (2019). Fairness and Machine Learning. fairmlbook.org.

Bellamy, R. K., Dey, K., Hind, M., Hoffman, S. C., Houde, S., Kannan, K., Lohia, P., Martino, J., Mehta, S., Mojsilović, A., et al. (2019). AI Fairness 360: An extensible toolkit for detecting and mitigating algorithmic bias. IBM Journal of Research and Development, 63(4/5):4–1.

Bird, S., Dudík, M., Edgar, R., Horn, B., Lutz, R., Milan, V., Sameki, M., Wallach, H., and Walker, K. (2020). Fairlearn: A toolkit for assessing and improving fairness in AI. Microsoft, Tech. Rep. MSR-TR-2020-32.

Chen, J., Berlot-Attwell, I., Hossain, S., Wang, X., and Rudzicz, F. (2020). Exploring text specific and blackbox fairness algorithms in multimodal clinical NLP. arXiv preprint arXiv:2011.09625.

Correa, R., Shaan, M., Trivedi, H., Patel, B., Celi, L. A. G., Gichoya, J. W., and Banerjee, I. (2022). A systematic review of 'fair' AI model development for image classification and prediction. Journal of Medical and Biological Engineering, 42(6):816–827.

Ferrara, C., Sellitto, G., Ferrucci, F., Palomba, F., and De Lucia, A. (2024). Fairness-aware machine learning engineering: how far are we? Empirical Software Engineering, 29(1):9.

Johnson, A. E., Pollard, T. J., Shen, L., Lehman, L.-w. H., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Anthony Celi, L., and Mark, R. G. (2016). MIMIC-III, a freely accessible critical care database. Scientific Data, 3(1):1–9.

Jui, T. D. and Rivas, P. (2024). Fairness issues, current approaches, and challenges in machine learning models. International Journal of Machine Learning and Cybernetics, pages 1–31.

Kakadiaris, A. (2023). Evaluating the fairness of the MIMIC-IV dataset and a baseline algorithm: Application to the ICU length of stay prediction. arXiv preprint arXiv:2401.00902.

Kurbatskaya, A., Jaramillo-Jimenez, A., Ochoa-Gomez, J. F., Brønnick, K., and Fernandez-Quilez, A. (2023). Assessing gender fairness in EEG-based machine learning detection of Parkinson's disease: A multi-center study. In 2023 31st European Signal Processing Conference (EUSIPCO), pages 1020–1024. IEEE.

Li, J. and Li, G. (2025). Triangular trade-off between robustness, accuracy, and fairness in deep neural networks: A survey. ACM Computing Surveys, 57(6).

Luo, Y., Tian, Y., Shi, M., Pasquale, L. R., Shen, L. Q., Zebardast, N., Elze, T., and Wang, M. (2024). Harvard Glaucoma Fairness: a retinal nerve disease dataset for fairness learning and fair identity normalization. IEEE Transactions on Medical Imaging.

Malone, B., Garcia-Duran, A., and Niepert, M. (2018). Learning representations of missing data for predicting patient outcomes. arXiv preprint arXiv:1811.04752.

Meng, C., Trinh, L., Xu, N., Enouen, J., and Liu, Y. (2022). Interpretability and fairness evaluation of deep learning models on MIMIC-IV dataset. Scientific Reports, 12(1):7166.

Oliveira, G. M. M. d., Brant, L. C. C., Polanczyk, C. A., Biolo, A., Nascimento, B. R., Malta, D. C., Souza, M. d. F. M. d., Soares, G. P., Xavier Junior, G. F., Machline-Carrion, M. J., et al. (2020). Cardiovascular statistics – Brazil 2020. Arquivos Brasileiros de Cardiologia, 115:308–439.

Rabonato, R. T. and Berton, L. (2024). A systematic review of fairness in machine learning. AI and Ethics, pages 1–12.

Raza, S. (2023). Connecting fairness in machine learning with public health equity. In 2023 IEEE 11th International Conference on Healthcare Informatics (ICHI), pages 704–708. IEEE.

Sattigeri, P., Hoffman, S. C., Chenthamarakshan, V., and Varshney, K. R. (2019). Fairness GAN: Generating datasets with fairness properties using a generative adversarial network. IBM Journal of Research and Development, 63(4/5):3–1.

Taber, P., Armin, J. S., Orozco, G., Del Fiol, G., Erdrich, J., Kawamoto, K., and Israni, S. T. (2023). Artificial intelligence and cancer control: toward prioritizing justice, equity, diversity, and inclusion (JEDI) in emerging decision support technologies. Current Oncology Reports, 25(5):387–424.

Xu, D., Yuan, S., Zhang, L., and Wu, X. (2019). FairGAN+: Achieving fair data generation and classification through generative adversarial nets. In 2019 IEEE International Conference on Big Data (Big Data), pages 1401–1406. IEEE.

Yeom, S. and Tschantz, M. C. (2021). Avoiding disparity amplification under different worldviews. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pages 273–283.

Zhang, B. H., Lemoine, B., and Mitchell, M. (2018). Mitigating unwanted biases with adversarial learning. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, pages 335–340.
Published
09/06/2025
SILVA, Bruno Pires M.; BERTON, Lilian. Analyzing the Trade-off Between Fairness and Model Performance in Supervised Learning: A Case Study in the MIMIC dataset. In: SIMPÓSIO BRASILEIRO DE COMPUTAÇÃO APLICADA À SAÚDE (SBCAS), 25., 2025, Porto Alegre/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 212-223. ISSN 2763-8952. DOI: https://doi.org/10.5753/sbcas.2025.6994.