Bias Propagation in Health AI: Measuring Pre-Training Bias and Its Effect on Machine Learning Model Outcomes
Abstract
Machine learning (ML) has become an essential tool in healthcare, supporting diagnosis, prognosis, and treatment decisions. However, biases present in pre-training data can compromise both model performance and fairness, disproportionately affecting underrepresented groups. This study systematically examines the impact of four pre-training bias metrics on the accuracy of three ML models across four health-related datasets. Our findings show that more data does not necessarily translate to better performance, particularly when data imbalance and bias are present. Moreover, pre-training bias metrics are associated with accuracy disparities, underscoring the importance of proactive bias assessment to develop more equitable ML models in healthcare.
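The abstract does not name the four pre-training bias metrics used in the study. As a purely illustrative sketch, two metrics commonly computed on data before training (e.g., in fairness toolkits) are Class Imbalance (CI) and the Difference in Positive Proportions of Labels (DPL); the function names and the toy data below are hypothetical:

```python
import numpy as np

def class_imbalance(is_advantaged):
    """CI = (n_a - n_d) / (n_a + n_d), where n_a and n_d are the
    record counts of the advantaged and disadvantaged groups."""
    n_a = int(np.sum(is_advantaged))
    n_d = len(is_advantaged) - n_a
    return (n_a - n_d) / (n_a + n_d)

def diff_in_positive_proportions(labels, is_advantaged):
    """DPL = P(y=1 | advantaged) - P(y=1 | disadvantaged)."""
    labels = np.asarray(labels)
    adv = np.asarray(is_advantaged, dtype=bool)
    return labels[adv].mean() - labels[~adv].mean()

# Toy dataset: 6 advantaged vs. 4 disadvantaged records (hypothetical)
group = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)
y = np.array([1, 1, 1, 0, 0, 0, 1, 0, 0, 0])

print(class_imbalance(group))                  # (6 - 4) / 10 = 0.2
print(diff_in_positive_proportions(y, group))  # 3/6 - 1/4 = 0.25
```

Values near zero indicate balance; values near ±1 indicate that one group dominates the data or the positive labels, which is the kind of pre-training signal the study relates to downstream accuracy disparities.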