Equitable Diabetes Diagnosis: Tackling Ethnic and Gender Disparities
Abstract
Machine Learning (ML) has advanced disease diagnosis in healthcare, but it also raises fairness concerns, since biased models can perpetuate social inequalities. This study evaluates and mitigates bias in diabetes diagnosis prediction models. We ran experiments with ethnicity and gender as protected attributes and measured bias with three fairness metrics: Statistical Parity Difference, Equal Opportunity Difference, and Average Odds Difference. We then applied two bias mitigation techniques, Reweighing and Prejudice Remover. These techniques improved the fairness metrics, reducing disparities between groups while maintaining model accuracy. These findings reinforce the need to integrate fairness considerations into ML models for healthcare applications.
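As an illustration of the three metrics and of Reweighing, the following is a minimal sketch rather than the study's implementation. It assumes binary labels and predictions, a single binary protected attribute encoded as 1 for the privileged group and 0 for the unprivileged group, and the common "unprivileged minus privileged" sign convention. The function names and the toy arrays are hypothetical and chosen only for the example.

```python
import numpy as np


def _positive_rate(y_pred, mask):
    """Fraction of positive predictions among the rows selected by mask."""
    return float(y_pred[mask].mean())


def fairness_metrics(y_true, y_pred, group):
    """Return SPD, EOD, and AOD as unprivileged-minus-privileged differences.

    Assumed encoding: group == 1 is privileged, group == 0 is unprivileged.
    """
    y_true, y_pred, group = (np.asarray(a) for a in (y_true, y_pred, group))
    unpriv, priv = group == 0, group == 1

    # Statistical Parity Difference: gap in overall positive prediction rates.
    spd = _positive_rate(y_pred, unpriv) - _positive_rate(y_pred, priv)

    # Equal Opportunity Difference: gap in true positive rates.
    tpr_gap = (_positive_rate(y_pred, unpriv & (y_true == 1))
               - _positive_rate(y_pred, priv & (y_true == 1)))

    # Gap in false positive rates, needed for Average Odds Difference.
    fpr_gap = (_positive_rate(y_pred, unpriv & (y_true == 0))
               - _positive_rate(y_pred, priv & (y_true == 0)))

    return {"spd": spd, "eod": tpr_gap, "aod": 0.5 * (fpr_gap + tpr_gap)}


def reweighing_weights(y_true, group):
    """Per-sample weights P(group) * P(label) / P(group, label).

    Under these weights the label is statistically independent of the
    protected attribute in the weighted training set.
    """
    y_true, group = np.asarray(y_true), np.asarray(group)
    weights = np.empty(len(y_true), dtype=float)
    for g in np.unique(group):
        for y in np.unique(y_true):
            cell = (group == g) & (y_true == y)
            if cell.any():
                weights[cell] = ((group == g).mean() * (y_true == y).mean()
                                 / cell.mean())
    return weights


# Toy data (hypothetical): 4 unprivileged rows followed by 4 privileged rows.
toy_group = [0, 0, 0, 0, 1, 1, 1, 1]
toy_true = [1, 1, 0, 0, 1, 1, 0, 0]
toy_pred = [1, 0, 0, 0, 1, 1, 1, 0]
toy_metrics = fairness_metrics(toy_true, toy_pred, toy_group)

# Imbalanced labels across groups, so the weights differ from 1.
toy_imbalanced_true = [1, 1, 1, 0, 1, 0, 0, 0]
toy_weights = reweighing_weights(toy_imbalanced_true, toy_group)
```

A value of 0 for any of the three metrics indicates parity between the groups under this convention, and negative values indicate that the unprivileged group receives fewer positive predictions (or has a lower true or false positive rate). The weights returned by `reweighing_weights` could be passed as sample weights to any classifier that accepts them during training.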
