Classification of Glycemic Status from Complete Blood Counts: Evaluation of Machine Learning Strategies on Real-World Brazilian Laboratory Data
Abstract
Early diagnosis of diabetes is essential, yet glycated hemoglobin (A1c) testing is not always accessible. We investigated whether glycemic status (normal, pre-diabetic, and diabetic) can be classified exclusively from complete blood count (CBC) data using approximately 170,000 real laboratory records. Binary, multiclass, binary decomposition, and ensemble strategies were evaluated. Neural network models achieved the best performance (F2 = 0.793 in the binary task and 0.551 in the multiclass setting), with no gains observed from the ensemble approach. Error analysis revealed higher misclassification rates near diagnostic A1c thresholds, indicating greater difficulty in transitional glycemic states. Age, leukocytes, and RDW were the most relevant predictors. These results suggest that CBC data contain signals associated with glycemic status, although with limitations for screening applications.
References
Al-hussein, F., Tafakori, L., Abdollahian, M., Al-Shali, K., and Al-Hejin, A. (2025). A hybrid approach to enhance HbA1c prediction accuracy while minimizing the number of associated predictors: A case-control study in Saudi Arabia. PLoS One, 20(6):e0326315.
Alhassan, Z., Watson, M., Budgen, D., Alshammari, R., Alessa, A., and Moubayed, N. A. (2021). Improving current glycated hemoglobin prediction in adults: Use of machine learning algorithms with electronic health records. JMIR Medical Informatics, 9(5):e25237.
Bambo, G. M., Asmelash, D., Alemayehu, E., Gedefie, A., Duguma, T., and Kebede, S. S. (2024). Changes in selected hematological parameters in patients with type 1 and type 2 diabetes: A systematic review and meta-analysis. Frontiers in Medicine, 11:1294290.
Cardozo, G. et al. (2022). Use of machine learning and routine laboratory tests for diabetes mellitus screening. BioMed Research International, 2022:8114049.
Chawla, N. V., Bowyer, K. W., Hall, L. O., and Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321–357.
Cheng, Y.-L., Wu, Y.-R., Lin, K.-D., Lin, C.-H. R., and Lin, I.-M. (2023). Using machine learning for the risk factors classification of glycemic control in type 2 diabetes mellitus. Healthcare, 11(8):1141.
Galaviz, K. I., Weber, M. B., Suvada, K., Gujral, U. P., Wei, J., Merchant, R., Dharanendra, S., Haw, J. S., Narayan, K. M. V., and Ali, M. K. (2022). Interventions for reversing prediabetes: A systematic review and meta-analysis. American Journal of Preventive Medicine, 62(4):614–625.
Le, V. O. H. et al. (2022). Formation and evaluation of complete blood count proficiency testing program. Hematology Reports, 14(2):73–84.
Lemaître, G., Nogueira, F., and Aridas, C. K. (2017). Imbalanced-learn: A Python toolbox to tackle the curse of imbalanced datasets in machine learning. Journal of Machine Learning Research, 18(17):1–5.
Lu, Y., Wang, W., Liu, J., Xie, M., Liu, Q., and Li, S. (2023). Vascular complications of diabetes: A narrative review. Medicine, 102(40):e35285.
Mansoori, A. et al. (2023). Prediction of type 2 diabetes mellitus using hematological factors based on machine learning approaches: A cohort study analysis. Scientific Reports, 13:663.
NCD Risk Factor Collaboration (NCD-RisC) (2023). Global variation in diabetes diagnosis and prevalence based on fasting glucose and hemoglobin A1c. Nature Medicine, 29(11):2885–2901.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.
Tahir, A., Asghar, K., Shafiq, W., et al. (2024). Fingerprinting hyperglycemia using predictive modelling approach based on low-cost routine CBC and CRP diagnostics. Scientific Reports, 14(1):1090.
The Lancet (2023). Diabetes: a defining disease of the 21st century. The Lancet, 401(10394):2087.
World Health Organization (2021). Use of glycated haemoglobin (HbA1c) in diagnosis of diabetes mellitus.
Published
2026-06-01
How to Cite
MARTINI, Gabriel Eduardo; RECAMONDE-MENDOZA, Mariana. Classification of Glycemic Status from Complete Blood Counts: Evaluation of Machine Learning Strategies on Real-World Brazilian Laboratory Data. In: BRAZILIAN SYMPOSIUM ON COMPUTING APPLIED TO HEALTH (SBCAS), 26., 2026, Ouro Preto/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 621-632. ISSN 2763-8952. DOI: https://doi.org/10.5753/sbcas.2026.21406.
