Classifying Glycemic Status from Complete Blood Counts: Evaluating Machine Learning Strategies on Real Laboratory Data from Brazil
Abstract
Early diagnosis of diabetes is essential, but glycated hemoglobin (A1c) testing is not always available. We investigated whether glycemic status (normal, prediabetic, or diabetic) can be classified using only complete blood count (CBC) data, drawing on approximately 170,000 real laboratory records. We evaluated four classification strategies: binary, multiclass, binary decomposition, and ensemble. Neural networks performed best (F2 = 0.793 on the binary task and 0.551 on the multiclass task), and ensembles yielded no gains. Error analysis showed a higher misclassification rate near the A1c diagnostic thresholds, indicating that transitional glycemic states are harder to classify. Age, leukocyte count, and RDW (red cell distribution width) were the most relevant predictors. The results indicate that CBC data carry signals associated with glycemic status, although with limitations for screening.
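The abstract reports F2 scores for both tasks but does not spell out how the labels or the metric are derived. The sketch below illustrates one plausible setup under stated assumptions. It uses ADA-style HbA1c cut-points (5.7% and 6.5%). It defines a hypothetical binary task that groups prediabetic and diabetic records as "altered". It computes multiclass F2 by macro averaging. None of these choices is confirmed by the abstract. F-beta is computed as (1 + β²)·P·R / (β²·P + R), so β = 2 favors recall over precision. The toy values are illustrative and are not study data.

```python
# Minimal sketch, NOT the authors' pipeline. Assumptions (not stated in the
# abstract): ADA-style HbA1c cut-points, a binary task that groups
# prediabetic + diabetic as "altered", and macro averaging for multiclass F2.

PREDIABETES_CUTOFF = 5.7  # HbA1c %, assumed threshold
DIABETES_CUTOFF = 6.5     # HbA1c %, assumed threshold


def glycemic_status(a1c):
    """Map an HbA1c value (%) to one of three glycemic-status labels."""
    if a1c >= DIABETES_CUTOFF:
        return "diabetic"
    if a1c >= PREDIABETES_CUTOFF:
        return "prediabetic"
    return "normal"


def to_binary(label):
    """Hypothetical binary task: normal vs. altered (prediabetic or diabetic)."""
    return "normal" if label == "normal" else "altered"


def f_beta(y_true, y_pred, positive, beta=2.0):
    """F-beta for one class; beta > 1 weights recall more than precision."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision/recall are zero or undefined, so score 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def macro_f_beta(y_true, y_pred, labels, beta=2.0):
    """Unweighted mean of the per-class F-beta scores."""
    return sum(f_beta(y_true, y_pred, c, beta) for c in labels) / len(labels)


# Toy values for illustration only, not data from the study.
a1c_values = [5.2, 5.8, 6.1, 6.7, 7.4, 5.5]
y_true = [glycemic_status(v) for v in a1c_values]
y_pred = ["normal", "normal", "prediabetic", "prediabetic", "diabetic", "normal"]

binary_f2 = f_beta([to_binary(y) for y in y_true],
                   [to_binary(y) for y in y_pred], positive="altered")
macro_f2 = macro_f_beta(y_true, y_pred, ["normal", "prediabetic", "diabetic"])
print(f"binary F2 = {binary_f2:.3f}, multiclass macro F2 = {macro_f2:.3f}")
# → binary F2 = 0.789, multiclass macro F2 = 0.655
```

Labeling with fixed cut-points also suggests why errors may concentrate near the thresholds. For a record with HbA1c close to 5.7% or 6.5%, a small difference in the underlying value flips the target class, while the CBC features are unlikely to change much.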
Published
June 1, 2026
How to Cite
MARTINI, Gabriel Eduardo; RECAMONDE-MENDOZA, Mariana. Classificação do Status Glicêmico a partir de Hemogramas: Avaliação de Estratégias de Aprendizado de Máquina em Dados Laboratoriais Reais do Brasil. In: SIMPÓSIO BRASILEIRO DE COMPUTAÇÃO APLICADA À SAÚDE (SBCAS), 26., 2026, Ouro Preto/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 621-632. ISSN 2763-8952. DOI: https://doi.org/10.5753/sbcas.2026.21406.
