When Errors Inform: Supporting Diabetes Diagnosis in High-Uncertainty Scenarios

  • Samuel Norberto Alves, Federal University of Minas Gerais (UFMG)
  • Celso França, Federal University of Minas Gerais (UFMG)
  • Regina T. I. Bernal, Federal University of Minas Gerais (UFMG)
  • Crizian S. Gomes, Federal University of Minas Gerais (UFMG)
  • Oluwatoyin Joy Omole, Federal University of Minas Gerais (UFMG)
  • Deborah Malta, Federal University of Minas Gerais (UFMG)
  • Marcos André Gonçalves, Federal University of Minas Gerais (UFMG)
  • Jussara M. Almeida, Federal University of Minas Gerais (UFMG)

Abstract


We investigated the effectiveness of supervised machine learning methods in identifying individuals who may be undiagnosed or at high risk of developing Diabetes Mellitus (DM) in the context of private health insurance providers. The scenario is challenging for three reasons: only indirect administrative data are available (such as the type and frequency of exams), with no access to clinical results; the classes have low separability; and the labels themselves are uncertain. We evaluated three classifiers (XGBoost, Random Forest, and Logistic Regression), achieving robust performance (Macro-F1 of 90.1%). Error analysis suggests that false positives may indicate undiagnosed cases, while false negatives may reflect inadequate clinical management.

Keywords: diabetes prediction, label uncertainty, low separability, indirect attributes
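
The two evaluation ideas named in the abstract, Macro-F1 and the separation of errors into false positives and false negatives, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`f1_for_class`, `macro_f1`, `split_errors`), the binary encoding (1 = labeled as having DM, 0 = not), and the toy labels are all assumptions made for illustration only.

```python
from typing import Dict, List, Sequence


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> float:
    """F1 score obtained when `cls` is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


def split_errors(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, List[int]]:
    """Return the indices of misclassified instances, grouped by error type."""
    errors: Dict[str, List[int]] = {"false_positives": [], "false_negatives": []}
    for i, (t, p) in enumerate(zip(y_true, y_pred)):
        if p == positive and t != positive:
            errors["false_positives"].append(i)  # predicted DM, labeled non-DM
        elif t == positive and p != positive:
            errors["false_negatives"].append(i)  # labeled DM, predicted non-DM
    return errors


# Hypothetical toy labels, NOT data from the paper (assumed: 1 = DM, 0 = non-DM).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

score = macro_f1(y_true, y_pred)
errors = split_errors(y_true, y_pred)
print(f"Macro-F1: {score:.3f}")
print("False positives (candidates for review as possibly undiagnosed):",
      errors["false_positives"])
print("False negatives (candidates for review of clinical management):",
      errors["false_negatives"])
```

Because Macro-F1 averages the per-class scores without weighting by class size, both classes count equally. Keeping the indices of each error type, rather than only the counts, is what would allow the misclassified individuals to be examined afterwards.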

Published: 2025-09-29
ALVES, Samuel Norberto; FRANÇA, Celso; BERNAL, Regina T. I.; GOMES, Crizian S.; OMOLE, Oluwatoyin Joy; MALTA, Deborah; GONÇALVES, Marcos André; ALMEIDA, Jussara M. When Errors Inform: Supporting Diabetes Diagnosis in High-Uncertainty Scenarios. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 40., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 788-794. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2025.247707.