Machine Learning Models for Predicting Mortality in Hemodialysis Patients
Abstract
In Brazil, over 133,464 individuals with chronic kidney disease (CKD) undergo hemodialysis and face significant mortality risks. The biomarkers that must be monitored in these patients are specified by the Ministry of Health's 2014 Clinical Guideline, and more than thirty of them are evaluated periodically each year. However, no critical evaluation of the predictive value of these biomarkers using machine learning (ML) has been conducted in Brazil to date. This paper aims to develop ML models that predict mortality in hemodialysis patients from routine biomarkers. The goal is to investigate technologies that can assess the predictive effectiveness of clinical tests, with the ultimate aims of improving patients' quality of life and supporting cost management within the Brazilian Unified Health System (SUS). The study uses data from a retrospective cohort of hemodialysis patients followed between 2012 and 2016 in 23 dialysis units across five Brazilian states. The features used for model development include biomarkers, patient profile variables, and clinical outcomes. Several ML algorithms (Decision Tree, Random Forest, Logistic Regression, XGBoost, and TabPFN) are trained and compared to identify the most accurate predictive model. Among them, TabPFN achieved the best overall predictive performance, benefiting notably from balanced training data. Furthermore, SHAP (SHapley Additive exPlanations) analysis provided clear, interpretable insights into the most influential biomarkers, which helped in assessing the clinical plausibility of the results.
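The abstract does not say how the training data were balanced or how the SHAP values were computed. As a minimal, hypothetical illustration of both ideas, the Python sketch below assumes binary labels (1 = death, 0 = survival) and uses plain random undersampling as the balancing strategy; the authors may have used a different method. `exact_shapley` enumerates every feature coalition and fills in absent features from a single hypothetical baseline vector. The SHAP library instead estimates these values against a background dataset (for example, through its tree or kernel explainers), so this is a conceptual sketch and not the authors' pipeline. All function names, coefficients, and inputs are hypothetical.

```python
import itertools
import math
import random


def undersample_majority(rows, labels, seed=0):
    """Random undersampling for binary 0/1 labels: keep every minority-class
    row plus an equally sized random subset of the majority class."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = minority + rng.sample(majority, len(minority))
    rng.shuffle(keep)  # avoid grouping all rows of one class together
    return [rows[i] for i in keep], [labels[i] for i in keep]


def exact_shapley(f, x, baseline):
    """Exact Shapley values of model f at input x. Features outside a
    coalition are replaced by the baseline value (a single-reference
    simplification; SHAP averages over a background dataset instead)."""
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i to this coalition
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical logistic model scored on the log-odds scale
weights = [0.8, -1.2, 0.5]
bias = -2.0


def log_odds(z):
    return bias + sum(w * v for w, v in zip(weights, z))


x = [1.0, 2.0, 0.0]          # hypothetical patient
baseline = [0.0, 1.0, 0.0]   # hypothetical reference values
print(exact_shapley(log_odds, x, baseline))
```

Because this toy model is linear in the log-odds, each value reduces to w_i(x_i − b_i), so the printed list should be close to [0.8, -1.2, 0.0] up to floating-point rounding. For any model the values sum to f(x) − f(baseline). Exact enumeration grows exponentially with the number of features, which is why practical tools rely on approximations or model-specific algorithms when explaining models built on thirty or more biomarkers.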
Keywords:
Biomarkers, Prognostic Models, Artificial Intelligence, Renal Replacement Therapy, Chronic Kidney Disease
Published
30/09/2025
How to Cite
LANTMANN, L. L.; GAUER, F. G.; REINHEIMER, I. C.; SOUZA, D. C. de; BERNARDES, M. M.; POLI-DE-FIGUEIREDO, C. E.; MUSSE, S. R. Machine Learning Models for Predicting Mortality in Hemodialysis Patients. In: WORKSHOP DE TRABALHOS EM ANDAMENTO - CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 38., 2025, Salvador/BA. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 200-205.
