Heterogeneous Ensemble Models for In-Hospital Mortality Prediction

  • Mattyws F. Grawe UFRGS
  • Viviane P. Moreira UFRGS


Electronic Health Record (EHR) data are rich and contain different types of variables, including structured data (e.g., demographics), free text (e.g., medical notes), and time series. In this work, we explore the use of these different data types for in-hospital mortality prediction, the task of predicting whether a patient admitted to the hospital will die during the hospital stay. We build base models for the different data types and combine them in a heterogeneous ensemble model. In these models, we apply state-of-the-art deep learning classification algorithms. Our experiments on a set of 20K ICU patients from the MIMIC-III dataset showed that the ensemble method brings improvements of 3 percentage points, achieving an AUROC of 0.853.
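The abstract does not spell out how the base models' outputs are combined. One common way to build such a heterogeneous ensemble is stacked generalization: each data-type-specific base model outputs a death probability, and a meta-learner learns how to weight those probabilities. The sketch below assumes a logistic-regression meta-learner fitted with plain gradient descent and scores it with AUROC computed as the Mann-Whitney statistic. The function names (`fit_meta_learner`, `predict_meta`, `auroc`, `log_loss`) and the toy probabilities are hypothetical and do not reflect the authors' implementation or data. It also assumes the base-model probabilities are out-of-fold predictions, so the meta-learner never sees scores a base model produced for its own training patients.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    # AUROC as the Mann-Whitney statistic: the fraction of (death, survivor)
    # pairs in which the death receives the higher score; ties count as half.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def predict_meta(w: List[float], base_probs: List[List[float]]) -> List[float]:
    # w = [bias, weight for base model 1, ..., weight for base model k]
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))
            for x in base_probs]


def log_loss(labels: Sequence[int], probs: Sequence[float],
             eps: float = 1e-12) -> float:
    # Mean binary cross-entropy of the predicted death probabilities.
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def fit_meta_learner(base_probs: List[List[float]], labels: Sequence[int],
                     lr: float = 0.5, epochs: int = 2000) -> List[float]:
    # Hypothetical logistic-regression meta-learner fitted by batch gradient
    # descent on the mean log loss. base_probs[i][j] is base model j's
    # (assumed out-of-fold) death probability for patient i.
    k = len(base_probs[0])
    n = len(labels)
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        preds = predict_meta(w, base_probs)
        grad = [0.0] * (k + 1)
        for x, y, p in zip(base_probs, labels, preds):
            err = p - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


if __name__ == "__main__":
    # Made-up probabilities from three hypothetical base models (structured
    # data, clinical notes, time series) for 8 patients; 1 = died in hospital.
    X = [[0.9, 0.4, 0.7], [0.6, 0.8, 0.5], [0.3, 0.7, 0.8], [0.7, 0.6, 0.4],
         [0.4, 0.2, 0.3], [0.2, 0.5, 0.1], [0.5, 0.1, 0.6], [0.1, 0.3, 0.2]]
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    w = fit_meta_learner(X, y)
    ens = predict_meta(w, X)
    # In-sample scores only; a real evaluation would use held-out patients.
    for j, name in enumerate(["structured", "notes", "time series"]):
        print(f"{name:12s} AUROC = {auroc(y, [x[j] for x in X]):.3f}")
    print(f"{'ensemble':12s} AUROC = {auroc(y, ens):.3f}")
```

The pairwise `auroc` is quadratic in the number of patients, which is fine for illustration; for a cohort of 20K patients, a rank-based computation (sort once, sum the ranks of the positives) gives the same value far faster.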

GRAWE, Mattyws F.; MOREIRA, Viviane P. Heterogeneous Ensemble Models for In-Hospital Mortality Prediction. In: SIMPÓSIO BRASILEIRO DE COMPUTAÇÃO APLICADA À SAÚDE (SBCAS), 23., 2023, São Paulo/SP. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 71-82. ISSN 2763-8952. DOI: https://doi.org/10.5753/sbcas.2023.229442.