Integration of Epidemiologic, Socioeconomic, and Sociodemographic Indicators to Predict Early COVID-19 In-Hospital Outcomes

  • Hetielle Matos (Universidade do Vale do Rio dos Sinos)
  • Artur Brenner Schmitt (Universidade do Vale do Rio dos Sinos)
  • Felipe André Zeiser (Universidade do Vale do Rio dos Sinos)
  • Cristiano André da Costa (Universidade do Vale do Rio dos Sinos)
  • Gabriel de Oliveira Ramos (Universidade do Vale do Rio dos Sinos)


The COVID-19 pandemic has been an unprecedented challenge for healthcare systems around the world. In Brazil, it affected different segments of the population unequally: sociodemographic and socioeconomic characteristics were important indicators of early access to, and quality of, the health system. We therefore combine epidemiological, socioeconomic, and sociodemographic data to predict in-hospital outcomes of COVID-19. The proposed approach uses Random Forest, XGBoost, TabNet, and CatBoost models, with Bayesian optimization for automatic hyperparameter selection. All models identified hospital discharges more reliably than deaths. XGBoost achieved the best results, with a Precision of 0.72, Recall of 0.74, F1-score of 0.64, Accuracy of 0.74, and AUC of 0.83. The quantitative and qualitative results indicate that our method can produce high-quality predictions of in-hospital outcomes and could serve as a tool to assist healthcare professionals.
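The abstract does not say how the per-class scores for discharge and death were averaged. As a minimal sketch, assuming support-weighted averaging (an assumption, not something stated in the source), the Python below computes the four threshold-based metrics from predicted labels. Under that assumption weighted recall always equals accuracy, which would be consistent with both being reported as 0.74. The function name `weighted_metrics` and the toy labels are hypothetical and invented for illustration; they are not the study's data.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Support-weighted precision, recall, and F1, plus plain accuracy."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true examples per class
    n = len(y_true)
    precision = recall = f1 = 0.0
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        w = support[label] / n  # class weight = share of true examples
        precision += w * p_c
        recall += w * r_c
        f1 += w * f_c
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy labels (invented, not from the paper): 1 = discharge, 0 = death.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 1, 1, 0, 0]
metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```

In this toy case the majority class (discharge) is recovered more often than the minority class (death), and the weighted F1-score comes out slightly below both weighted precision and weighted recall. This shows that a weighted F1 need not lie between the other two averaged scores.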

Keywords: COVID-19, In-Hospital Outcomes, Machine Learning


MATOS, Hetielle; SCHMITT, Artur Brenner; ZEISER, Felipe André; COSTA, Cristiano André da; RAMOS, Gabriel de Oliveira. Integration of Epidemiologic, Socioeconomic, and Sociodemographic Indicators to Predict Early COVID-19 In-Hospital Outcomes. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 20., 2023, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 1037-1047. ISSN 2763-9061. DOI: