Predicting COVID-19 hospitalizations with attribute selection based on genetic and classification algorithms


  • Miriam Pizzatto Colpo, Federal University of Pelotas (UFPel) / Federal Institute of Education, Science and Technology Farroupilha (IFFar)
  • Bruno Cascaes Alves, Federal University of Pelotas (UFPel)
  • Kevin Soares Pereira, Federal University of Pelotas (UFPel)
  • Anna Flávia Zimmermann Brandão, Federal University of Pelotas (UFPel)
  • Marilton Sanchotene de Aguiar, Federal University of Pelotas (UFPel)
  • Tiago Thompsen Primo, Federal University of Pelotas (UFPel)



Feature selection, COVID-19, Genetic algorithm, Machine learning, Hospitalization prediction


The COVID-19 pandemic has put pressure on society as a whole and overloaded hospital systems. Machine learning models that predict hospitalizations can help direct hospital resources to where they are needed. However, because excess information, often irrelevant or redundant, can impair the performance of predictive models, we propose a hybrid approach to attribute selection. The method uses a genetic algorithm to search for an optimal attribute subset, and its evaluation function incorporates the results of a classification model, with the goal of improving the prediction of hospitalization need for COVID-19 patients. We evaluated this approach on two official databases from the State Health Secretariat of Rio Grande do Sul, covering COVID-19 cases registered up to October 2020 and June 2021, respectively. On the first database, classification precision for patients needing hospitalization increased by 18%; on the second, under a temporal evaluation with a sliding window, the gain averaged 6%. In a real-time application, this would mean more precise targeting of resources and, above all, better care for the infected population.
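
As an illustration of the kind of hybrid (wrapper) selection described above, the following minimal sketch evolves binary attribute masks with a genetic algorithm whose evaluation function is the precision a classifier reaches on the positive ("hospitalized") class. It is not the authors' implementation: the abstract does not state the classifier, genetic operators, or parameters used, so the nearest-centroid classifier, tournament selection, one-point crossover, bit-flip mutation, elitism, the synthetic data, and all function names below are assumptions chosen to keep the example self-contained.

```python
import random

def precision(y_true, y_pred, positive=1):
    """Precision of the positive class: TP / (TP + FP), or 0.0 if nothing is predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if tp + fp else 0.0

def nearest_centroid_predict(X_train, y_train, X_test, mask):
    """Stand-in classifier (assumption): assign each row to the closest class centroid,
    using only the attributes whose mask bit is 1."""
    cols = [j for j, keep in enumerate(mask) if keep]
    centroids = {}
    for label in sorted(set(y_train)):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    def closest(x):
        return min(centroids,
                   key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c])))
    return [closest(x) for x in X_test]

def fitness(mask, train, valid):
    """Evaluation function: validation precision of the classifier on the masked attributes."""
    if not any(mask):
        return 0.0  # an empty attribute subset cannot be evaluated
    (X_tr, y_tr), (X_va, y_va) = train, valid
    return precision(y_va, nearest_centroid_predict(X_tr, y_tr, X_va, mask))

def select_attributes(train, valid, n_attrs, pop_size=20, generations=15,
                      mutation_rate=0.1, seed=0):
    """Genetic search over attribute masks; returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    # Seed the population with the full attribute set so the result is never worse than it.
    population = [[1] * n_attrs]
    population += [[rng.randint(0, 1) for _ in range(n_attrs)] for _ in range(pop_size - 1)]
    best = max(population, key=lambda m: fitness(m, train, valid))
    for _ in range(generations):
        def tournament():
            a, b = rng.sample(population, 2)
            return a if fitness(a, train, valid) >= fitness(b, train, valid) else b
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_attrs)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]  # bit flips
            children.append(child)
        population = children
        best = max(population, key=lambda m: fitness(m, train, valid))
    return best, fitness(best, train, valid)

def make_toy_data(n, n_attrs, seed):
    """Hypothetical synthetic data: attribute 0 tracks the label, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = [label + rng.gauss(0, 0.3)] + [rng.gauss(0, 1) for _ in range(n_attrs - 1)]
        X.append(row)
        y.append(label)
    return X, y

train = make_toy_data(120, 6, seed=1)
valid = make_toy_data(60, 6, seed=2)
mask, score = select_attributes(train, valid, n_attrs=6)
print("selected mask:", mask, "validation precision:", round(score, 3))
```

Because the full attribute set is placed in the initial population and the best mask is carried over unchanged each generation, the returned subset scores at least as well as using all attributes on the validation split. A real study would also need a held-out test set, since the selection itself is tuned to the validation data.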








How to Cite

Pizzatto Colpo, M., Cascaes Alves, B., Soares Pereira, K., Zimmermann Brandão, A. F., Sanchotene de Aguiar, M., & Thompsen Primo, T. (2022). Predicting COVID-19 hospitalizations with attribute selection based on genetic and classification algorithms. ISys - Brazilian Journal of Information Systems, 15(1), 4:1–4:30.



Extended versions of selected articles