Attribute selection based on genetic and classification algorithms in the prediction of hospitalization need of COVID-19 patients

  • Miriam Pizzato Colpo UFPel
  • Bruno Cascaes Alves UFPel
  • Kevin Soares Pereira UFPel
  • Anna Flávia Zimmermann Brandão UFPel
  • Marilton Sanchonete de Aguiar UFPel
  • Tiago Thompsen Primo UFPel

Abstract


The COVID-19 pandemic has put pressure on society as a whole and overloaded hospital systems. Machine learning models that predict hospitalizations can help direct hospital resources where they are needed. However, excess information, which is often irrelevant or redundant, can impair the performance of predictive models, so in this work we propose a hybrid approach to attribute selection. The method uses a genetic algorithm to search for an optimal attribute subset, and its evaluation function takes the results of a classification model into account in order to improve the prediction of hospitalization need for COVID-19 patients. We evaluated the approach on a database of more than 200,000 COVID-19 patients from the State Health Secretariat of Rio Grande do Sul. It increased classification precision for patients who needed hospitalization by 18%. In a real-time application, this would mean more precise targeting of resources and, as a consequence and most importantly, better care for the infected population.
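The hybrid (wrapper) idea described above, a genetic algorithm whose evaluation function scores each attribute subset with a classifier, can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the synthetic data, the nearest-centroid classifier, the use of positive-class precision as fitness, and all names and parameter values (`make_data`, `precision_with_mask`, `select_attributes`, population size, mutation rate) are hypothetical choices made for the example.

```python
import random

def make_data(n, n_informative, n_noise, rng):
    """Synthetic stand-in for patient records (hypothetical, not the SES-RS data)."""
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.3 else 0      # minority "hospitalized" class
        shift = 1.5 if label else 0.0
        row = [rng.gauss(shift, 1.0) for _ in range(n_informative)]
        row += [rng.gauss(0.0, 1.0) for _ in range(n_noise)]  # irrelevant attributes
        X.append(row)
        y.append(label)
    return X, y

def precision_with_mask(mask, X_tr, y_tr, X_te, y_te):
    """Fitness: positive-class precision of a nearest-centroid classifier
    trained only on the attributes whose mask bit is 1 (assumed metric)."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    centroids = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X_tr, y_tr) if t == c]
        if not rows:
            return 0.0
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    tp = fp = 0
    for x, t in zip(X_te, y_te):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c]))
                for c in (0, 1)}
        if dist[1] < dist[0]:                        # predicted positive
            if t == 1:
                tp += 1
            else:
                fp += 1
    return tp / (tp + fp) if tp + fp else 0.0

def select_attributes(fitness, n_attrs, rng, pop_size=20, generations=15, p_mut=0.05):
    """Simple GA over bit masks: tournament selection, one-point crossover,
    bit-flip mutation, and elitism. Requires n_attrs >= 2 and pop_size >= 2."""
    # Seed with the "all attributes" mask so the result is never worse than it.
    pop = [[1] * n_attrs]
    pop += [[rng.randint(0, 1) for _ in range(n_attrs)] for _ in range(pop_size - 1)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in pop]

        def tournament():
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        children = [best[:]]                         # elitism: keep the best mask
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_attrs)          # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best

# Demo on synthetic data: 3 informative attributes followed by 7 noise attributes.
rng = random.Random(42)
X, y = make_data(300, 3, 7, rng)
X_tr, y_tr, X_te, y_te = X[:200], y[:200], X[200:], y[200:]

def fit(mask):
    return precision_with_mask(mask, X_tr, y_tr, X_te, y_te)

best_mask = select_attributes(fit, n_attrs=10, rng=rng)
print("selected mask:", best_mask)
print("precision, all attributes:", round(fit([1] * 10), 3))
print("precision, selected subset:", round(fit(best_mask), 3))
```

Because the classifier is retrained for every candidate mask, the cost grows with population size and number of generations. Note also that a fitness based on precision alone can favor subsets that flag very few patients as positive; a real evaluation function might combine it with recall or another metric, which the abstract does not specify.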

Keywords: feature selection, COVID-19, genetic algorithm, machine learning, hospitalization prediction

Published
June 7, 2021
COLPO, Miriam Pizzato; ALVES, Bruno Cascaes; PEREIRA, Kevin Soares; BRANDÃO, Anna Flávia Zimmermann; AGUIAR, Marilton Sanchonete de; PRIMO, Tiago Thompsen. Attribute selection based on genetic and classification algorithms in the prediction of hospitalization need of COVID-19 patients. In: SIMPÓSIO BRASILEIRO DE SISTEMAS DE INFORMAÇÃO (SBSI), 17., 2021, Uberlândia. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021.
