Attribute selection based on genetic and classification algorithms in the prediction of hospitalization need of COVID-19 patients
Abstract
The COVID-19 pandemic has been putting pressure on society as a whole and overloading hospital systems. Machine learning models that predict hospitalizations, for example, can help direct hospital resources more effectively. However, because an excess of information, often irrelevant or redundant, can impair the performance of predictive models, in this work we propose a hybrid approach to attribute selection. The method searches for an optimal attribute subset with a genetic algorithm whose evaluation function uses the results of a classification model, with the aim of improving the prediction of whether COVID-19 patients will need hospitalization. We evaluated this approach on a database of more than 200 thousand COVID-19 patients from the State Health Secretariat of Rio Grande do Sul. The approach yielded an 18% increase in classification precision for patients who needed hospitalization. In a real-time application, this would also mean more precise allocation of resources and, consequently and above all, better care for the infected population.
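The search described above can be sketched as a wrapper approach: a genetic algorithm evolves attribute subsets, and each subset is scored by training and testing a classifier on only those attributes. The Python below is a minimal, hypothetical illustration, not the authors' implementation. The nearest-centroid classifier, the use of holdout precision for the positive ("needs hospitalization") class as fitness, tournament selection, one-point crossover, bit-flip mutation, elitism, and all function names and default parameters (`genetic_feature_selection`, `evaluate_mask`, `pop_size`, `p_mut`, and so on) are assumptions chosen for brevity. Each individual is a bit mask in which `True` means the attribute is kept.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical sketch: the classifier, fitness and GA operators are
# illustrative assumptions, not the setup used in the paper.
Matrix = Sequence[Sequence[float]]


def precision(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Precision for the positive class (here: 'needs hospitalization')."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    predicted_pos = sum(1 for p in y_pred if p == positive)
    return tp / predicted_pos if predicted_pos else 0.0


def nearest_centroid_predict(X_train: Matrix, y_train: Sequence[int],
                             X_test: Matrix, mask: Sequence[bool]) -> List[int]:
    """Stand-in classifier: assign each row to the closest class centroid,
    measuring distance only over the attributes selected by `mask`."""
    cols = [j for j, keep in enumerate(mask) if keep]
    centroids = {}
    for label in sorted(set(y_train)):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]

    def sq_dist(x: Sequence[float], label: int) -> float:
        return sum((x[j] - c) ** 2 for j, c in zip(cols, centroids[label]))

    return [min(centroids, key=lambda lab: sq_dist(x, lab)) for x in X_test]


def evaluate_mask(X_train: Matrix, y_train: Sequence[int], X_val: Matrix,
                  y_val: Sequence[int], mask: Sequence[bool]) -> float:
    """Fitness of one attribute subset: validation precision of the classifier."""
    if not any(mask):
        return 0.0  # an empty subset cannot classify anything
    return precision(y_val, nearest_centroid_predict(X_train, y_train, X_val, mask))


def genetic_feature_selection(X_train: Matrix, y_train: Sequence[int],
                              X_val: Matrix, y_val: Sequence[int],
                              pop_size: int = 20, generations: int = 30,
                              p_mut: float = 0.05,
                              seed: int = 0) -> Tuple[List[bool], float]:
    """Search bit masks (True = keep attribute) with a simple elitist GA.
    Assumes at least two attributes so one-point crossover has a cut point."""
    rng = random.Random(seed)
    n = len(X_train[0])
    cache = {}

    def fit(mask: List[bool]) -> float:
        key = tuple(mask)
        if key not in cache:  # classifier runs are the expensive part
            cache[key] = evaluate_mask(X_train, y_train, X_val, y_val, mask)
        return cache[key]

    def tournament() -> List[bool]:
        # Binary tournament over the current population.
        a, b = rng.sample(pop, 2)
        return a if fit(a) >= fit(b) else b

    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=fit)
    for _ in range(generations):
        children = [best[:]]  # elitism: never lose the best subset found so far
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each gene toggles with probability p_mut.
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fit)
    return best, fit(best)
```

Caching fitness by mask matters because every evaluation retrains the classifier, which would dominate the cost on a database of this size. Optimizing precision alone can also favor subsets that make the model flag very few patients, so a practical evaluation function would probably weigh recall as well; that trade-off is left out of this sketch.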