Applying machine learning to assist the diagnosis of COVID-19 from blood and urine exams
Abstract
The COVID-19 pandemic, declared by the World Health Organization (WHO) in March 2020, has challenged the health systems of several countries as the number of infected people grew. During the pandemic's peak in Europe, the low incidence of infection in South Korea drew the international community's attention, since shortly before, that country had been considered the epicenter of the pandemic outside China, where it originated. Mass testing and contact-tracing policies were credited as the formula for South Korea's success. However, given the high demand for and limited supply of COVID-19 diagnostic tests on the market, this strategy proved unfeasible, mainly in countries with large populations and few financial resources, such as Brazil. A further aggravating factor is the limited effectiveness of the tests currently available, especially rapid serological tests, which have a high rate of false negatives. To offer a screening method that guides the application of these tests, this work aims to develop a predictive model that assists in identifying COVID-19 infection in suspected patients from clinical laboratory data, such as blood counts and urine tests. The data come from three sources in São Paulo and are hosted in the COVID-19 Data Sharing/BR Repository, a shared database of the São Paulo Research Foundation (FAPESP). The work also compares balanced versus imbalanced datasets and traditional versus ensemble algorithms for this problem.
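The abstract compares a balanced with an imbalanced version of the dataset and stresses the cost of false negatives. It does not say how balancing was done. The sketch below assumes random undersampling of the majority class, one common option and not necessarily the authors' method. It then computes sensitivity, which drops as false negatives rise, and specificity. The function names `undersample` and `sensitivity_specificity` are hypothetical, and the toy rows and labels are illustrative values, not FAPESP records.

```python
import random
from collections import Counter


def undersample(rows, labels, seed=0):
    """Randomly undersample so every class keeps as many rows as the smallest class.

    Assumption: random undersampling is used here only as an illustration;
    the source does not state which balancing technique was applied.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Draw n_min rows per class (the smallest class is kept whole, in shuffled order).
        for row in rng.sample(members, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity); sensitivity falls as false negatives increase."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical toy data: 2 positive (1) and 8 negative (0) suspected patients.
    rows = [[i] for i in range(10)]
    labels = [1, 1] + [0] * 8
    bal_rows, bal_labels = undersample(rows, labels)
    print(Counter(labels), Counter(bal_labels))
    print(sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 1]))
```

Undersampling discards majority-class rows, so a model trained on the balanced set sees fewer examples overall. This trade-off is one reason to compare it against training on the original imbalanced data, and sensitivity is a natural metric for that comparison when missed infections are the main concern.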