Analysis and Prediction of Childhood Pneumonia Deaths using Machine Learning Algorithms


Acute respiratory tract infections are among the leading causes of child mortality worldwide. Community-acquired pneumonia in particular has many contributing causes, including passive smoking, air pollution, poor hygiene, cardiac insufficiency, oropharyngeal colonization, nutritional deficiency, immunosuppression, and environmental, economic, and social factors. Because these causes vary so widely, knowledge discovery in this area of health has been a major challenge for researchers. This paper presents the steps taken to construct a database, following the Pictorea method, and the evaluation results obtained when it is applied to the analysis and prediction of potential deaths caused by childhood pneumonia. The Random Forest and Artificial Neural Network algorithms were compared, and the Neural Network achieved the higher accuracy, reaching up to 87.57%. This algorithm was then used to analyze and predict the number of pneumonia deaths in children up to 5 years old, with results reported using Root Mean Square Error and scatter plots. A domain specialist validated the results and judged the discovered pattern relevant for future studies in the medical field, where it can help analyze the behavior of countries and predict future scenarios.
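Since the results are reported using Root Mean Square Error, a minimal sketch of that metric may help readers unfamiliar with it. The function name `rmse` and the sample values are illustrative assumptions, not code or data from the paper.

```python
import math


def rmse(observed, predicted):
    """Root Mean Square Error between two equal-length sequences of numbers."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the squared differences, then take the square root.
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical death counts, for illustration only (not from the paper).
observed_deaths = [10.0, 20.0, 30.0]
predicted_deaths = [12.0, 18.0, 33.0]
error = rmse(observed_deaths, predicted_deaths)
print(f"RMSE: {error:.3f}")
```

A lower RMSE means the predicted counts sit closer to the observed ones. Because the errors are squared before averaging, large misses are penalized more heavily than small ones.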

Keywords: Artificial neural network, Pneumonia, Data analysis and prediction, Potential deaths, Random forest


How to Cite

SOARES, Felipe A. L.; LOUSADA, Efrem E. O.; SILVEIRA, Tiago B.; MINI, Raquel A. F.; ZÁRATE, Luis E.; FREITAS, Henrique C. Analysis and Prediction of Childhood Pneumonia Deaths using Machine Learning Algorithms. In: SYMPOSIUM ON KNOWLEDGE DISCOVERY, MINING AND LEARNING (KDMILE), 9., 2021, Rio de Janeiro. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 16-23. ISSN 2763-8944. DOI: