A Comparison of Machine Learning-Based Methods for Inferring Weekly Dengue Case Numbers
Abstract
Arboviruses transmitted by Aedes aegypti and Aedes albopictus are among the leading public health problems, and dengue is the most prominent of them. Managing dengue epidemics requires advance preparation, so predicting case numbers in a specific region can support prevention strategies and help control the course of an epidemic. With this in view, this study evaluates the performance of classical statistical techniques and machine learning methods in predicting weekly dengue cases from geographic data for San Juan, Puerto Rico. Features were selected based on their cross-correlation with the total number of weekly dengue cases and were subsequently filtered with wavelet transforms. The linear regression model using precipitation and vegetation features filtered with the symlet wavelet (sym20) achieved the best performance in terms of MAE, R², MAPE, RMSE, and bias.
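The pipeline summarized above (lagged cross-correlation for feature selection, wavelet filtering, linear regression, and the five evaluation metrics) can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation, and several substitutions are assumptions made only to keep it self-contained: a one-level Haar transform with soft thresholding stands in for the sym20 filtering (in practice one would use a wavelet library, e.g. PyWavelets' `pywt.wavedec`/`pywt.waverec` with `'sym20'`); the regression uses a single feature rather than precipitation plus vegetation; bias is taken as the mean of (prediction minus observation); and the function names, threshold value, and toy data are hypothetical.

```python
import math
from statistics import mean


def cross_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag], i.e. x leading y by `lag` weeks."""
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def haar_denoise(signal, threshold):
    """One-level Haar DWT, soft-threshold the detail coefficients, inverse DWT.

    Assumption: stand-in for sym20 filtering; expects an even-length signal.
    """
    s = 1 / math.sqrt(2)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) * s for a, b in pairs]
    detail = [(a - b) * s for a, b in pairs]
    # Soft thresholding: shrink each detail coefficient toward zero by `threshold`.
    detail = [math.copysign(max(abs(d) - threshold, 0.0), d) for d in detail]
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out


def fit_linear(x, y):
    """Ordinary least squares for y = slope * x + intercept (single feature)."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def metrics(y_true, y_pred):
    """MAE, RMSE, MAPE (%), R^2 and bias; bias is assumed here to be mean(pred - true)."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mt = mean(y_true)
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mt) ** 2 for t in y_true)
    return {
        "MAE": mean(abs(e) for e in errors),
        "RMSE": math.sqrt(ss_res / len(errors)),
        # MAPE is undefined for weeks with zero observed cases.
        "MAPE": 100 * mean(abs(e) / abs(t) for e, t in zip(errors, y_true)),
        "R2": 1 - ss_res / ss_tot,
        "BIAS": mean(errors),
    }


if __name__ == "__main__":
    # Synthetic toy series, NOT the San Juan data: cases follow rainfall with a 2-week lag.
    rain = [float((7 * t) % 11) for t in range(40)]
    cases = [0.0, 0.0] + [3 * r + 5 for r in rain[:-2]]

    # Feature/lag selection: keep the lag with the strongest |cross-correlation|.
    lag = max(range(6), key=lambda k: abs(cross_correlation(rain, cases, k)))
    x, y = rain[:len(rain) - lag], cases[lag:]

    # Wavelet filtering of the selected feature (threshold 1.0 is an arbitrary choice).
    x_filtered = haar_denoise(x, threshold=1.0)

    slope, intercept = fit_linear(x_filtered, y)
    preds = [slope * v + intercept for v in x_filtered]
    print("lag:", lag, "metrics:", metrics(y, preds))
```

Shifting the feature by the lag that maximizes the absolute cross-correlation aligns each predictor with the case count it best anticipates; the wavelet step then removes high-frequency fluctuations from that predictor before it enters the regression.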
