Comparison of Machine Learning Models for Total Dengue Cases Prediction

  • Thiago Carvalho, Pontifícia Universidade Católica do Rio de Janeiro
  • Gabriel Tenório, Pontifícia Universidade Católica do Rio de Janeiro
  • Karla Figueiredo, Universidade do Estado do Rio de Janeiro
  • Marley Vellasco, Pontifícia Universidade Católica do Rio de Janeiro
  • Wouter Caarls, Pontifícia Universidade Católica do Rio de Janeiro

Abstract


Dengue is an endemic, mosquito-borne disease with high prevalence in tropical areas. Using preprocessing methods and machine learning algorithms, this work develops predictive models for total dengue cases from climatic variables, as part of the 'DengAI: Predicting Disease Spread' competition hosted by DrivenData. Among the algorithms implemented, an ensemble of a Random Forest and a neural network outperformed the proposed benchmark, improving its results by 4.5%.

Keywords: Data Mining, Machine Learning, Dengue, Regression, Artificial Neural Networks
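
The abstract reports that an ensemble of two regressors beat a benchmark by a relative margin. The sketch below illustrates one way such a comparison could be computed. It assumes that the ensemble averages the two models' predictions, that error is measured by mean absolute error, and that the improvement is the relative reduction in that error; none of these details are stated in the abstract. All function names and numbers are hypothetical toy values, not results from the paper.

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted case counts."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def average_ensemble(*prediction_sets):
    """Combine several models by averaging their predictions point by point.

    Simple averaging is an assumption here; the abstract does not say how
    the Random Forest and neural network outputs were combined.
    """
    return [mean(values) for values in zip(*prediction_sets)]


def relative_improvement(benchmark_error, model_error):
    """Percentage reduction in error relative to the benchmark (assumed definition)."""
    return 100.0 * (benchmark_error - model_error) / benchmark_error


# Toy weekly case counts and predictions, invented for illustration only.
y_true = [10, 20, 30, 40]
rf_pred = [12, 18, 33, 37]         # stand-in for Random Forest output
nn_pred = [8, 23, 29, 44]          # stand-in for neural network output
benchmark_pred = [14, 15, 34, 45]  # stand-in for the benchmark output

ensemble_pred = average_ensemble(rf_pred, nn_pred)
ensemble_mae = mean_absolute_error(y_true, ensemble_pred)
benchmark_mae = mean_absolute_error(y_true, benchmark_pred)

print("ensemble MAE:", ensemble_mae)
print("benchmark MAE:", benchmark_mae)
print("relative improvement (%):", relative_improvement(benchmark_mae, ensemble_mae))
```

Averaging can help when the two models make errors in opposite directions, as in the toy data above, because those errors partly cancel.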

Published
2019-10-15
CARVALHO, Thiago; TENÓRIO, Gabriel; FIGUEIREDO, Karla; VELLASCO, Marley; CAARLS, Wouter. Comparison of Machine Learning Models for Total Dengue Cases Prediction. In: NATIONAL MEETING ON ARTIFICIAL AND COMPUTATIONAL INTELLIGENCE (ENIAC), 16., 2019, Salvador. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. p. 658-669. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2019.9323.
