Comparison of Machine Learning Models for Total Dengue Cases Prediction
Abstract
Dengue is a mosquito-borne disease that is endemic and highly prevalent in tropical areas. Using preprocessing methods and machine learning algorithms, this work develops models that predict total dengue cases from climatic variables, as part of the "DengAI: Predicting Disease Spread" competition hosted by DrivenData. Among the algorithms implemented, an ensemble of a Random Forest and a neural network outperformed the proposed benchmark, improving on its result by 4.5%.
