Statistical analysis of small twitter data collection to identify dengue outbreaks

  • Carlos Euzebio USP
  • Sidney Agy USP
  • Claudio Boldorini Jr. USP
  • Lucas Porto USP
  • José Renato Alcarás USP
  • Alexandre Martinez USP
  • Evandro Ruiz USP


This study presents an algorithmic strategy for analyzing a small set of social network data to monitor dengue. Previous studies have achieved similar results using large datasets of Twitter microblogs. In this study, we successfully map dengue cases using a small collection of tweets from a medium-sized city. A set of modules was constructed to collect, categorize, and display dengue-related tweets. We compared the collected tweets with real data on confirmed dengue cases and found a significant correlation between the number of confirmed dengue cases and the number of dengue-related tweets, even with such a small dataset. The results of this approach may be relevant to public health policy.
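
As an illustration of the comparison described above, the following is a minimal sketch of how a correlation between weekly tweet counts and weekly confirmed cases could be computed. It assumes the Pearson coefficient, which the abstract does not specify; the function name `pearson_r` and both count series are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up weekly counts, for illustration only (not data from the study).
weekly_tweets = [3, 5, 9, 14, 11, 6]
weekly_cases = [10, 18, 30, 52, 41, 20]

r = pearson_r(weekly_tweets, weekly_cases)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 would indicate that the two series rise and fall together. Judging significance additionally requires a hypothesis test, which this sketch omits.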

Keywords: Aedes aegypti, Dengue, Social Network, Public health


Chew, C. and Eysenbach, G. Pandemics in the age of Twitter: content analysis of Tweets during the 2009 H1N1 outbreak. PLOS ONE 5 (11): e14118, 2010.

de Almeida Marques-Toledo, C., Degener, C. M., Vinhal, L., Coelho, G., Meira, W., Codeço, C. T., and Teixeira, M. M. Dengue prediction by the web: Tweets are a useful tool for estimating and forecasting dengue at country and city level. PLOS Neglected Tropical Diseases 11 (7): e0005729, 2017.

Finch, K. C., Snook, K. R., Duke, C. H., Fu, K. W., Tse, Z. T. H., Adhikari, A., and Fung, I. C. H. Public health implications of social media use during natural disasters, environmental disasters, and other environmental concerns. Natural Hazards 83 (1): 729–760, 2016.

Fu, K.-W., Liang, H., Saroha, N., Tse, Z. T. H., Ip, P., and Fung, I. C.-H. How people react to Zika virus outbreaks on Twitter? A computational content analysis. American Journal of Infection Control 44 (12): 1700–1702, 2016.

Gomide, J., Veloso, A., Meira, W., Almeida, V., Benevenuto, F., Ferraz, F., and Teixeira, M. Dengue surveillance based on a computational model of spatio-temporal locality of twitter. In Proceedings of the 3rd International Web Science Conference. WebSci ’11. Association for Computing Machinery, New York, NY, USA, 2011.

Machado, M., Temporal, J. C., Pardo, T. A., and Ruiz, E. E. Mineração de tópicos e aspectos em microblogs sobre dengue, chikungunya, zika e microcefalia. In Anais Principais do XVII Workshop de Informática Médica. SBC, Porto Alegre, RS, Brasil, 2017.

McGough, S. F., Brownstein, J. S., Hawkins, J. B., and Santillana, M. Forecasting Zika incidence in the 2016 Latin America outbreak combining traditional disease surveillance with search, social media, and news report data. PLOS Neglected Tropical Diseases 11 (1): e0005295, 2017.

Mousset, P., Pitarch, Y., and Tamine, L. Studying the Spatio-Temporal Dynamics of Small-Scale Events in Twitter. In Proceedings of the 29th on Hypertext and Social Media. pp. 73–81, 2018.

Park, H. W., Park, S., and Chong, M. Conversations and Medical News Frames on Twitter: Infodemiological Study on COVID-19 in South Korea. Journal of Medical Internet Research 22 (5): e18897, 2020.

Petersen, E., Wilson, M. E., Touch, S., McCloskey, B., Mwaba, P., Bates, M., Dar, O., Mattes, F., Kidd, M., Ippolito, G., et al. Rapid spread of Zika virus in the Americas-implications for public health preparedness for mass gatherings at the 2016 Brazil Olympic Games. International Journal of Infectious Diseases 44: 11–15, 2016.

Santillana, M., Nguyen, A. T., Dredze, M., Paul, M. J., Nsoesie, E. O., and Brownstein, J. S. Combining Search, Social Media, and Traditional Data Sources to Improve Influenza Surveillance. PLOS Computational Biology 11 (10): 1–15, 2015.

Signorini, A., Segre, A. M., and Polgreen, P. M. The use of Twitter to track levels of disease activity and public concern in the US during the influenza A H1N1 pandemic. PLOS ONE 6 (5): e19467, 2011.

Wang, Y., Xu, K., Kang, Y., Wang, H., Wang, F., and Avram, A. Regional influenza prediction with sampling twitter data and PDE model. International Journal of Environmental Research and Public Health 17 (3): 678, 2020.

Zhang, Q., Sun, K., Chinazzi, M., y Piontti, A. P., Dean, N. E., Rojas, D. P., Merler, S., Mistry, D., Poletti, P., Rossi, L., et al. Spread of Zika virus in the Americas. Proceedings of the National Academy of Sciences 114 (22): E4334–E4343, 2017.

EUZEBIO, Carlos; AGY, Sidney; BOLDORINI JR., Claudio; PORTO, Lucas; ALCARÁS, José Renato; MARTINEZ, Alexandre; RUIZ, Evandro. Statistical analysis of small twitter data collection to identify dengue outbreaks. In: SYMPOSIUM ON KNOWLEDGE DISCOVERY, MINING AND LEARNING (KDMILE), 8., 2020, Evento Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 17-24. ISSN 2763-8944. DOI: