Covid Data Analytics Repository: An interdisciplinary look into the COVID-19 pandemic in Brazil

Authors

  • Ramon A. S. Franco Universidade Federal de Minas Gerais / Universidade Federal do Oeste da Bahia
  • Pedro Loures Alzamora Universidade Federal de Minas Gerais
  • Janaína Guiginski Universidade Federal de Minas Gerais
  • Evandro L. T. P. Cunha Universidade Federal de Minas Gerais
  • Tereza Bernardes Universidade Federal de Minas Gerais
  • Juan F. Galindo Universidade Estadual de Campinas
  • Luana Passos Universidade Federal de Minas Gerais / Universidade Federal do Oeste da Bahia
  • Raquel Schneider Universidade Federal de Minas Gerais
  • Bruno Chagas Universidade Federal de Minas Gerais
  • Kícila Ferreguetti Universidade Federal de Minas Gerais
  • Luísa Cardoso Universidade Federal de Minas Gerais
  • Pedro Moreira Universidade Federal de Minas Gerais
  • Wallace Pereira Universidade Federal de Minas Gerais
  • Ana Paula Couto da Silva Universidade Federal de Minas Gerais
  • Wagner Meira Jr. Universidade Federal de Minas Gerais

DOI:

https://doi.org/10.5753/jidm.2022.2266

Keywords:

coronavirus, datasets, digital health, social networks

Abstract

This article describes the construction and deployment of the Covid Data Analytics Repository, a source for interdisciplinary studies of the impact of the COVID-19 pandemic in Brazil. We collected different types of data from official (IBGE, DATASUS) and non-official (Brasil.IO) sources, online social networks (Instagram, Twitter), and a search engine analysis tool (Google Trends). We used these data in investigations aimed at understanding the impacts of COVID-19 on the country, from economics to social behavior. At the time of publication, our repository contains 1,508 documents, classified into two main types: (i) databases and tables downloaded from the aforementioned sources; and (ii) papers, reports, maps, and graphs resulting from the analyses that we performed. To allow reproducibility and foster follow-up studies, we released our repository for public use.

Author Biography

Evandro L. T. P. Cunha, Universidade Federal de Minas Gerais

Master's degree in progress at the Federal University of Minas Gerais, in the field of Computational Sciences. Undergraduate degree in Letters/Linguistics from the same institution. Has experience in Linguistics, working on the following subjects: Computational Linguistics, Quantitative Sociolinguistics, Romance Philology, Textual Criticism, Complex Networks, and Online Social Networks (OSN).

Published

2022-08-15

How to Cite

A. S. Franco, R., Loures Alzamora, P., Guiginski, J., L. T. P. Cunha, E., Bernardes, T., Galindo, J. F., Passos, L., Schneider, R., Chagas, B., Ferreguetti, K., Cardoso, L., Moreira, P., Pereira, W., Couto da Silva, A. P., & Meira Jr., W. (2022). Covid Data Analytics Repository: An interdisciplinary look into the COVID-19 pandemic in Brazil. Journal of Information and Data Management, 13(1). https://doi.org/10.5753/jidm.2022.2266

Issue

Section

Dataset Showcase Workshop 2021 - Extended Papers