A Data Lake and Analytics Platform with Application to COVID-19 Dynamic Analysis

  • Francinaldo Almeida Pereira UFRN
  • Júlio Gustavo S. F. Costa UFRN
  • Luiz M. G. Gonçalves UFRN


We propose a data lake platform, implemented as a web-based service, that addresses the problem of producing and processing COVID-19 data. It is intended for data scientists working on COVID-19-related projects, who can access as much data as possible in a single repository, analyze it, manage it, and contribute new data. The platform has made it possible to dynamically aggregate different data repositories related to the COVID-19 pandemic and to provide users, through a web interface, with tools for using, transforming, and collaborating on data, as well as analysis and visualization tools integrated with geographic information systems.

Keywords: COVID-19, Data Lake


PEREIRA, Francinaldo Almeida; COSTA, Júlio Gustavo S. F.; GONÇALVES, Luiz M. G. A Data Lake and Analytics Platform with Application to COVID-19 Dynamic Analysis. In: WORKSHOP DE ANÁLISE DE PADRÕES E DADOS DA DINÂMICA DE PANDEMIAS - CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 35., 2022, Natal/RN. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 171-176. DOI: https://doi.org/10.5753/sibgrapi.est.2022.23283.

