Epiflow: a hybrid approach to track infectious disease spread in Brazil based on travel data and graph databases

  • Mariama C. S. de Oliveira Universidade Federal de Pernambuco (UFPE)
  • Andrêza Leite de Alencar Universidade Federal Rural de Pernambuco (UFRPE) https://orcid.org/0000-0002-7083-0646
  • Natalia Tatiele S. de Oliveira Universidade Federal de Pernambuco (UFPE)
  • Lucas Henrique Gonzaga de Sales Universidade Federal Rural de Pernambuco (UFRPE)
  • Antônio Ricardo Khouri Cunha Fundação Oswaldo Cruz (Fiocruz)
  • Pablo Ivan Pereira Ramos Fundação Oswaldo Cruz (Fiocruz)


Based on open data on cities and transport, this study proposes an approach that uses travel probabilities and a graph-oriented database to identify possible disease propagation routes within Brazilian territory. Routes were identified by adapting Dijkstra's algorithm in the Graph Data Science library of Neo4j. A tool called Epiflow was also developed to allow visual exploration of the proposed approach. In a validation against COVID-19 data, the approach successfully predicted routes for large geographical areas of risk, such as states. These findings suggest that transport data and graph databases can be used to build applications that support decision-making when tracking disease spread in its early stages.
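
The abstract does not say how Dijkstra's algorithm was adapted, so the following is only a minimal sketch of one common way to combine travel probabilities with a shortest-path search. It assumes each edge probability p is turned into an additive cost -log(p), so that the lowest-cost path is the most probable route. The function name `most_probable_route`, the dictionary layout, and the probability values are hypothetical, and the sketch uses plain Python rather than Neo4j.

```python
import heapq
import math


def most_probable_route(travel_prob, source, target):
    """Return (route, probability) for the most probable path, or (None, 0.0).

    travel_prob maps each city to a dict {neighbor: travel probability}.
    Assumption: probabilities along a route multiply, so minimizing the sum
    of -log(p) is equivalent to maximizing the product of p.
    """
    best_cost = {source: 0.0}
    previous = {}
    visited = set()
    queue = [(0.0, source)]

    while queue:
        cost, city = heapq.heappop(queue)
        if city in visited:
            continue  # stale queue entry
        visited.add(city)
        if city == target:
            break
        for neighbor, p in travel_prob.get(city, {}).items():
            if p <= 0.0:
                continue  # no travel on this edge
            new_cost = cost - math.log(p)
            if new_cost < best_cost.get(neighbor, math.inf):
                best_cost[neighbor] = new_cost
                previous[neighbor] = city
                heapq.heappush(queue, (new_cost, neighbor))

    if target not in best_cost:
        return None, 0.0

    # Walk back from the target to rebuild the route.
    route = [target]
    while route[-1] != source:
        route.append(previous[route[-1]])
    route.reverse()
    return route, math.exp(-best_cost[target])


# Hypothetical travel probabilities, for illustration only.
travel_prob = {
    "Manaus": {"Recife": 0.10, "Salvador": 0.30},
    "Salvador": {"Recife": 0.50},
    "Recife": {},
}
route, prob = most_probable_route(travel_prob, "Manaus", "Recife")
```

In this toy network the indirect route through Salvador (0.30 × 0.50) is more probable than the direct edge (0.10), so the search prefers it.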

Keywords: Graph database, Track infectious disease, Travel Data, Dijkstra, COVID-19, Brazil


OLIVEIRA, Mariama C. S. de; ALENCAR, Andrêza Leite de; OLIVEIRA, Natalia Tatiele S. de; SALES, Lucas Henrique Gonzaga de; CUNHA, Antônio Ricardo Khouri; RAMOS, Pablo Ivan Pereira. Epiflow: a hybrid approach to track infectious disease spread in Brazil based on travel data and graph databases. In: SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 38., 2023, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 218-230. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2023.231736.