Probabilistic data structures applied to implicit graph representation

Juan P. A. Lopes; Fabiano S. Oliveira; Paulo E. D. Pinto

doi:10.5753/ctd.2018.3659

Juan P. A. Lopes
Fabiano S. Oliveira
Paulo E. D. Pinto

DOI: https://doi.org/10.5753/ctd.2018.3659

Resumo

In recent years, probabilistic data structures have been extensively employed to handle large volumes of streaming data in a timely fashion. However, their use in algorithms on giant graphs has been poorly explored. We introduce the concept of probabilistic implicit graph representation, which can represent large graphs using much less memory asymptotically by allowing adjacency test to have a constant probability of false positives or false negatives. This is an extension from the concept of implicit graph representation, comprehensively studied by Muller and Spinrad. Based on that, we also introduce two novel representations using probabilistic data structures. The first uses Bloom filters to represent general graphs with the same space complexity as the adjacency matrix (outperforming it however for sparse graphs). The other uses MinHash to represent trees with lower space complexity than any deterministic implicit representation. Furthermore, we prove some theoretical limitations for the latter approach.