Probabilistic data structures applied to implicit graph representation

  • Juan P. A. Lopes UERJ
  • Fabiano S. Oliveira UERJ
  • Paulo E. D. Pinto UERJ

Abstract


In recent years, probabilistic data structures have been extensively employed to handle large volumes of streaming data in a timely fashion. However, their use in algorithms on giant graphs has been poorly explored. We introduce the concept of probabilistic implicit graph representation, which can represent large graphs using much less memory asymptotically by allowing adjacency test to have a constant probability of false positives or false negatives. This is an extension from the concept of implicit graph representation, comprehensively studied by Muller and Spinrad. Based on that, we also introduce two novel representations using probabilistic data structures. The first uses Bloom filters to represent general graphs with the same space complexity as the adjacency matrix (outperforming it however for sparse graphs). The other uses MinHash to represent trees with lower space complexity than any deterministic implicit representation. Furthermore, we prove some theoretical limitations for the latter approach.

References

Bloom, B. H. (1970). Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422–426.

Broder, A. Z. (1997). On the resemblance and containment of documents. In Compression and Complexity of Sequences 1997. Proceedings, pages 21–29. IEEE.

Flajolet, P., Fusy, É., Gandouet, O., and Meunier, F. (2008). Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm. DMTCS Proceedings.

Kannan, S., Naor, M., and Rudich, S. (1992). Implicit representation of graphs. SIAM Journal on Discrete Mathematics, 5(4):596–603.

Li, P. and König, A. C. (2010). b-bit minwise hashing. In Nineteenth International World Wide Web Conference (WWW 2010). Association for Computing Machinery, Inc.

Lopes, J. P. A. (2017). Estruturas de dados probabilísticas aplicadas à representação implícita. Master’s thesis, Universidade do Estado do Rio de Janeiro.

Lopes, J. P. A., Oliveira, F. S., and Pinto, P. E. D. (2016a). Estimativa de cardinalidade da interseção de conjuntos utilizando as estruturas minhash e hyperloglog. In Proceeding Series of the Brazilian Society of Computational and Applied Mathematics, volume 5. SBMAC.

Lopes, J. P. A., Oliveira, F. S., and Pinto, P. E. D. (2016b). Estruturas de dados probabilísticas para representação de conjuntos. In Anais do XXXVI Congresso da Sociedade Brasileira de Computação. Sociedade Brasileira de Computação.

Lopes, J. P. A., Oliveira, F. S., and Pinto, P. E. D. (2017). Representações implícitas probabilísticas de grafos. In Anais do XXXVII Congresso da Sociedade Brasileira de Computação. Sociedade Brasileira de Computação.

Muller, J. H. (1988). Local structure in graph classes. PhD thesis, Georgia Institute of Technology.

Spinrad, J. P. (2003). Efficient graph representations. American mathematical society.
Published
2018-07-26
LOPES, Juan P. A.; OLIVEIRA, Fabiano S.; PINTO, Paulo E. D.. Probabilistic data structures applied to implicit graph representation. In: THESIS AND DISSERTATION CONTEST (CTD), 31. , 2018, Natal. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2018 . p. 25-30. ISSN 2763-8820. DOI: https://doi.org/10.5753/ctd.2018.3659.