Constellation Queries over Big Data

  • Fabio Porto National Laboratory for Scientific Computing
  • Amir Khatibi Federal University of Minas Gerais
  • João N. Rittmeyer National Laboratory for Scientific Computing
  • Eduardo Ogasawara Federal Center for Technological Education of Rio de Janeiro
  • Patrick Valduriez INRIA
  • Dennis Shasha New York University

Abstract


A geometrical pattern is a set of points with all pairwise distances (or, more generally, relative distances) specified. Finding matches to such patterns has applications to spatial data in seismic, astronomical, and transportation contexts. Finding geometric patterns is a challenging problem as the potential number of sets of elements that compose shapes is exponentially large in the size of the dataset and the pattern. In this paper, we propose algorithms to find patterns in large data applications. Our methods combine quadtrees, matrix multiplication, and bucket join processing to discover sets of points that match a geometric pattern within some additive factor on the pairwise distances. Our distributed experiments show that the choice of composition algorithm (matrix multiplication or nested loops) depends on the freedom introduced in the query geometry through the distance additive factor. Three clearly identified blocks of threshold values guide the choice of the best composition algorithm.

Keywords: Big data, constellation queries, large data patterns, quadtrees, matrix multiplication, bucket join processing

References

Bishop, C. M. (2006). Pattern recognition and machine learning. Springer, page 7.

Brucato, M., Beltran, J. F., Abouzied, A., and Meliou, A. (2016). Scalable package queries in relational database systems. Proc. VLDB Endow., 9(7):576–587. 00004.

Einstein, A. (2015). Relativity: The special and the general theory.

Giugno, R. and Shasha, D. (2002). GraphGrep: A fast and universal method for querying graphs. In Proceedings - International Conference on Pattern Recognition, volume 16, pages 112–115. 2 edition. 00125.

Jolion, J. (2001). Graph matching : what are we really talking about. Proceedings of the 3rd IAPR Workshop on Graph-Based Representations in Pattern Recognition.

Overbye, D. (2015). Astronomers observe supernova and find they’re watching reruns. New York Times, USA.

R. Bank, C. D. (1993). Sparse matrix multiplication package (smmp). Advances in Computational Mathematics, 1:127–137.

Samet, H. (1990). The Design and Analysis of Spatial Data Structures. Addison-Wesley.

Singla, P. and Domingos, P. (2005). Object identification with attribute mediated dependences. Proceedings of the 9th European conference on Principles and Practice of Knowledge Discovery in Databases.

U. Zwick, R. Y. (2005). Fast sparse matrix multiplication. ACM Transactions on Algorithms (TALG), 1:2–13.

Zou, L., Chen, L., and Ozsu, M. T. (2009). Distance-join: Pattern Match Query in a Large Graph Database. Proc. VLDB Endow., 2(1):886–897.

Zou, L., Chen, L., Ozsu, M. T., and Zhao, D. (2011). Answering pattern match queries in large graph databases via graph embedding. The VLDB Journal, 21:97-120
Published
2018-08-25
PORTO, Fabio; KHATIBI, Amir; RITTMEYER, João N.; OGASAWARA, Eduardo; VALDURIEZ, Patrick; SHASHA, Dennis. Constellation Queries over Big Data. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 33. , 2018, Rio de Janeiro. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2018 . p. 85-96. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2018.22221.