Managing Sparse Spatio-Temporal Data in SAVIME: an Evaluation of the Ph-tree Index

  • Stiw Herrera, Laboratório Nacional de Computação Científica (LNCC)
  • Larissa Miguez da Silva, Laboratório Nacional de Computação Científica (LNCC)
  • Paulo Ricardo Reis, Laboratório Nacional de Computação Científica (LNCC)
  • Anderson Silva, Laboratório Nacional de Computação Científica (LNCC) / DEXL Lab
  • Fabio Porto, Laboratório Nacional de Computação Científica (LNCC) / DEXL Lab

Abstract


Scientific data is predominantly multidimensional, which creates opportunities for optimization when it is managed by array databases. Sparse data, however, still calls for an efficient implementation. In this paper, we investigate the adoption of the Ph-tree as an in-memory indexing structure for sparse data. We compare performance in data ingestion and in both range and point queries, using SAVIME as the multidimensional array DBMS. Our experiments, using a real weather dataset, highlight the challenge of providing fast data ingestion, as proposed by SAVIME, while at the same time efficiently answering multidimensional queries on sparse data.

Keywords: Array Databases, Indexing, Ph-tree

Published
4 October 2021
HERRERA, Stiw; SILVA, Larissa Miguez da; REIS, Paulo Ricardo; SILVA, Anderson; PORTO, Fabio. Managing Sparse Spatio-Temporal Data in SAVIME: an Evaluation of the Ph-tree Index. In: SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 36., 2021, Rio de Janeiro. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 337-342. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2021.17895.