Managing Sparse Spatio-Temporal Data in SAVIME: an Evaluation of the Ph-tree Index
Abstract
Scientific data is mostly multidimensional by nature, which presents opportunities for optimization when it is managed by array databases. However, when the data is sparse, an efficient implementation is still needed. In this paper, we investigate the adoption of the Ph-tree as an in-memory indexing structure for sparse data. We compare performance in data ingestion and in both range and point queries, using SAVIME as the multidimensional array DBMS. Our experiments, using a real weather dataset, highlight the challenge of providing fast data ingestion, as proposed by SAVIME, while at the same time efficiently answering multidimensional queries on sparse data.
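To make the two query types concrete, the following is a minimal Python sketch of an in-memory index over sparse integer coordinates. It is neither SAVIME code nor the actual Ph-tree implementation: the class name `BitTrie`, its methods, and the sample (time, latitude, longitude) cells are hypothetical. It borrows only the Ph-tree idea of addressing child nodes with one bit per dimension, and leaves out prefix sharing and the other space optimizations of the real structure.

```python
from typing import Dict, Iterator, Optional, Tuple

Point = Tuple[int, ...]


class BitTrie:
    """Toy k-dimensional bit trie for non-negative integers of `width` bits.

    Assumption: a simplified stand-in for a Ph-tree, without prefix sharing.
    """

    def __init__(self, dims: int, width: int = 8) -> None:
        self.dims = dims
        self.width = width
        self.root: Dict[int, object] = {}

    def _address(self, point: Point, level: int) -> int:
        # Combine bit `level` (counted from the most significant bit) of
        # every coordinate into one child address; dimension 0 ends up
        # in the highest bit of the address.
        shift = self.width - 1 - level
        addr = 0
        for c in point:
            addr = (addr << 1) | ((c >> shift) & 1)
        return addr

    def insert(self, point: Point, value: float) -> None:
        if len(point) != self.dims or any(not 0 <= c < (1 << self.width) for c in point):
            raise ValueError("point outside the indexed domain")
        node = self.root
        for level in range(self.width - 1):
            node = node.setdefault(self._address(point, level), {})
        node[self._address(point, self.width - 1)] = value

    def get(self, point: Point) -> Optional[float]:
        """Point query: follow one address per level, stop at the first gap."""
        node = self.root
        for level in range(self.width):
            node = node.get(self._address(point, level))
            if node is None:
                return None
        return node

    def window(self, lo: Point, hi: Point) -> Iterator[Tuple[Point, float]]:
        """Range query: yield every stored cell with lo <= point <= hi."""
        yield from self._window(self.root, 0, (0,) * self.dims, lo, hi)

    def _window(self, node, level, origin, lo, hi):
        size = 1 << (self.width - 1 - level)  # side length of a child cell
        for addr, child in node.items():
            # Decode the lower corner of the child cell from its address.
            corner = tuple(
                origin[d] + ((addr >> (self.dims - 1 - d)) & 1) * size
                for d in range(self.dims)
            )
            # Skip children whose cell does not intersect the query box.
            if any(corner[d] > hi[d] or corner[d] + size - 1 < lo[d]
                   for d in range(self.dims)):
                continue
            if level == self.width - 1:
                yield corner, child  # the cell is a single stored point
            else:
                yield from self._window(child, level + 1, corner, lo, hi)


# Hypothetical sparse cells: (time step, latitude index, longitude index).
index = BitTrie(dims=3, width=8)
index.insert((3, 10, 20), 21.5)
index.insert((3, 200, 100), 18.0)
index.insert((7, 12, 22), 19.0)

point_hit = index.get((3, 10, 20))
point_miss = index.get((0, 0, 0))
range_hits = sorted(index.window((0, 8, 16), (7, 15, 31)))
```

Only cells that hold data allocate nodes, which is what makes this kind of structure attractive for sparse arrays. Each insert walks one node per bit level, though, which hints at the ingestion cost the abstract contrasts with query speed.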