Empirical Evaluation of Strategies to Process Range Queries of Numeric Sequences in Batch-mode

Luiz F. A. Brito; Marcelo K. Albertini

doi:10.5753/sbbd.2017.174232

Luiz F. A. Brito Universidade Federal de Uberlândia
Marcelo K. Albertini Universidade Federal de Uberlândia

DOI: https://doi.org/10.5753/sbbd.2017.174232

Resumo

Tree structures are largely used to index and search sequences on secondary memory. In some situations, many range queries are processed almost simultaneously and the resulting number of disk accesses can be high. In order to reduce the number of disk accesses, similar sequences can be grouped and spanned as a single query. A simple strategy is to unify all sequences into a single group. However, other strategies for grouping sequences can also be used. In this paper, we present and empirical evaluation of 5 common grouping strategies for R-trees and M-trees. Our results indicate that for inputs modelled as a random walk distribution the overall best implemented strategy for grouping queries is indeed the one unifying all queries in a single group.

Palavras-chave: Tree Structures, Index, Search, Group Sequences

Referências

Camoglu, O., Kahveci, T., and Singh, A. K. (2003). Towards index-based similarity search for protein structure databases. In Computational Systems Bioinformatics, pages 148–158. IEEE.

Ciaccia, P., Patella, M., and Zezula, P. (1997). M-tree: An efficient access method for similarity search in metric spaces. In VLDB ’97, pages 426–435. M. K. Publishers Inc.

Faloutsos, C., Ranganathan, M., and Manolopoulos, Y. (1994). Fast subsequence matching in time-series databases. In SIGMOD Rec., volume 23, pages 419–429. ACM.

Fu, A. W.-C., Keogh, E., Lau, L. Y., Ratanamahatana, C. A., and Wong, R. C.-W. (2008). Scaling and time warping in time series querying. In The VLDB Journal, volume 17, pages 899–921. Springer-Verlag New York, Inc.

Guttman, A. (1984). R-trees: A dynamic index structure for spatial searching. In SIGMOD Rec., volume 14, pages 47–57. ACM.

Kotsifakos, A., Karlsson, I., Papapetrou, P., Athitsos, V., and Gunopulos, D. (2015). Embedding-based subsequence matching with gaps-range-tolerances: A query-by-humming application. In The VLDB Journal, volume 24, pages 519–536. Springer-Verlag New York, Inc.

Moon, Y.-S., Whang, K.-Y., and Loh, W.-K. (2001). Duality-based subsequence matching in time-series databases. In Proc. 17th Int. Conf. Data Engineering, pages 263–272. IEEE.

Orchard, M. T. (1991). A fast nearest-neighbor search algorithm. In ICASSP 91: 1991 Int. Conf. Acoustics, Speech, and Signal Processing, volume 4, pages 2297–2300. IEEE.

Park, H.-S. and Jun, C.-H. (2009). A simple and fast algorithm for k-medoids clustering. In Expert Systems with Applications, volume 36, pages 3336–3341. Elsevier.