An experimental evaluation of pivot selection strategies

Abstract


Similarity searching supports distance comparison-based tasks, e.g., content-based retrieval and classification. Under this paradigm, pivot-based indexes may speed up the search by reducing the number of distance calculations. Those indexes store the distances from data objects to a set of chosen elements (i.e., the pivots) so that the pre-computed values are combined with the triangle inequality to prune the search space during a query. Therefore, the "quality" of the pivots conditions the performance of such structures. In this study, we experimentally investigate eight distinct selection strategies (kMEDOIDS, C-HULL,PCA, M-VARIANCE, SELECTION, S-S-SELECTION, GNAT, and M-SEPARATED) coupled with indexes Omni kd-Tree and VP-Tree. The findings suggest methods M-VARIANCE e kMEDOIDS can find good pivots, whereas strategies GNAT and C-HULL may deliver low-quality pivots, on average. Results also suggest different indexes may be fine tuned by distinct pivot selection methods

Keywords: Metric Indexes, Similarity Searching, Pivot Selection

References

Chávez, E., Navarro, G., Baeza-Yates, R., and Marroquín, J. L. (2001). Searching in metric spaces. ACM Computing Surveys, 33(3):273-321.

Chen, L., Gao, Y., Zheng, B., Jensen, C., Yang, H., and Yang, K. (2017). Pivot-based metric indexing. Proceedings of the VLDB Endowment, 10(10):1058-1069.

Hetland, M. (2009). The basic principles of metric indexing. In Swarm Intelligence for Multi-objective Problems in Mining, pages 199-232. Springer.

Hjaltason, G. and Samet, H. (2003). Index-driven similarity search in metric spaces. ACM Transactions on Database Systems, 28(4):517-580.

Mao, R., Zhang, P., Li, X., Liu, X., and Lu, M. (2016). Pivot selection for metric-space indexing. International Journal of Machine Learning and Cybernetics, 7(2):311-323.

Traina Jr, C., Filho, R., Traina, A., Vieira, M., and Faloutsos, C. (2007). The omni-family of all-purpose access methods: a simple and effective way to make similarity search more efficient. The VLDB Journal, 16(4):483-505.

Yianilos, P. (1993). Data structures and algorithms for nearest neighbor. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, volume 66, page 311. SIAM.

Zhu, Y., Chen, L., Gao, Y., and Jensen, C. (2022). Pivot selection algorithms in metric spaces: A survey and experimental study. The VLDB Journal, 31(1):23-47.
Published
2022-09-19
LEITE, João V. S.; TELLES, Wagner R.; OLIVEIRA, Rodolfo A.; DE OLIVEIRA, Daniel; BEDO, Marcos. An experimental evaluation of pivot selection strategies. In: BRAZILIAN WORKSHOP ON INTEGRATION OF DATABASES AND ARTIFICIAL INTELLIGENCE - BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 37. , 2022, Búzios. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022 . p. 215-222. DOI: https://doi.org/10.5753/sbbd_estendido.2022.21868.