An experimental evaluation of pivot selection strategies
Abstract
Similarity searching supports distance comparison-based tasks, e.g., content-based retrieval and classification. Under this paradigm, pivot-based indexes may speed up the search by reducing the number of distance calculations. Those indexes store the distances from data objects to a set of chosen elements (i.e., the pivots) so that the pre-computed values are combined with the triangle inequality to prune the search space during a query. Therefore, the "quality" of the pivots conditions the performance of such structures. In this study, we experimentally investigate eight distinct selection strategies (kMEDOIDS, C-HULL,PCA, M-VARIANCE, SELECTION, S-S-SELECTION, GNAT, and M-SEPARATED) coupled with indexes Omni kd-Tree and VP-Tree. The findings suggest methods M-VARIANCE e kMEDOIDS can find good pivots, whereas strategies GNAT and C-HULL may deliver low-quality pivots, on average. Results also suggest different indexes may be fine tuned by distinct pivot selection methods
References
Chen, L., Gao, Y., Zheng, B., Jensen, C., Yang, H., and Yang, K. (2017). Pivot-based metric indexing. Proceedings of the VLDB Endowment, 10(10):1058-1069.
Hetland, M. (2009). The basic principles of metric indexing. In Swarm Intelligence for Multi-objective Problems in Mining, pages 199-232. Springer.
Hjaltason, G. and Samet, H. (2003). Index-driven similarity search in metric spaces. ACM Transactions on Database Systems, 28(4):517-580.
Mao, R., Zhang, P., Li, X., Liu, X., and Lu, M. (2016). Pivot selection for metric-space indexing. International Journal of Machine Learning and Cybernetics, 7(2):311-323.
Traina Jr, C., Filho, R., Traina, A., Vieira, M., and Faloutsos, C. (2007). The omni-family of all-purpose access methods: a simple and effective way to make similarity search more efficient. The VLDB Journal, 16(4):483-505.
Yianilos, P. (1993). Data structures and algorithms for nearest neighbor. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, volume 66, page 311. SIAM.
Zhu, Y., Chen, L., Gao, Y., and Jensen, C. (2022). Pivot selection algorithms in metric spaces: A survey and experimental study. The VLDB Journal, 31(1):23-47.
