Proximity Graphs for Similarity Searches: Experimental Survey and the New Connected-Partition Approach HGraph
Resumo
Similarity searching is a widely used approach to retrieve complex data (images, videos, time series, etc.). Similarity searches aim at retrieving similar data according to the intrinsic characteristics of the data. Recently, graph-based methods have emerged as a very efficient alternative for similarity retrieval, with reports indicating they have outperformed methods of other categories in several situations. This work presents two main contributions to graph-based methods for similarity searches. The first contribution is a survey on the main graph types currently employed for similarity searches and an experimental evaluation of the most representative graphs in a common platform for exact and approximate search algorithms. The second contribution is a new graph-based method called HGraph, which is a connected-partition approach to build a proximity graph and answer similarity searches. Both of our contributions and results were published and received awards in international conferences.
Referências
Boytsov, L. and Naidan, B. (2013). Engineering efficient and effective non-metric space library. In Similarity Search and Applications, pages 280–293. Springer Berlin Heidelberg.
Malkov, Y., Ponomarenko, A., Logvinov, A., and Krylov, V. (2014). Approximate nearest neighbor algorithm based on navigable small world graphs. Inf. Syst., 45:61–68.
Naidan, B., Boytsov, L., and Nyberg, E. (2015). Permutation search methods are efficient, yet faster search is possible. Proc. VLDB Endow., 8(12):1618–1629.
Navarro, G. (2002). Searching in metric spaces by spatial approximation. The VLDB Journal The Int’l Journal on Very Large Data Bases, 11(1):28–46.
Ocsa, A., Bedregal, C., and Cuadros-Vargas, E. (2007). A new approach for similarity queries using neighborhood graphs. In Brazilian Symp. on Databases, pages 131–142.
Oyamada, R. S., Shimomura, L. C., Junior, S. B., and Kaster, D. S. (2020). Towards proximity graph auto-configuration: An approach based on meta-learning. In Advances in Databases and Information Systems, pages 93–107. Springer International Publishing.
Paredes, R. and Chávez, E. (2005). Using the k-Nearest Neighbor Graph for Proximity Searching in Metric Spaces, pages 127–138. Springer Berlin Heidelberg.
Shimomura, L. C. (2019). Proximity graphs for similarity searches: Experimental survey and the new connected-partition approach HGraph. Master’s thesis, Universidade Estadual de Londrina, Londrina-PR, Brazil.
Shimomura, L. C. and Kaster, D. S. (2019). Hgraph: A connected-partition approach to proximity graphs for similarity search. In Database and Expert Systems Applications, pages 106–121. Springer International Publishing.
Shimomura, L. C., Oyamada, R. S., Vieira, M. R., and Kaster, D. S. (2021). A survey on graph-based methods for similarity searches in metric spaces. Information Systems, 95:101507.
Shimomura, L. C., Vieira, M. R., and Kaster, D. S. (2018). Performance analysis of graph-based methods for exact and approximate similarity search in metric spaces. In Similarity Search and Applications, pages 18–32. Springer International Publishing.
Skopal, T. and Bustos, B. (2011). On nonmetric similarity search problems in complex domains. ACM Comput. Surv., 43(4):1–50.
Traina, Jr., C., Filho, R. F., Traina, A. J., Vieira, M. R., and Faloutsos, C. (2007). The omni-family of all-purpose access methods: A simple and effective way to make similarity search more efficient. The VLDB Journal, 16(4):483–505.
Uhlmann, J. K. (1991). Satisfying general proximity / similarity queries with metric trees. Information Processing Letters, 40(4):175-179.