Learned Index Based on the Prediction of Seed Vertices in Proximity Graphs

  • Matheus B. Bastos University of Londrina (UEL)
  • Daniel S. Kaster University of Londrina (UEL)

Abstract


Similarity searches on complex data rely on methods to speed up queries, such as spatial approximation on proximity graphs, which suffer from the initial vertex selection problem. Recently, learned indexes have emerged using machine learning to model the data distribution and other characteristics, showing promising results. This paper presents an efficient learned index on graph-based methods for similarity searching. Our method employs supervised learning to predict the seed vertices. The method reduced the distance computations needed to obtain high recall rates on various datasets evaluated.

Keywords: Learned Indexes. Proximity Graphs. Similarity Search

References

Antol, M., Ol’ha, J., Slanináková, T., and Dohnal, V. (2021). Learned metric index-proposition of learned indexing for unstructured data. Information Systems, 100:101774.

Bolettieri, P., Esuli, A., Falchi, F., Lucchese, C., Perego, R., Piccioli, T., and Rabitti, F. (2009). CoPhIR: a test collection for content-based image retrieval. arXiv preprint arXiv:0905.4627.

Boytsov, L. and Naidan, B. (2013). Engineering efficient and effective non-metric space library. In International Conference on Similarity Search and Applications, pages 280-293.

Cutler, A., Cutler, D. R., and Stevens, J. R. (2012). Random forests. In Ensemble machine learning, pages 157-175.

Hajebi, K., Abbasi-Yadkori, Y., Shahbazi, H., and Zhang, H. (2011). Fast approximate nearest-neighbor search with k-nearest neighbor graph. In Twenty-Second International Joint Conference on Artificial Intelligence.

Kraska, T., Beutel, A., Chi, E. H., Dean, J., and Polyzotis, N. (2018). The case for learned index structures. In Proceedings of the 2018 International Conference on Management of Data, pages 489-504.

Navarro, G. (2002). Searching in metric spaces by spatial approximation. The VLDB Journal, 11(1):28-46.

Novak, D., Batko, M., and Zezula, P. (2011). Metric index: An efficient and scalable solution for precise and approximate similarity search. Information Systems, 36(4):721-733.

Ocsa, A., Bedregal, C., and Cuadros-Vargas, E. (2007). A new approach for similarity queries using neighborhood graphs. In SBBD, pages 131-142.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al. (2011). Scikit-learn: Machine learning in python. the Journal of machine Learning research, 12:2825-2830.

Shimomura, L. C. and Kaster, D. S. (2019). Hgraph: a connected-partition approach to proximity graphs for similarity search. In International Conference on Database and Expert Systems Applications, pages 106-121.
Published
2022-09-19
BASTOS, Matheus B.; KASTER, Daniel S.. Learned Index Based on the Prediction of Seed Vertices in Proximity Graphs. In: BRAZILIAN WORKSHOP ON INTEGRATION OF DATABASES AND ARTIFICIAL INTELLIGENCE - BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 37. , 2022, Búzios. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022 . p. 223-230. DOI: https://doi.org/10.5753/sbbd_estendido.2022.21869.