A Fast Similarity Search kNN for Textual Datasets

  • Leonardo Afonso Amorim UFG
  • Mateus F. Freitas UFG
  • Paulo Henrique da Silva UFG
  • Wellington S. Martins UFG

Abstract

The k nearest neighbors (kNN) is an algorithm for finding the closest k points in metric spaces. Due to its high computational costs, many parallel solutions have been proposed, including some implementations targeted at modern accelerators. However, most approaches assume relatively low dimensionality and dense data. Such conditions do not apply to textual datasets, which are known for their high dimensionality and sparsity. This work presents a fine-grained parallel algorithm that applies filtering technique based on most common important terms of the query document using an inverted index and its implementation on GPU. Our method improves the top k nearest neighbors search in textual datasets by up to 37x with a single GPU.
Published
2018-10-01
How to Cite
AMORIM, Leonardo Afonso et al. A Fast Similarity Search kNN for Textual Datasets. Proceedings of the Symposium on High Performance Computing Systems (SSCAD), [S.l.], p. 229-236, oct. 2018. ISSN 0000-0000. Available at: <https://sol.sbc.org.br/index.php/sscad/article/view/15666>. Date accessed: 17 may 2024.