A Fast Similarity Search kNN for Textual Datasets

  • Leonardo Afonso Amorim UFG
  • Mateus F. Freitas UFG
  • Paulo Henrique da Silva UFG
  • Wellington S. Martins UFG

Resumo


The k nearest neighbors (kNN) is an algorithm for finding the closest k points in metric spaces. Due to its high computational costs, many parallel solutions have been proposed, including some implementations targeted at modern accelerators. However, most approaches assume relatively low dimensionality and dense data. Such conditions do not apply to textual datasets, which are known for their high dimensionality and sparsity. This work presents a fine-grained parallel algorithm that applies filtering technique based on most common important terms of the query document using an inverted index and its implementation on GPU. Our method improves the top k nearest neighbors search in textual datasets by up to 37x with a single GPU.
Palavras-chave: Graphics processing units, Indexes, Approximation algorithms, Parallel algorithms, Sorting, Proposals, knn, textual datasets, high dimensionality, sparsity, filtering technique, fine-grained parallel algorithm
Publicado
01/10/2018
AMORIM, Leonardo Afonso; FREITAS, Mateus F.; DA SILVA, Paulo Henrique; MARTINS, Wellington S.. A Fast Similarity Search kNN for Textual Datasets. In: SIMPÓSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (SSCAD), 19. , 2018, São Paulo. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2018 . p. 229-236.