GPU-NB: A Fast CUDA-Based Implementation of Naïve Bayes
Abstract
The advent of Web 2.0 has given rise to an interesting phenomenon: there is now far more data than can be effectively analyzed without sophisticated automatic tools. Some of these tools, which aim to organize this huge amount of data and extract useful knowledge from it, rely on machine learning and data or text mining techniques, specifically automatic document classification (ADC) algorithms. However, these algorithms remain computationally challenging because of the volume of data that must be processed. Some of the strategies available to address this challenge are based on parallelizing ADC algorithms. In this work, we present GPU-NB, a parallel version of one of the most widely used document classification algorithms, Naïve Bayes, that runs on graphics processing units (GPUs). In our evaluation on 6 different document collections, we show that GPU-NB maintains the same classification effectiveness (in most cases) while running up to 34x faster than its sequential CPU version. GPU-NB is also up to 11x faster than a CPU-based parallel implementation of Naïve Bayes running with 4 threads. Moreover, even assuming optimistic scaling of the CPU parallelization, GPU-NB should outperform a CPU-based implementation running on up to 32 cores, at a small fraction of the cost. We also show that the efficiency of the GPU-NB parallelization depends on characteristics of the document collections, most notably the number of classes, although collection density (the average number of term occurrences per document) also has a significant impact.
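The abstract does not state which Naïve Bayes variant GPU-NB implements. As a point of reference only, the sketch below assumes the common multinomial variant with Laplace (add-alpha) smoothing and shows, sequentially in Python rather than CUDA, the training counts and the per-class scoring that a parallel version would distribute. It is not the authors' implementation; the function names `train_multinomial_nb`, `classify`, and `density` are hypothetical. The scoring loop visits every class for every document and every term occurrence within it, so its work grows with both the number of classes and the collection density as defined above.

```python
import math
from collections import defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Hypothetical sketch: train a multinomial Naive Bayes model.

    docs:   list of dicts mapping term -> occurrence count in that document.
    labels: list of class labels, one per document.
    alpha:  Laplace (add-alpha) smoothing parameter (an assumption here).
    Returns (log_prior, log_cond) keyed by class (and by term for log_cond).
    """
    vocab = set()
    for d in docs:
        vocab.update(d)

    class_docs = defaultdict(int)                        # documents per class
    term_counts = defaultdict(lambda: defaultdict(int))  # class -> term -> count
    total_terms = defaultdict(int)                       # class -> total occurrences
    for d, c in zip(docs, labels):
        class_docs[c] += 1
        for t, n in d.items():
            term_counts[c][t] += n
            total_terms[c] += n

    n_docs, v = len(docs), len(vocab)
    log_prior = {c: math.log(k / n_docs) for c, k in class_docs.items()}
    log_cond = {}
    for c in class_docs:
        denom = total_terms[c] + alpha * v
        # Smoothed P(term | class) for every vocabulary term, stored as logs.
        log_cond[c] = {t: math.log((term_counts[c][t] + alpha) / denom)
                       for t in vocab}
    return log_prior, log_cond


def classify(doc, log_prior, log_cond):
    """Return the class with the highest log-posterior score for one document.

    Each (document, class) score is computed independently of the others,
    which is the kind of independence a GPU version could exploit.
    """
    best_class, best_score = None, -math.inf
    for c, prior in log_prior.items():
        score = prior
        for t, n in doc.items():
            if t in log_cond[c]:  # ignore terms never seen in training
                score += n * log_cond[c][t]
        if score > best_score:
            best_class, best_score = c, score
    return best_class


def density(docs):
    """Average number of term occurrences per document."""
    return sum(sum(d.values()) for d in docs) / len(docs)
```

For example, training on three toy documents labeled "gpu", "cpu", and "gpu" and then classifying `{"cuda": 1, "gpu": 1}` yields "gpu", since those terms were seen only in the "gpu" class.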
Keywords:
Graphics processing units, Instruction sets, Kernel, Probability, Data mining, Training, Parallel processing
Published
23/10/2013
How to Cite
ANDRADE, Guilherme; VIEGAS, Felipe; RAMOS, Gabriel Spada; ALMEIDA, Jussara; ROCHA, Leonardo; GONÇALVES, Marcos; FERREIRA, Renato. GPU-NB: A Fast CUDA-Based Implementation of Naïve Bayes. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 25., 2013, Porto de Galinhas/PE. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2013. p. 168-175.
