T-SNE-CUDA: GPU-Accelerated T-SNE and its Applications to Modern Data

  • David M. Chan, University of California, Berkeley
  • Roshan Rao, University of California, Berkeley
  • Forrest Huang, University of California, Berkeley
  • John F. Canny, University of California, Berkeley

Abstract


Modern datasets and models are notoriously difficult to explore and analyze because of their high dimensionality and very large numbers of samples. Existing visualization methods that reduce data to two or three dimensions are often inefficient, ineffective, or both on such datasets. This paper introduces T-SNE-CUDA, a GPU-accelerated implementation of t-distributed Stochastic Neighbor Embedding (t-SNE) for visualizing datasets and models. T-SNE-CUDA significantly outperforms current implementations, with 50-700x speedups on the CIFAR-10 and MNIST datasets. These speedups enable, for the first time, visualization of neural network activations on the entire ImageNet dataset, a task that was previously computationally intractable. We also demonstrate visualization performance in the NLP domain by visualizing the GloVe embedding vectors. From these visualizations, we can draw interesting conclusions about using the L2 metric in these embedding spaces. T-SNE-CUDA is publicly available at https://github.com/CannyLab/tsne-cuda.

Keywords: Data visualization, Sparse matrices, Data models, Approximation algorithms, Computational modeling, Vegetation, Force, Artificial intelligence, Machine learning, Projection algorithms, Dimensionality Reduction, t-SNE, CUDA
Published
24 September 2018
CHAN, David M.; RAO, Roshan; HUANG, Forrest; CANNY, John F. T-SNE-CUDA: GPU-Accelerated T-SNE and its Applications to Modern Data. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 30., 2018, Lyon/FR. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2018. p. 330-338.