Sampling-based Sparse Format Selection on GPUs

  • Gangyi Zhu, Ohio State University
  • Gagan Agrawal, Augusta University

Abstract


Sparse Matrix-Vector Multiplication (SpMV) is an important kernel in numerous computational disciplines. The overall performance of SpMV is highly dependent on the storage format of the sparse matrix. This has led to much interest in recent years in automatically choosing the appropriate format, typically by training a machine learning model on a large number of matrices. However, these methods have limitations in practice: besides depending on a large collection of sparse matrices for training and incurring an expensive training process, they usually have limited prediction ability across architectures. In this paper, we take a very different approach to the same problem. This approach involves obtaining samples from the original matrix, executing the kernel on these samples, and selecting the best format. However, our approach requires obtaining representative samples that can help predict the performance of a specific format on the full matrix, which turns out to be challenging. Based on the storage properties and processing granularity associated with different formats, we develop three novel sampling schemes: Row Cropping sampling, Random Warp sampling, and Diagonal Aligning sampling. Each sampling scheme is designed around the observation that a certain factor tends to be critical for the performance of a particular format, and preserves that factor through sampling. Experimental results using nearly 2000 matrices demonstrate that our approach delivers high efficiency without the expensive training process and is easy to migrate across architectures. At the same time, our approach achieves prediction accuracy comparable to that of state-of-the-art methodologies, and even outperforms them in certain cases (especially when predicting on some of the largest matrices we use). Through our work, we also offer new insights into the performance achieved with different formats on GPUs.
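The core idea can be illustrated with a minimal sketch, which is not the authors' implementation: it samples rows of the matrix, times SpMV on the sample in each candidate format, and returns the fastest. Here the uniform row sampler is a placeholder for the paper's Row Cropping, Random Warp, and Diagonal Aligning schemes, the candidate formats are scipy stand-ins for the GPU formats evaluated in the paper, and timings are taken on the CPU rather than with GPU kernels.

    # Minimal sketch (assumed, not the authors' code): sampling-based format selection.
    import time
    import numpy as np
    import scipy.sparse as sp

    def sample_rows(A_csr, fraction=0.05, seed=0):
        """Uniformly sample a fraction of rows (placeholder for the paper's samplers)."""
        rng = np.random.default_rng(seed)
        n_rows = A_csr.shape[0]
        picked = np.sort(rng.choice(n_rows, max(1, int(fraction * n_rows)), replace=False))
        return A_csr[picked, :]

    def select_format(A_csr, candidates=("csr", "csc", "coo")):
        """Time SpMV on the sample in each candidate format; return the fastest one."""
        sample = sample_rows(A_csr)
        x = np.ones(sample.shape[1])
        best_fmt, best_time = None, float("inf")
        for fmt in candidates:
            M = sample.asformat(fmt)
            t0 = time.perf_counter()
            for _ in range(10):          # repeat to reduce timing noise
                _ = M @ x
            elapsed = time.perf_counter() - t0
            if elapsed < best_time:
                best_fmt, best_time = fmt, elapsed
        return best_fmt

    A = sp.random(20000, 20000, density=1e-3, format="csr", random_state=1)
    print("selected format:", select_format(A))

The key difficulty the paper addresses is that such a sample must preserve the structural factor (e.g., row-length distribution, warp-level load balance, or diagonal alignment) that governs each format's performance on the full matrix.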
Keywords: Training, Bridges, Analytical models, High performance computing, Computational modeling, Computer architecture, Machine learning, Sparse Computations, GPUs, Sampling
Published
26/10/2021
How to Cite

ZHU, Gangyi; AGRAWAL, Gagan. Sampling-based Sparse Format Selection on GPUs. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 33., 2021, Belo Horizonte. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 198-208.