Extraction and Representation of Sparsity Patterns for Efficient Data Transfer on Accelerators

Yang Su; Toshiyuki Ichiba; Katsuhiro Yoda; Yasuhiro Watanabe; Takahide Yoshikawa; Tarek S. Abdelrahman

Yang Su University of Toronto
Toshiyuki Ichiba Fujitsu Limited
Katsuhiro Yoda Fujitsu Limited
Yasuhiro Watanabe Fujitsu Limited
Takahide Yoshikawa Fujitsu Limited
Tarek S. Abdelrahman University of Toronto

Resumo

Sparse computations are common in practical HPC, AI and graph-based applications. Such computations often exhibit scattered and fragmented data accesses, which negatively impact data transfer efficiency to/from accelerators. We propose, implement and evaluate an algorithm for extracting or mining sparsity patterns that exist in sparse matrices. The algorithm extracts multiple pattern types in a matrix, including blocks, bands, triangles or regular compositions of each. It does so without a priori knowledge of the presence of these patterns in the matrix. The patterns may contain, under user control, zero elements, or imperfections, to facilitate the extraction of larger patterns. Additionally, we introduce the Compressed Sparse Pattern (CSP), a novel compressed representation for sparse matrices that is based on these patterns. The use of CSP combined with extensions to Address Generation Units (AGUs) of accelerators regularize data accesses and improve data transfer efficiency. Evaluation of the pattern mining algorithm and CSP using 26 real-world sparse matrices is conducted on an Ubuntu system with an 8 core Intel CPU (3.6 GHz i7-9700K) and 32 GB of memory. The evaluation shows that patterns of different sizes and shapes are common, representing ∼82% of the non-zero elements in these matrices. The patterns can be efficiently extracted in time, with an average of 4.6 seconds. The evaluation also shows that the mining of composite patterns contributes ∼8% to the number of non-zeros in patterns and that imperfections increase pattern sizes with a minimal impact of only ∼7% zero elements in patterns. Finally, using CSP leads to up to 90% reduction in data transfer overhead, compared to CSR and CSC, both common compressed sparse matrix representations. These results validate our approach of extracting and representing patterns to improve data transfer efficiency.

Palavras-chave: Shape, High performance computing, Computer architecture, Data transfer, Computational efficiency, Sparse matrices, Data mining, Artificial intelligence, sparsity patterns, sparse matrix representations, data transfer efficiency, heterogeneous systems, accelerators