A Performance and Energy Study of GPU-Resident Preconditioners for Conjugate Gradient Solvers: In the Context of Existing and Novel Approaches
Abstract
Optimizing a particular subprogram out of the set of Basic (sparse) Linear Algebra Subprograms (BLAS) for a given architecture is a common topic of research. In applications, however, these BLAS functions rarely appear in isolation; usually, many of them are used together, in various combinations and with varying inputs. As the need to solve large, sparse linear systems is ubiquitous throughout HPC applications, linear solvers constitute a realistic, sufficiently complex, and well-defined representative use case for composite BLAS routines. To this end, based on a representative set of matrices drawn from a diverse set of fields, we present a framework to study, from the performance and energy perspectives, the efficacy of a GPU-resident parallel Conjugate Gradient (CG) linear solver with different preconditioner options, including Gauss-Seidel, Jacobi, and incomplete Cholesky. We also propose a novel GPU-based preconditioner in which the triangular solves are approximated by an iterative process. The development of this preconditioner was motivated by solving large graph Laplacian linear systems, for which the existing preconditioners either perform slowly on GPU-based platforms or are not applicable. We compare the performance of these preconditioners on different hardware accelerator architectures, i.e., AMD MI250X, MI100, Nvidia A100, V100, and Jetson. Our experiments reveal performance trade-offs and provide guidance on how to select the best strategy for a given linear system, dictated by its properties and the platform of interest. We demonstrate the application of our novel preconditioner for solving linear systems with CG, including graph Laplacian systems. Overall, the framework can be utilized as a benchmark to guide informed decisions in choosing a specific preconditioner, i.e., whether it is better to rely on the performance of a triangular solver or on the performance of the sparse matrix-vector product.
Finally, by measuring the power consumed while solving the linear systems, we report the energy footprint of the solvers.
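The abstract describes approximating the triangular solves inside a preconditioner with an iterative process. As a hedged illustration of the general idea only (the paper's actual GPU implementation is not shown here, and the function name and dense-matrix representation below are illustrative assumptions), a fixed number of Jacobi-style sweeps can replace an exact triangular solve:

```python
# Illustrative sketch only, NOT the paper's implementation: approximate the
# lower-triangular solve L x = b with Jacobi sweeps
#   x <- D^{-1} (b - (L - D) x),
# the kind of fixed-point iteration that trades an inherently sequential
# triangular solve for parallel, SpMV-like sweeps on a GPU.

def jacobi_triangular_solve(L, b, sweeps=5):
    """Approximate x in L x = b, with L lower triangular and a nonzero
    diagonal, using a fixed number of Jacobi sweeps."""
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        # Each row update is independent, so one sweep maps naturally
        # onto a parallel GPU kernel.
        x_new = [0.0] * n
        for i in range(n):
            s = sum(L[i][j] * x[j] for j in range(i))  # strictly lower part
            x_new[i] = (b[i] - s) / L[i][i]
        x = x_new
    return x

# Tiny dense example; a real implementation would use a sparse format (CSR).
# For a lower-triangular L the iteration matrix is nilpotent, so n sweeps
# reproduce the exact solve; fewer sweeps give the cheap approximation used
# as a preconditioner.
L = [[4.0, 0.0, 0.0],
     [1.0, 3.0, 0.0],
     [2.0, 1.0, 5.0]]
b = [4.0, 7.0, 13.0]
x = jacobi_triangular_solve(L, b, sweeps=3)  # exact here: [1.0, 2.0, 1.8]
```

In a preconditioned CG iteration, such sweeps replace the two exact triangular solves of, e.g., an incomplete Cholesky preconditioner, shifting the cost profile from triangular-solver performance toward sparse matrix-vector-product performance, which is precisely the trade-off the framework evaluates.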
Keywords:
Linear systems, Jacobian matrices, Performance evaluation, Laplace equations, Power demand, Graphics processing units, Sparse matrices, Iterative methods, Standards, Convergence, Preconditioner, Iterative solver, GPU
Published
November 13, 2024
How to Cite
ŚWIRYDOWICZ, Kasia; FIROZ, Jesun; MANZANO, Joseph; HALAPPANAVAR, Mahantesh; THOMAS, Stephen; BARKER, Kevin. A Performance and Energy Study of GPU-Resident Preconditioners for Conjugate Gradient Solvers: In the Context of Existing and Novel Approaches. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 36., 2024, Hilo, Hawaii. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 70-80.