Variable-Size Batched Condition Number Calculation on GPUs
Resumo
We present a kernel that is designed to quickly compute the condition number of a large collection of tiny matrices on a graphics processing unit (GPU). The matrices can differ in size and the process integrates the use of pivoting to ensure a numerically-stable matrix inversion. The performance assessment reveals that, in double precision arithmetic, the new GPU kernel achieves up to 550 GFLOPs (billions of floating-point operations per second) and 800 GFLOPs on NVIDIA's P100 and V100 GPUs, respectively. The results also demonstrate a considerable speed-up with respect to a workflow that computes the condition number via launching a set of four batched kernels. In addition, we present a variable-size batched kernel for the computation of the matrix infinity norm. We show that this memory-bound kernel achieves up to 90% of the sustainable peak bandwidth.
Palavras-chave:
Kernel, Graphics processing units, Instruction sets, Registers, Bandwidth, Computer architecture, Linear systems
Publicado
24/09/2018
Como Citar
ANZT, Hartwig; DONGARRA, Jack; FLEGAR, Goran; GRÜTZMACHER, Thomas.
Variable-Size Batched Condition Number Calculation on GPUs. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 30. , 2018, Lyon/FR.
Anais [...].
Porto Alegre: Sociedade Brasileira de Computação,
2018
.
p. 132-139.
