Variable-Size Batched Condition Number Calculation on GPUs

Hartwig Anzt; Jack Dongarra; Goran Flegar; Thomas Grützmacher

Hartwig Anzt University of Tennessee
Jack Dongarra University of Tennessee / University of Manchester
Goran Flegar Universidad Jaume I
Thomas Grützmacher Karlsruhe Institute of Technology

Abstract

We present a kernel designed to quickly compute the condition numbers of a large collection of tiny matrices on a graphics processing unit (GPU). The matrices can differ in size, and the kernel uses pivoting to ensure a numerically stable matrix inversion. The performance assessment reveals that, in double-precision arithmetic, the new GPU kernel achieves up to 550 GFLOP/s (billions of floating-point operations per second) on NVIDIA's P100 GPU and up to 800 GFLOP/s on the V100 GPU. The results also demonstrate a considerable speed-up over a workflow that computes the condition number by launching a set of four batched kernels. In addition, we present a variable-size batched kernel for computing the matrix infinity norm. We show that this memory-bound kernel achieves up to 90% of the sustainable peak bandwidth.
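For orientation, the quantity computed for each matrix can be illustrated with a small sequential reference sketch. It is not the GPU kernel described above. It assumes that the condition number is taken in the infinity norm, i.e. the product of the infinity norms of A and of its inverse, and that the inversion uses Gauss-Jordan elimination with partial (row) pivoting. The abstract states only that pivoting is used, so both choices are assumptions, and all function names are hypothetical.

```python
def inf_norm(a):
    """Matrix infinity norm: the largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in a)


def invert_gauss_jordan(a):
    """Invert a small square matrix by Gauss-Jordan elimination
    with partial (row) pivoting. Assumed method, for illustration only."""
    n = len(a)
    # Build the augmented matrix [A | I].
    m = [
        [float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(a)
    ]
    for k in range(n):
        # Partial pivoting: pick the row with the largest entry in column k.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]
        # Scale the pivot row so that the pivot becomes 1.
        pivot = m[k][k]
        m[k] = [x / pivot for x in m[k]]
        # Eliminate column k from every other row.
        for i in range(n):
            if i != k and m[i][k] != 0.0:
                f = m[i][k]
                m[i] = [x - f * y for x, y in zip(m[i], m[k])]
    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in m]


def batched_condition_numbers(batch):
    """Infinity-norm condition number of each matrix in a batch.
    The matrices may have different sizes."""
    return [inf_norm(a) * inf_norm(invert_gauss_jordan(a)) for a in batch]


if __name__ == "__main__":
    batch = [
        [[2.0]],                                  # 1 x 1
        [[1.0, 2.0], [3.0, 4.0]],                 # 2 x 2
        [[4.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]],  # 3 x 3
    ]
    # Approximately [1.0, 21.0, 4.0], up to rounding.
    print(batched_condition_numbers(batch))
```

A GPU implementation would process the matrices of the batch in parallel rather than in a Python loop; the sketch only fixes what is being computed, not how the kernel computes it.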

Keywords: Kernel, Graphics processing units, Instruction sets, Registers, Bandwidth, Computer architecture, Linear systems