A Distributed and Storage-Aware Approach to Large-Scale Cholesky Factorization
Abstract
Cholesky factorization is a core operation in scientific computing, yet its scalability is often constrained by memory limitations when processing extremely large dense matrices. This work introduces an out-of-core Cholesky factorization algorithm for symmetric positive-definite matrices that integrates GPU acceleration, block-wise lossless compression, and parallel I/O to overcome these limitations. The approach leverages the OMPC runtime for asynchronous task scheduling and employs HDF5 to store the matrix on disk, taking advantage of Lustre’s parallel I/O capabilities in distributed environments. Tiles are decompressed just-in-time on the GPU, significantly reducing host memory usage, storage footprint, and end-to-end data movement overhead (from disk through the CPU to the GPU) without compromising numerical accuracy. Experimental results show that the proposed method scales across 8 GPU nodes and successfully factorizes matrices up to 3M × 3M. In comparison, SLATE handled matrices only up to 700K × 700K, and the proposed algorithm achieved up to 41% higher throughput. These results demonstrate the algorithm’s scalability and competitiveness beyond memory-constrained in-core solutions, offering a practical path for enabling extreme-scale scientific applications.
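To make the tile-level idea in the abstract concrete, the following is a minimal CPU-only sketch, not the authors' implementation. It assumes a right-looking tiled Cholesky in which every tile is kept losslessly compressed and is decompressed only when a kernel needs it. An in-memory dictionary stands in for the HDF5 file on disk, `zlib` stands in for the compressor (which the abstract does not name), and plain Python loops stand in for the GPU kernels. All function names (`pack`, `unpack`, `potrf`, `trsm`, `update`, `tiled_cholesky`, `to_store`, `assemble`) are hypothetical.

```python
import math
import pickle
import zlib


def pack(tile):
    """Losslessly compress one tile (a list of rows of floats)."""
    return zlib.compress(pickle.dumps(tile))


def unpack(blob):
    """Decompress a tile just before a kernel uses it."""
    return pickle.loads(zlib.decompress(blob))


def potrf(a):
    """Cholesky factor of a diagonal tile; returns lower-triangular L."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l


def trsm(l, a):
    """Solve X * L^T = A for X by forward substitution on each row."""
    n = len(l)
    out = []
    for row in a:
        x = [0.0] * n
        for j in range(n):
            x[j] = (row[j] - sum(x[k] * l[j][k] for k in range(j))) / l[j][j]
        out.append(x)
    return out


def update(c, a, b):
    """Trailing update: return C - A * B^T."""
    return [[c[i][j] - sum(a[i][k] * b[j][k] for k in range(len(a[0])))
             for j in range(len(b))] for i in range(len(a))]


def tiled_cholesky(store, nt):
    """Factor the lower tiles in `store` in place; tiles stay compressed at rest."""
    for k in range(nt):
        lkk = potrf(unpack(store[k, k]))
        store[k, k] = pack(lkk)
        for i in range(k + 1, nt):          # panel below the diagonal tile
            store[i, k] = pack(trsm(lkk, unpack(store[i, k])))
        for i in range(k + 1, nt):          # trailing submatrix
            lik = unpack(store[i, k])
            for j in range(k + 1, i + 1):
                ljk = unpack(store[j, k])
                store[i, j] = pack(update(unpack(store[i, j]), lik, ljk))


def to_store(a, b):
    """Split the lower triangle of dense matrix `a` into compressed b-by-b tiles."""
    nt = len(a) // b
    store = {(i, j): pack([r[j * b:(j + 1) * b] for r in a[i * b:(i + 1) * b]])
             for i in range(nt) for j in range(i + 1)}
    return store, nt


def assemble(store, nt, b):
    """Rebuild the dense lower-triangular factor from the tile store."""
    n = nt * b
    dense = [[0.0] * n for _ in range(n)]
    for (i, j), blob in store.items():
        tile = unpack(blob)
        for r in range(b):
            for c in range(b):
                dense[i * b + r][j * b + c] = tile[r][c]
    return dense


# Test matrix A[i][j] = min(i, j) + 1 is symmetric positive definite and
# equals L * L^T, where L is the lower-triangular matrix of ones.
n, b = 6, 2
a = [[float(min(i, j) + 1) for j in range(n)] for i in range(n)]
store, nt = to_store(a, b)
tiled_cholesky(store, nt)
factor = assemble(store, nt, b)
```

In this sketch at most three tiles are decompressed at any moment, which mirrors, in a very simplified way, how keeping tiles compressed until use can lower memory pressure. Task scheduling, distribution across nodes, and overlapping I/O with computation are deliberately left out.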
Keywords:
Symmetric matrices, Runtime, Scientific computing, Scalability, Memory management, Graphics processing units, Throughput, Libraries, Matrix decomposition, Next generation networking
Published
28/10/2025
How to Cite
CUSIHUALLPA, Carla; CECCATO, Rodrigo; RIGO, Sandro; ARAUJO, Guido; YVIQUEL, Hervé. A Distributed and Storage-Aware Approach to Large-Scale Cholesky Factorization. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 37., 2025, Bonito/MS. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 248-259.
