Data Management in the Continuum: Cross-facility Object-based Data Transfers
Abstract
Scientific workflows are evolving from relying on a monolithic storage subsystem at a single High-Performance Computing (HPC) facility to using geographically distributed file systems, repositories, and cloud storage. As a result, storing, accessing, transferring, and managing scientific data have become highly complex and prone to performance inefficiencies. This paper delves into these challenges by exploring an optimized end-to-end interface designed to seamlessly connect various local and remote storage systems, enabling efficient movement of objects across HPC–Cloud and HPC–HPC environments. We showcase this capability through an object-focused data management runtime system, discuss the effects of relaxed consistency semantics in distributed object scenarios, and illustrate its application in an earthquake simulation workflow. Besides reducing the amount of data moved by selectively transferring regions of interest, our facility-local results achieved a speedup of 45× over an optimized HDF5 configuration and 15× over HDF5 with caching by using the new interface in PDC-XF.
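The abstract's key data-reduction idea is transferring only regions of interest (ROIs) of an object instead of the whole object. The following is a minimal, hypothetical sketch of that idea in plain Python; the names (`transfer_bytes`, `roi_slice`) and the flat row-major layout are illustrative assumptions, not the PDC-XF API.

```python
# Hypothetical ROI-based transfer sketch: ship only a sub-region of a
# 2D field (e.g., one tile of an earthquake simulation output) rather
# than the full object. Layout and function names are illustrative.

NX, NY, ITEM = 1024, 1024, 8          # grid dimensions, bytes per element
full_object = bytearray(NX * NY * ITEM)  # the whole field, row-major

def transfer_bytes(payload: bytes) -> int:
    """Stand-in for a cross-facility transfer; returns bytes 'sent'."""
    return len(payload)

def roi_slice(data: bytearray, x0: int, x1: int, y0: int, y1: int) -> bytes:
    """Copy rows x0..x1 and columns y0..y1 out of the flat layout."""
    out = bytearray()
    for x in range(x0, x1):
        start = (x * NY + y0) * ITEM   # offset of this row's ROI segment
        end = (x * NY + y1) * ITEM
        out += data[start:end]
    return bytes(out)

roi = roi_slice(full_object, 100, 200, 100, 200)  # a 100x100 region
sent_full = transfer_bytes(bytes(full_object))
sent_roi = transfer_bytes(roi)
print(sent_full, sent_roi)  # the ROI moves ~1% of the full object's bytes
```

Here the 100×100 ROI is 80,000 bytes versus 8,388,608 bytes for the full object, illustrating why region-level selection, rather than whole-file movement, is central to the reported data reduction.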
Keywords:
Performance evaluation, Cloud computing, Runtime, High performance computing, Scalability, Semantics, Earthquakes, Memory management, Data transfer, Kernel, object data transfers, cross-facility, continuum
Published
28/10/2025
How to Cite
BEZ, Jean Luca; TANG, Houjun; WANG, Chen; BYNA, Suren. Data Management in the Continuum: Cross-facility Object-based Data Transfers. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 37., 2025, Bonito/MS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 46-57.
