Using Balanced Data Placement to Address I/O Contention in Production Environments

  • Sarah Neuwirth, University of Heidelberg
  • Feiyi Wang, National Center for Computational Sciences, Oak Ridge National Laboratory
  • Sarp Oral, National Center for Computational Sciences, Oak Ridge National Laboratory
  • Sudharshan Vazhkudai, National Center for Computational Sciences, Oak Ridge National Laboratory
  • James Rogers, National Center for Computational Sciences, Oak Ridge National Laboratory
  • Ulrich Bruening, University of Heidelberg

Abstract


Designed for capacity and capability, HPC I/O systems are inherently complex and shared among multiple concurrent jobs competing for resources. The lack of centralized coordination and control often renders the end-to-end I/O paths vulnerable to load imbalance and contention. With the emergence of data-intensive HPC applications, storage systems are under even greater pressure to deliver performance and scalability. This paper proposes to unify two key approaches to tackle the imbalanced use of I/O resources and to achieve an end-to-end I/O performance improvement in the most transparent way. First, it uses a topology-aware Balanced Placement I/O method (BPIO) to mitigate resource contention. Second, it takes advantage of the platform-neutral ADIOS middleware, which provides a flexible I/O mechanism for scientific applications. By integrating BPIO with ADIOS, an integration referred to as Aequilibro, we obtain an end-to-end and per-job I/O performance improvement for ADIOS-enabled HPC applications without requiring any code changes. Aequilibro can be applied to almost any HPC platform and is best suited to systems that lack a centralized file system resource manager. We demonstrate the effectiveness of our integration on the Titan system at the Oak Ridge National Laboratory. Our experiments with a synthetic benchmark and a real-world HPC workload show that, even in a noisy production environment, Aequilibro can significantly improve large-scale application performance.
Keywords: Libraries, Load management, Scalability, Resource management, Aggregates, Production, Performance evaluation, Parallel File System, High Performance Computing, Load Balancing
Published
2016-10-26
NEUWIRTH, Sarah; WANG, Feiyi; ORAL, Sarp; VAZHKUDAI, Sudharshan; ROGERS, James; BRUENING, Ulrich. Using Balanced Data Placement to Address I/O Contention in Production Environments. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 28., 2016, Los Angeles/USA. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2016. p. 9-17.