On the Memory Underutilization: Exploring Disaggregated Memory on HPC Systems

  • Ivy Peng, Lawrence Livermore National Laboratory
  • Roger Pearce, Lawrence Livermore National Laboratory
  • Maya Gokhale, Lawrence Livermore National Laboratory

Abstract


Large-scale high-performance computing (HPC) systems consist of massive compute and memory resources tightly coupled in nodes. We perform a large-scale study of memory utilization on four production HPC clusters. Our results show that more than 90% of jobs utilize less than 15% of the node memory capacity, and that memory utilization is below 35% for 90% of the time. Disaggregated architectures have recently gained traction because they can selectively scale up a resource and improve resource utilization. Based on these observations, we explore using disaggregated memory to support memory-intensive applications, while most jobs remain unaffected on HPC systems with reduced node memory. We designed and developed a user-space remote-memory paging library that enables applications to explore disaggregated memory on existing HPC clusters. We quantified the impact of access patterns and network connectivity in benchmarks. Our case studies of graph-processing and Monte Carlo applications evaluated the impact of application characteristics and local memory capacity, and highlighted the potential of throughput scaling on disaggregated memory.
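The abstract reports two different utilization statistics: one counted per job and one counted per unit of time. The sketch below illustrates how the two could differ on the same data. It is not the authors' methodology: the `Sample` record, the use of per-job peak utilization, and both function names are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One hypothetical monitoring sample for a job on a node."""
    job_id: str
    duration_s: float  # length of the sampling interval in seconds
    used_gib: float    # node memory in use during the interval


def fraction_of_jobs_below(samples, node_capacity_gib, threshold):
    """Fraction of jobs whose peak utilization stays below `threshold`.

    Assumption: a job's utilization is taken as its peak over all samples.
    """
    peak = {}
    for s in samples:
        util = s.used_gib / node_capacity_gib
        peak[s.job_id] = max(peak.get(s.job_id, 0.0), util)
    if not peak:
        return 0.0
    return sum(1 for u in peak.values() if u < threshold) / len(peak)


def fraction_of_time_below(samples, node_capacity_gib, threshold):
    """Fraction of total sampled time spent below `threshold` utilization."""
    total = sum(s.duration_s for s in samples)
    if total == 0:
        return 0.0
    below = sum(
        s.duration_s
        for s in samples
        if s.used_gib / node_capacity_gib < threshold
    )
    return below / total
```

Under these assumptions, a single long-running job with high memory use lowers the time-based fraction more than the job-based one, which is one reason the two figures in the abstract need not agree.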
Keywords: Memory management, Task analysis, Micromechanical devices, Resource management, Servers, Production, Libraries, Disaggregated Memory, Memory Utilization, Remote Paging, Remote Memory
Published
08/09/2020
PENG, Ivy; PEARCE, Roger; GOKHALE, Maya. On the Memory Underutilization: Exploring Disaggregated Memory on HPC Systems. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 32., 2020, Porto/Portugal. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 183-190.