High-Quality I/O Bandwidth Prediction with Minimal Data via Transfer Learning Workflow
Abstract
High-quality performance prediction can improve many aspects of cluster operation: devising scheduling and provisioning policies, guiding procurement decisions, suggesting candidate applications for tuning, and identifying probable scaling and porting challenges. Predicting I/O metrics remains challenging, however, because of the intricate interplay of multiple cluster components, which makes it a natural fit for machine learning. Yet reaching the required accuracy with machine learning calls for a substantial amount of high-quality data, which is often difficult to obtain for most HPC clusters. In this work, we explore the use of transfer learning to predict applications’ I/O bandwidth based on a public dataset. In our experiment, the resulting workflow provides an I/O bandwidth prediction for a different cluster that is comparable to the current state-of-the-art result while using 100 times less data than was needed to construct the base model. Furthermore, we evaluate potential future improvements to the proposed workflow.
Keywords:
Procurement, Measurement, High performance computing, Computational modeling, Transfer learning, Bandwidth, Computer architecture, Predictive models, Data models, Tuning, I/O, Interpretable Machine Learning, Explainable AI
Published
13/11/2024
How to Cite
POVALIAIEV, Dmytro; LIEM, Radita; KUNKEL, Julian; LOFSTEAD, Jay; CARNS, Philip. High-Quality I/O Bandwidth Prediction with Minimal Data via Transfer Learning Workflow. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 36., 2024, Hilo/Hawaii. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 93-104.