Metrics for Packing Efficiency and Fairness of HPC Cluster Batch Job Scheduling

  • Alexander V. Goponenko University of Central Florida
  • Kenneth Lamar University of Central Florida
  • Christina Peterson University of Central Florida
  • Benjamin A. Allan Sandia National Laboratories
  • Jim M. Brandt Sandia National Laboratories
  • Damian Dechev University of Central Florida

Abstract

The development of job scheduling algorithms, which directly influence the performance of High-Performance Computing (HPC) clusters, is hindered because popular scheduling quality metrics, such as Bounded Slowdown, correlate poorly with global scheduling objectives that include job packing efficiency and fairness. This report proposes Area Weighted Response Time, a metric that offers an unbiased representation of job packing efficiency, and presents a class of new metrics, Priority Weighted Specific Response Time, that assess both the packing efficiency and the fairness of schedules. Simulations of scheduling real workload traces, with the resulting schedules analyzed using both these metrics and conventional ones, demonstrate that although Bounded Slowdown can be readily improved by modifying the standard First Come First Served backfilling algorithm and by using existing runtime estimation techniques, these improvements are accompanied by significant degradation of job packing efficiency and fairness. In contrast, improving job packing efficiency and fairness over the standard backfilling algorithm, which is designed to target those objectives, is difficult. It requires further algorithm development and more accurate runtime estimation techniques that reduce the frequency of underpredictions.
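
As a rough illustration of the contrast the abstract draws, the sketch below computes the conventional Bounded Slowdown alongside an area-weighted response time for a toy schedule. The abstract does not give formulas, so everything here is an assumption: the `Job` record and function names are hypothetical, Bounded Slowdown follows the commonly used definition with a 10-second threshold, and "area" is taken to mean nodes multiplied by runtime. The paper's exact definitions may differ.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job record; field names are illustrative only."""
    submit: float   # submission time, seconds
    start: float    # start time, seconds
    runtime: float  # actual runtime, seconds
    nodes: int      # number of nodes allocated


def bounded_slowdown(job, tau=10.0):
    # Commonly used definition: response time over runtime, with the
    # runtime bounded below by tau and the result bounded below by 1.
    wait = job.start - job.submit
    return max((wait + job.runtime) / max(job.runtime, tau), 1.0)


def mean_bounded_slowdown(jobs, tau=10.0):
    # Unweighted average: every job counts equally, whatever its size.
    return sum(bounded_slowdown(j, tau) for j in jobs) / len(jobs)


def area_weighted_response_time(jobs):
    # Assumption: each job's response time is weighted by its "area",
    # taken here as nodes * runtime.
    areas = [j.nodes * j.runtime for j in jobs]
    responses = [(j.start - j.submit) + j.runtime for j in jobs]
    return sum(a * r for a, r in zip(areas, responses)) / sum(areas)


jobs = [
    Job(submit=0, start=0, runtime=100, nodes=4),   # large job, no wait
    Job(submit=0, start=100, runtime=10, nodes=1),  # small job, waits 100 s
]
print(mean_bounded_slowdown(jobs))
print(area_weighted_response_time(jobs))
```

In this toy case the small job alone sets the mean Bounded Slowdown (its slowdown is 11, against 1 for the large job), while the area-weighted value stays close to the large job's response time. Under these assumptions, that is the sense in which a per-job average can diverge from a measure of how well the machine is packed.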
Published
2022-11-02
How to Cite
GOPONENKO, Alexander V. et al. Metrics for Packing Efficiency and Fairness of HPC Cluster Batch Job Scheduling. Proceedings of the International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), [S.l.], p. 241-252, nov. 2022. ISSN 0000-0000. Available at: <https://sol.sbc.org.br/index.php/sbac-pad/article/view/28251>. Date accessed: 17 may 2024.