Metrics for Packing Efficiency and Fairness of HPC Cluster Batch Job Scheduling
Abstract
The development of job scheduling algorithms, which directly influence the performance of High-Performance Computing (HPC) clusters, is hindered because popular scheduling quality metrics, such as Bounded Slowdown, correlate poorly with global scheduling objectives that include job packing efficiency and fairness. This report proposes Area Weighted Response Time, a metric that offers an unbiased representation of job packing efficiency, and presents a class of new metrics, Priority Weighted Specific Response Time, that assess both the packing efficiency and the fairness of schedules. Simulations of scheduling real workload traces, with the resulting schedules analyzed using both these metrics and conventional ones, demonstrate that although Bounded Slowdown can be readily improved by modifying the standard First Come First Served backfilling algorithm and by using existing job runtime estimation techniques, these improvements are accompanied by significant degradation of job packing efficiency and fairness. In contrast, improving job packing efficiency and fairness over the standard backfilling algorithm, which is designed to target those objectives, is difficult: it requires further algorithm development and more accurate runtime estimation techniques that reduce the frequency of underpredictions.
Keywords:
high performance computing, parallel job scheduling, performance metrics, schedule quality, runtime estimates, packing efficiency, fairness, weighted flow time, weighted response time
Published
2 November 2022
How to Cite
GOPONENKO, Alexander V.; LAMAR, Kenneth; PETERSON, Christina; ALLAN, Benjamin A.; BRANDT, Jim M.; DECHEV, Damian. Metrics for Packing Efficiency and Fairness of HPC Cluster Batch Job Scheduling. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 34., 2022, Bordeaux, France. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 241-252.