Register Flush-free Runahead Execution for Modern Vector Processors

  • Hikaru Takayashiki, Tohoku University
  • Masayuki Sato, Tohoku University
  • Kazuhiko Komatsu, Tohoku University
  • Hiroaki Kobayashi, Tohoku University

Abstract


Modern vector processors are designed to achieve high sustained performance, especially in HPC applications, through a powerful instruction set oriented to data-level parallelism. In addition, the latest vector processor adopts out-of-order execution of vector instructions to exploit instruction-level parallelism, because there is a significant latency gap between vector arithmetic instructions and vector load/store instructions. Despite this effort, the gap still degrades the sustained performance of modern vector processors. This paper proposes a runahead execution mechanism for modern vector processors that fills the latency gap by further exploiting instruction-level parallelism. When the processor stalls on a long-latency instruction, a conventional runahead execution mechanism switches the processor from normal mode to runahead mode, in which the processor speculatively executes subsequent instructions that can cause stalls, together with the instructions they depend on. However, conventional runahead execution mechanisms flush the register values calculated in runahead mode when that mode ends, so the values cannot be reused in the subsequent normal mode. Because a single vector register holds many values, these flushes and the resulting re-executions waste bandwidth between the cores and the caches. To solve this problem, the proposed mechanism retains the registers that hold runahead-mode results so that the processor can use them after returning to normal mode. To use these registers correctly after exiting runahead mode, the proposed mechanism introduces functions that carry the commit-order information and the register-aliasing information of the runahead-executed instructions over into normal mode.
The evaluation results show that the proposed mechanism improves performance by up to 20%, and by 3% on average, compared with the conventional mechanism.
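
The bandwidth argument in the abstract can be illustrated with a toy count of cache-to-core element transfers. The sketch below is not the authors' simulator or mechanism: the `Instr` type, the `cache_to_core_elements` function, the vector length of 256 elements, and the fixed runahead window are all hypothetical choices made for illustration. It assumes the first instruction is a vector load that misses and stalls, and that the next few instructions run speculatively during that stall.

```python
from dataclasses import dataclass

# Assumed number of elements per vector register (illustrative only).
VECTOR_LENGTH = 256


@dataclass(frozen=True)
class Instr:
    op: str    # "vload" or "varith"
    dest: str  # destination vector register name


def cache_to_core_elements(program, runahead_window, keep_registers):
    """Count vector elements moved from cache to core for `program`.

    Toy assumptions: program[0] is a vload that stalls the processor, and
    program[1 .. runahead_window] are executed speculatively in runahead mode.
    If keep_registers is False, runahead results are flushed and each
    runahead-executed vload is loaded a second time in normal mode.
    """
    transferred = 0
    for index, instr in enumerate(program):
        if instr.op != "vload":
            continue
        # Every vector load moves one full register of data at least once.
        transferred += VECTOR_LENGTH
        in_runahead = 1 <= index <= runahead_window
        if in_runahead and not keep_registers:
            # The flushed register must be refilled by re-executing the load.
            transferred += VECTOR_LENGTH
    return transferred


program = [
    Instr("vload", "v0"),   # stalling load that triggers runahead mode
    Instr("vload", "v1"),
    Instr("varith", "v2"),
    Instr("vload", "v3"),
    Instr("varith", "v4"),
]

conventional = cache_to_core_elements(program, runahead_window=3, keep_registers=False)
flush_free = cache_to_core_elements(program, runahead_window=3, keep_registers=True)
print(conventional, flush_free)
```

In this toy setting, the two loads inside the runahead window are transferred twice under the flushing policy and once when the registers are retained. The model ignores timing, cache misses during re-execution, and the commit-order and register-aliasing bookkeeping that the abstract says is needed to reuse the registers correctly.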
Keywords: Out of order, High performance computing, Pipelines, Computer architecture, Bandwidth, Parallel processing, Registers
Published
26/10/2021
How to Cite

TAKAYASHIKI, Hikaru; SATO, Masayuki; KOMATSU, Kazuhiko; KOBAYASHI, Hiroaki. Register Flush-free Runahead Execution for Modern Vector Processors. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 33., 2021, Belo Horizonte. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 114-125.