Dynamic Inter-Thread Vectorization Architecture: Extracting DLP from TLP

  • Sajith Kalathingal Inria
  • Caroline Collange Inria
  • Bharath N. Swamy Inria
  • André Seznec Inria

Abstract


Threads of Single-Program Multiple-Data (SPMD) applications often execute the same instructions on different data. We propose the Dynamic Inter-Thread Vectorization Architecture (DITVA) to leverage this implicit data-level parallelism in SPMD applications by assembling dynamic vector instructions at runtime. DITVA extends an SIMD-enabled in-order SMT processor with an inter-thread vectorization execution mode. In this mode, multiple scalar threads running in lockstep share a single instruction stream and their respective instruction instances are aggregated into SIMD instructions. To balance thread- and data-level parallelism, threads are statically grouped into fixed-size independently scheduled warps. DITVA leverages existing SIMD units and maintains binary compatibility with existing CPU architectures. Our evaluation on the SPMD applications from the PARSEC and Rodinia OpenMP benchmarks shows that a 4-warp × 4-lane 4-issue DITVA architecture with a realistic bank-interleaved cache achieves 1.55× higher performance than a 4-thread 4-issue SMT architecture with AVX instructions while fetching and issuing 51% fewer instructions, achieving an overall 24% energy reduction.
Keywords: Convergence, Pipelines, Instruction sets, Computer architecture, Parallel processing, Throughput, Hardware, Simultaneous MultiThreading, Single Instruction Multiple Data, Single Program Multiple Data, Vectorization
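
The inter-thread vectorization idea summarized in the abstract can be pictured with a small software model. The sketch below is only an illustration under stated assumptions, not the hardware mechanism of DITVA: the function names `group_into_warps` and `vectorize_warp`, the per-thread program-counter dictionary, and the boolean lane mask are all hypothetical. It assumes 16 threads split statically into 4 warps of 4 lanes, and merges the threads of a warp that sit at the same program counter into one vector operation.

```python
from collections import defaultdict


def group_into_warps(thread_ids, warp_size):
    """Statically split threads into fixed-size warps (hypothetical helper)."""
    return [thread_ids[i:i + warp_size]
            for i in range(0, len(thread_ids), warp_size)]


def vectorize_warp(warp, pcs):
    """Merge threads of one warp that share a PC into a single vector op.

    Returns a list of (pc, lane_mask) pairs ordered by pc, where
    lane_mask[i] is True when the i-th thread of the warp participates.
    This is a toy model; the mask representation is an assumption.
    """
    lanes_by_pc = defaultdict(list)
    for lane, tid in enumerate(warp):
        lanes_by_pc[pcs[tid]].append(lane)

    ops = []
    for pc in sorted(lanes_by_pc):
        mask = [lane in lanes_by_pc[pc] for lane in range(len(warp))]
        ops.append((pc, mask))
    return ops


# 16 threads, 4 lanes per warp: 4 warps (assumed mapping of the
# 4-warp x 4-lane configuration onto thread ids 0..15).
warps = group_into_warps(list(range(16)), 4)

# All threads are at the same instruction, except thread 2, which diverged.
pcs = {tid: 0x40 for tid in range(16)}
pcs[2] = 0x48

ops = vectorize_warp(warps[0], pcs)
for pc, mask in ops:
    print(hex(pc), mask)
```

In this model, warp 0 issues two operations instead of four scalar ones: one covering lanes 0, 1 and 3, and one for the diverged lane 2. A fully converged warp collapses to a single operation with all lanes active.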
Published
2016-10-26
KALATHINGAL, Sajith; COLLANGE, Caroline; SWAMY, Bharath N.; SEZNEC, André. Dynamic Inter-Thread Vectorization Architecture: Extracting DLP from TLP. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 28., 2016, Los Angeles/USA. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2016. p. 18-25.