Efficient Tensor Slicing for Multicore NPUs using Memory Burst Modeling

  • Rafael Sousa UNICAMP
  • Byungmin Jung LGE San Jose Labs
  • Jaehwa Kwak LGE San Jose Labs
  • Michael Frank MagiCore
  • Guido Araujo UNICAMP


Although code generation for Convolutional Neural Network (CNN) models has been extensively studied, performing efficient data slicing and parallelization for highly constrained Multicore Neural Processor Units (NPUs) is still a challenging problem. Given the size of convolutions' input/output tensors and the small footprint of NPU on-chip memories, minimizing memory transactions while maximizing parallelism and MAC utilization is central to any effective solution. This paper proposes a TensorFlow XLA/LLVM compiler optimization pass for Multicore NPUs, called Tensor Slicing Optimization (TSO), which: (a) maximizes convolution parallelism and memory usage across NPU cores; and (b) reduces data transfers between host and NPU on-chip memories by using DRAM memory burst time estimates to guide tensor slicing. To evaluate the proposed approach, a set of experiments was performed using the NeuroMorphic Processor (NMP), a multicore NPU containing 32 RISC-V cores extended with novel CNN instructions. Experimental results show that TSO is capable of identifying the tensor slicing that minimizes execution time for a set of CNN models. The TSO burst-based technique achieves speed-ups of up to 21.7% over a no-burst data slicing approach.
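
The intuition behind burst-guided slicing can be illustrated with a minimal sketch. This is not the cost model used by TSO; it is a toy estimate under stated assumptions: a tensor stored in row-major HWC layout, every contiguous run starting on a burst boundary, and hypothetical values for the burst size (`burst_bytes`) and burst duration (`burst_ns`). The function names and parameters are illustrative only.

```python
from math import ceil

def slice_runs(height, width, channels, axis, parts, elem_bytes=1):
    """Return (num_runs, run_bytes) for ONE slice of an HWC row-major tensor
    split evenly into `parts` pieces along `axis` ("H", "W" or "C").
    Assumes the split dimension is divisible by `parts`."""
    if axis == "H":   # a block of whole rows is one contiguous region
        return 1, (height // parts) * width * channels * elem_bytes
    if axis == "W":   # one contiguous run per row
        return height, (width // parts) * channels * elem_bytes
    if axis == "C":   # one contiguous run per (h, w) position
        return height * width, (channels // parts) * elem_bytes
    raise ValueError(f"unknown axis: {axis}")

def burst_time_ns(height, width, channels, axis, parts,
                  burst_bytes=64, burst_ns=10.0):
    """Hypothetical burst model: each run costs a whole number of bursts."""
    runs, run_bytes = slice_runs(height, width, channels, axis, parts)
    return runs * ceil(run_bytes / burst_bytes) * burst_ns

def no_burst_time_ns(height, width, channels, axis, parts,
                     burst_bytes=64, burst_ns=10.0):
    """Naive model: cost is proportional to the bytes moved, nothing else."""
    runs, run_bytes = slice_runs(height, width, channels, axis, parts)
    return runs * run_bytes * (burst_ns / burst_bytes)

def best_axis(height, width, channels, parts, model):
    """Pick the split axis with the lowest estimated load time per slice."""
    return min("HWC", key=lambda a: model(height, width, channels, a, parts))
```

For a 56x56x64 tensor split four ways, the naive model assigns the same cost to all three axes because each slice moves the same number of bytes, while the burst model penalizes the channel split, whose 16-byte runs each occupy a full 64-byte burst. A model that ignores bursts therefore cannot distinguish slicings that differ only in how contiguous their memory accesses are.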
Keywords: Tensors, Multicore processing, Convolution, Computational modeling, Memory management, Parallel processing, Data models, burst-based model, convolutional neural network, NPU, mapping strategies
How to Cite

SOUSA, Rafael; JUNG, Byungmin; KWAK, Jaehwa; FRANK, Michael; ARAUJO, Guido. Efficient Tensor Slicing for Multicore NPUs using Memory Burst Modeling. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 33., 2021, Belo Horizonte. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 84-93.