Design Space Exploration of Accelerators and End-to-End DNN Evaluation with TFLITE-SOC

  • Nicolas Bohm Agostini, Northeastern University
  • Shi Dong, Northeastern University
  • Elmira Karimi, Northeastern University
  • Marti Torrents Lapuerta, Barcelona Supercomputing Center
  • José Cano, University of Glasgow
  • José L. Abellán, Universidad Católica San Antonio de Murcia
  • David Kaeli, Northeastern University

Abstract


Recently, there has been a rapidly growing demand for faster machine learning (ML) processing in data centers, along with a migration of ML inference applications to edge devices. These developments have prompted both industry and academia to explore custom accelerators that optimize ML execution for performance and power. However, identifying which accelerator is best equipped to perform a particular ML task is challenging, especially given the growing range of ML tasks, the number of target environments, and the limited number of integrated modeling tools. To tackle this issue, it is of paramount importance to provide the computer architecture research community with a common framework capable of performing a comprehensive, uniform, and fair comparison across different accelerator designs targeting a particular ML task. To this end, we propose a new framework named TFLITE-SOC (System On Chip) that integrates a lightweight system modeling library (SystemC), for fast design space exploration of custom ML accelerators, into the build/execution environment of TensorFlow Lite (TFLite), a widely used framework for ML inference. Using this approach, we can model and evaluate new accelerators developed in SystemC by leveraging the language's hierarchical design capabilities, resulting in faster design prototyping. Furthermore, any accelerator designed using TFLITE-SOC can be benchmarked for inference with any DNN model compatible with TFLite, which enables end-to-end DNN processing and detailed (i.e., per-layer) performance analysis. In addition to providing rapid prototyping, integrated benchmarking, and a range of platform configurations, TFLITE-SOC offers comprehensive analysis of accelerator occupancy and execution time breakdown, as well as a rich set of modules that new accelerator designs can reuse to implement scale-up studies and optimized memory transfer protocols. We present our framework and demonstrate its utility by exploring the design space of a TPU-like systolic array and describing possible directions for optimization. Using a compression technique, we implement an optimization that reduces memory traffic between DRAM and on-device buffers. Compared to the baseline accelerator, our optimized design achieves up to 1.26x speedup on accelerated operations and up to 1.19x speedup on end-to-end DNN execution.
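To illustrate the kind of accelerator component that TFLITE-SOC is built to model, the sketch below shows a single processing element (PE) of a weight-stationary, TPU-like systolic array written in plain SystemC. This is a minimal sketch under stated assumptions: the module, port, and signal names are illustrative, not the TFLITE-SOC API; the abstract only states that accelerators are described in SystemC and integrated into TFLite's build/execution environment.

    // Minimal SystemC sketch of one weight-stationary systolic-array PE
    // (names are illustrative assumptions, not the TFLITE-SOC API).
    // Activations flow east, partial sums flow south, one MAC per cycle.
    #include <systemc.h>
    #include <cstdint>
    #include <iostream>

    SC_MODULE(SystolicPE) {
        sc_in<bool>     clk;
        sc_in<int32_t>  act_in;    // activation from the west neighbor
        sc_in<int32_t>  psum_in;   // partial sum from the north neighbor
        sc_out<int32_t> act_out;   // activation forwarded to the east neighbor
        sc_out<int32_t> psum_out;  // updated partial sum sent south

        int32_t weight;            // stationary weight, preloaded before compute

        void compute() {
            // Forward the activation and accumulate into the partial sum.
            act_out.write(act_in.read());
            psum_out.write(psum_in.read() + weight * act_in.read());
        }

        SC_CTOR(SystolicPE) : weight(0) {
            SC_METHOD(compute);
            sensitive << clk.pos();
        }
    };

    int sc_main(int, char*[]) {
        sc_clock clk("clk", 10, SC_NS);
        sc_signal<int32_t> act_in, psum_in, act_out, psum_out;

        SystolicPE pe("pe");
        pe.clk(clk);
        pe.act_in(act_in);
        pe.psum_in(psum_in);
        pe.act_out(act_out);
        pe.psum_out(psum_out);

        pe.weight = 3;            // preload the stationary weight
        act_in.write(2);
        psum_in.write(5);
        sc_start(20, SC_NS);      // simulate two clock cycles

        std::cout << "psum_out = " << psum_out.read() << std::endl;  // expect 5 + 3*2 = 11
        return 0;
    }

A full array would instantiate a grid of such PEs and wire act_out/psum_out of each PE to the inputs of its neighbors, which is the kind of hierarchical composition the abstract refers to.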
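The abstract does not spell out which compression scheme is used to cut DRAM-to-buffer traffic. Purely as an illustration of the idea, the following self-contained C++ sketch applies zero-value (bitmap) compression to an int8 tile, one common way to move fewer bytes when tensors are sparse; the struct and function names are hypothetical and not taken from the paper.

    // Hypothetical zero-value (bitmap) compression of an int8 tile,
    // illustrating how DRAM-to-buffer traffic can shrink for sparse data.
    #include <cstdint>
    #include <cstddef>
    #include <iostream>
    #include <vector>

    struct CompressedTile {
        std::vector<uint8_t> bitmap;   // 1 bit per element: 1 = non-zero
        std::vector<int8_t>  values;   // non-zero elements, in row-major order
    };

    CompressedTile compress(const std::vector<int8_t>& tile) {
        CompressedTile out;
        out.bitmap.assign((tile.size() + 7) / 8, 0);
        for (size_t i = 0; i < tile.size(); ++i) {
            if (tile[i] != 0) {
                out.bitmap[i / 8] |= 1u << (i % 8);
                out.values.push_back(tile[i]);
            }
        }
        return out;
    }

    std::vector<int8_t> decompress(const CompressedTile& c, size_t n) {
        std::vector<int8_t> tile(n, 0);
        size_t v = 0;
        for (size_t i = 0; i < n; ++i) {
            if (c.bitmap[i / 8] & (1u << (i % 8))) {
                tile[i] = c.values[v++];
            }
        }
        return tile;
    }

    int main() {
        std::vector<int8_t> tile = {0, 3, 0, 0, -2, 0, 0, 7};  // mostly-zero tile
        CompressedTile c = compress(tile);
        // Bytes transferred: bitmap + non-zeros instead of the full tile.
        std::cout << "original: " << tile.size() << " B, compressed: "
                  << c.bitmap.size() + c.values.size() << " B\n";
        std::cout << "round-trip ok: " << (decompress(c, tile.size()) == tile) << "\n";
        return 0;
    }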
Keywords: Throughput, Benchmark testing, Runtime, Kernel, Optimization, Graphics processing units, Electric breakdown, DNN accelerator framework, Systolic array, Memory compression, Hardware-software co-design
Published
September 8, 2020
AGOSTINI, Nicolas Bohm; DONG, Shi; KARIMI, Elmira; LAPUERTA, Marti Torrents; CANO, José; ABELLÁN, José L.; KAELI, David. Design Space Exploration of Accelerators and End-to-End DNN Evaluation with TFLITE-SOC. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 32., 2020, Porto, Portugal. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 10-19.