Performance Modeling and Estimation of a Configurable Output Stationary Neural Network Accelerator

Ali Oudrhiri; Emilien Taly; Nathan Bain; Alix Munier; Roberto Guizzetti; Pascal Urard

Ali Oudrhiri STMicroelectronics / Sorbonne Université / CNRS / LIP6
Emilien Taly STMicroelectronics / Univ. Grenoble Alpes / CNRS / Grenoble INP / TIMA
Nathan Bain STMicroelectronics / Univ. Grenoble Alpes / CNRS / Grenoble INP / TIMA
Alix Munier Sorbonne Université / CNRS / LIP6
Roberto Guizzetti STMicroelectronics
Pascal Urard STMicroelectronics

Abstract

Neural network accelerators are designed to process neural networks (NNs) while optimizing three key performance indicators (KPIs): latency, power, and chip area. This work is based on a study of Gemini, an industrial prototype of a near-memory-computing inference accelerator designed using high-level synthesis. Gemini is a configurable output-stationary accelerator whose performance depends on two structural parameters. Measuring the KPIs requires simulations that are time-consuming and resource-intensive. This paper presents a practical high-level estimator that instantly predicts the KPIs for a given NN and Gemini configuration. The latency is derived accurately by an analytical model based on the architecture, the operator scheduling, and the NN characteristics. The power and chip area are also computed analytically, and these models are calibrated using simulations. Finally, we show how to use the estimator to derive Pareto optima and select the best Gemini configurations for a VGG-like NN.

Keywords: Neural network accelerator, output stationary, estimation, latency, power, area