Performance Modeling and Estimation of a Configurable Output Stationary Neural Network Accelerator
Abstract
Neural network accelerators are designed to process Neural Networks (NN) while optimizing three Key Performance Indicators (KPIs): latency, power, and chip area. This work is based on the study of Gemini, an industrial prototype of a near-memory-computing inference accelerator designed using a high-level synthesis technique. Gemini is a configurable output-stationary accelerator whose performance is determined by two structural parameters. Measuring the KPIs requires simulations that are time-consuming and resource-intensive. This paper presents a practical high-level estimator that can instantly predict the KPIs for a given NN and Gemini configuration. The latency is accurately derived using an analytical model based on the architecture, the operator scheduling, and the NN characteristics. The power and the chip area are computed analytically, and the models are calibrated using simulations. Finally, we show how to use the estimator to derive Pareto optima for choosing the best Gemini configurations for a VGG-like NN.
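The Pareto selection mentioned at the end of the abstract can be illustrated with a minimal sketch. It assumes each candidate configuration has already been scored on the three KPIs (lower is better for all of them). The `Estimate` type, the configuration labels, and the numeric values are hypothetical placeholders, not the paper's estimator or its results.

```python
from typing import List, NamedTuple


class Estimate(NamedTuple):
    """KPI estimate for one accelerator configuration (hypothetical container)."""
    config: str     # illustrative label for a configuration
    latency: float  # lower is better
    power: float    # lower is better
    area: float     # lower is better


def dominates(a: Estimate, b: Estimate) -> bool:
    """True if a is no worse than b on every KPI and strictly better on at least one."""
    ka, kb = a[1:], b[1:]  # skip the label, compare the three KPIs
    no_worse = all(x <= y for x, y in zip(ka, kb))
    strictly_better = any(x < y for x, y in zip(ka, kb))
    return no_worse and strictly_better


def pareto_front(estimates: List[Estimate]) -> List[Estimate]:
    """Keep only the estimates that no other estimate dominates."""
    return [e for e in estimates
            if not any(dominates(other, e) for other in estimates)]


# Made-up numbers for illustration only; they are not taken from the paper.
candidates = [
    Estimate("cfg_a", latency=1.0, power=5.0, area=5.0),
    Estimate("cfg_b", latency=2.0, power=6.0, area=6.0),  # worse than cfg_a everywhere
    Estimate("cfg_c", latency=5.0, power=1.0, area=5.0),
]
front = pareto_front(candidates)
```

With these placeholder values, `cfg_b` is dropped because `cfg_a` is better on every KPI, while `cfg_a` and `cfg_c` both remain since each wins on a different KPI. The pairwise comparison is quadratic in the number of candidates, which is acceptable for a small configuration space.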
Keywords:
Neural network accelerator, output stationary, estimation, latency, power, area
Published
17/10/2023
How to Cite
OUDRHIRI, Ali; TALY, Emilien; BAIN, Nathan; MUNIER, Alix; GUIZZETTI, Roberto; URARD, Pascal. Performance Modeling and Estimation of a Configurable Output Stationary Neural Network Accelerator. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 35., 2023, Porto Alegre/RS. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 89-97.