Sparsity-aware Power Gating for Tensor Cores

Ehsan Atoofian, Lakehead University

Abstract

This paper introduces an architectural technique that reduces the energy consumption of Tensor Cores in GPGPUs. Over the past few years, deep neural networks (DNNs) have become a compelling solution for many applications, such as image classification, speech recognition, and natural language processing. Various hardware frameworks have been proposed to accelerate DNNs. In particular, Tensor Cores in NVIDIA GPGPUs offer significant speedup over previous GPGPU architectures. However, this success comes at the cost of excessive energy consumption. Value-based optimization techniques have been used to accelerate DNNs, and several studies have exploited sparse values to skip unnecessary computations. However, the majority of these studies focused on accelerating DNNs rather than on saving energy. In this work, we exploit power gating to reduce the energy of Tensor Cores. We show that blindly applying power gating to multipliers results in significant performance loss due to the timing overhead of power gating. To mitigate this performance penalty, we propose sparsity-aware power gating (SPG), which monitors the inputs of multipliers and turns the multipliers off only if their inputs remain sparse for long intervals. We further improve SPG with an adaptive technique that dynamically changes the power gating policy based on how frequently the inputs of the multipliers change. Our experimental results show that the proposed technique can achieve 21% energy saving in Tensor Cores with negligible impact on performance while maintaining accuracy.
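The contrast between blind and sparsity-aware gating can be illustrated with a toy cycle-level model. This is only a sketch under stated assumptions, not the paper's hardware design: the function `simulate`, the `idle_threshold` parameter, the 3-cycle `wakeup_latency`, and the rule that a multiplier is "sparse" in a cycle when either operand is zero are all hypothetical choices made for illustration, and the adaptive policy is not modeled.

```python
from dataclasses import dataclass


@dataclass
class GateStats:
    gated_cycles: int = 0   # cycles the multiplier spent powered off
    wakeups: int = 0        # number of times the multiplier was powered back on
    stall_cycles: int = 0   # extra cycles lost to wake-up latency


def simulate(operands, idle_threshold, wakeup_latency=3):
    """Toy model of one power-gated multiplier.

    operands       : sequence of (a, b) pairs, one pair per cycle
    idle_threshold : the multiplier is gated only after MORE than this many
                     consecutive sparse cycles (0 models blind gating)
    wakeup_latency : assumed stall, in cycles, when a gated multiplier
                     receives a non-zero product to compute
    """
    stats = GateStats()
    zero_run = 0      # length of the current run of sparse cycles
    gated = False
    for a, b in operands:
        sparse = (a == 0 or b == 0)   # product is zero, multiplier not needed
        if sparse:
            zero_run += 1
            if not gated and zero_run > idle_threshold:
                gated = True          # inputs stayed sparse long enough
            if gated:
                stats.gated_cycles += 1
        else:
            if gated:
                # Useful work arrives while gated: pay the wake-up penalty.
                stats.wakeups += 1
                stats.stall_cycles += wakeup_latency
                gated = False
            zero_run = 0
    return stats


if __name__ == "__main__":
    bursty = [(0, 1), (2, 3)] * 5          # zeros alternate with real work
    long_idle = [(0, 1)] * 10 + [(2, 3)]   # one long sparse interval
    for name, trace in (("bursty", bursty), ("long_idle", long_idle)):
        print(name, "blind:", simulate(trace, idle_threshold=0))
        print(name, "thresholded:", simulate(trace, idle_threshold=4))
```

On the alternating trace, the blind policy gates on every zero and pays a wake-up stall each time, while the thresholded policy never gates and so never stalls. On the long sparse interval, both policies pay a single wake-up, and the thresholded policy gives up only the first few gated cycles.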

Keywords: Tensors, Computer architecture, Speech recognition, Logic gates, Natural language processing, Timing, Sparse matrices, Deep neural networks, accelerator architecture, Tensor Core, energy