Abstract:
This paper introduces an architectural technique that reduces the energy consumption of Tensor Cores in GPGPUs. Over the past few years, deep neural networks (DNNs) have become a compelling solution for many applications such as image classification, speech recognition, and natural language processing. Various hardware frameworks have been proposed to accelerate DNNs. In particular, Tensor Cores in NVIDIA GPGPUs offer significant speedup compared with previous GPGPU architectures. However, this success comes at the cost of excessive energy consumption. Value-based optimization techniques have been used to accelerate DNNs; in particular, several studies exploit sparse values to skip unnecessary computations. However, the majority of these studies focus on accelerating DNNs rather than saving energy. In this work, we exploit power gating to reduce the energy consumption of Tensor Cores. We show that blindly applying power gating to multipliers results in significant performance loss due to the timing overhead of power gating. To mitigate this performance penalty, we propose sparsity-aware power gating (SPG), which monitors the inputs of multipliers and turns them off only if the inputs remain sparse over long intervals. We further improve SPG with an adaptive technique that dynamically changes the power-gating policy based on how frequently the multiplier inputs change. Our experimental results show that the proposed technique achieves 21% energy savings in Tensor Cores with negligible impact on performance while maintaining accuracy.
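To make the SPG idea concrete, the following is a minimal, hypothetical Python sketch of a counter-based gating controller for a single multiplier lane. The abstract does not specify the actual thresholds, interfaces, or adaptive policy, so every name and parameter here (SPGController, idle_threshold, the 256-cycle toggle-rate heuristic) is an illustrative assumption, not the paper's implementation.

```python
# Hypothetical sketch of a sparsity-aware power-gating (SPG) controller
# for one multiplier lane. Thresholds and the adaptive heuristic are
# assumptions for illustration only.

class SPGController:
    def __init__(self, idle_threshold=8, adaptive=True):
        self.idle_threshold = idle_threshold  # zero-operand cycles required before gating
        self.adaptive = adaptive
        self.zero_run = 0        # length of the current run of zero operands
        self.gated = False       # True while the multiplier is power-gated
        self.toggles = 0         # zero/non-zero transitions seen in the current window
        self.cycles = 0

    def step(self, a, b):
        """Advance one cycle with multiplier operands (a, b).
        Returns True if the multiplier is power-gated this cycle."""
        self.cycles += 1
        operand_is_zero = (a == 0 or b == 0)  # product is zero, multiply can be skipped

        if operand_is_zero:
            self.zero_run += 1
            # Gate only after a long enough run of zeros, so the wake-up
            # (timing) overhead of power gating is amortized.
            if self.zero_run >= self.idle_threshold:
                self.gated = True
        else:
            if self.zero_run > 0:
                self.toggles += 1
            self.zero_run = 0
            self.gated = False  # wake the multiplier for a real computation

        # Adaptive variant (assumed form): if operands toggle between zero and
        # non-zero frequently, raise the threshold to gate less aggressively;
        # if they rarely toggle, lower it to capture longer idle intervals.
        if self.adaptive and self.cycles % 256 == 0:
            toggle_rate = self.toggles / 256
            self.idle_threshold = 16 if toggle_rate > 0.25 else 4
            self.toggles = 0

        return self.gated


# Example: stream operand pairs through the controller and count gated cycles.
ctrl = SPGController()
gated_cycles = sum(ctrl.step(a, b) for a, b in [(0, 3), (0, 0), (5, 2)] * 100)
```

The key design point the sketch tries to capture is that gating decisions are deferred until sparsity has persisted long enough to outweigh the power-gating wake-up penalty, which is why blind per-zero gating hurts performance while SPG does not.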
Published in: 2021 IEEE 33rd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)
Date of Conference: 26-29 October 2021
Date Added to IEEE Xplore: 28 December 2021