Energy-Aware Deep Learning on GPUs through Parameter Sharing and Mixed Precision Training
Abstract
The design of Deep Learning models increasingly relies on advanced techniques such as parameter sharing and mixed precision training to manage computational and memory costs. Although effective in theory, their practical impact on system-level performance, energy consumption, and memory subsystem behavior is complex and interdependent. This paper presents a performance/energy trade-off analysis of these techniques through an empirical case study. We benchmark a diverse suite of six models, including a direct comparison of DistilBERT (conventional) and ALBERT (parameter-sharing), on a multi-GPU NVIDIA A100 platform. Our analysis across FP32, TF32, and mixed BF16 precisions reveals two key findings. First, despite its smaller parameter count, ALBERT is empirically up to 2.2× slower and incurs up to ~3× higher GPU memory footprint than DistilBERT; we attribute this effect to the runtime unrolling of its shared layers, which materializes activations for every repeated layer and thereby incurs substantial overhead. Second, while mixed BF16 precision provides an average training speed-up of ~2.1×, the benefits are strongly model-dependent. Using an empirical Throughput-per-Watt (Samples/Joule) efficiency metric, we show that compute-bound models such as TinyLlama benefit the most in energy efficiency, whereas CNNs show only marginal improvements, which we link to implicit TF32 acceleration already present in their FP32 baseline via the cuDNN library.
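The Samples/Joule efficiency metric mentioned above can be sketched as follows. This is a minimal illustration, not the paper's measurement code: the function names (`energy_joules`, `samples_per_joule`) are hypothetical, and it assumes GPU power in watts has been sampled at known timestamps (e.g., via NVML or `nvidia-smi`) over the training run.

```python
# Hypothetical sketch of a Samples/Joule (Throughput-per-Watt) metric.
# Assumes GPU power (watts) was sampled at known timestamps during training.

def energy_joules(timestamps, power_watts):
    """Integrate sampled power over time (trapezoidal rule) -> joules."""
    return sum(
        0.5 * (p0 + p1) * (t1 - t0)
        for (t0, p0), (t1, p1) in zip(
            zip(timestamps, power_watts),
            zip(timestamps[1:], power_watts[1:]),
        )
    )

def samples_per_joule(num_samples, timestamps, power_watts):
    """Training samples processed per joule of GPU energy consumed."""
    return num_samples / energy_joules(timestamps, power_watts)

# Example: a 10 s run at a constant 300 W, processing 6000 samples.
ts = [0.0, 5.0, 10.0]
pw = [300.0, 300.0, 300.0]
print(samples_per_joule(6000, ts, pw))  # 6000 / 3000 J = 2.0 samples/J
```

Note that samples per joule is numerically identical to (samples per second) per watt, which is why the two formulations are interchangeable.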
Keywords:
Training, Deep learning, Measurement, Runtime, Power demand, Computational modeling, Memory management, Graphics processing units, Benchmark testing, Energy efficiency, runtime analysis, power consumption, benchmarking, mixed precision, deep learning training, GPU modeling
Published
28/10/2025
How to Cite
TCHAKOUTE, Roblex Nana; TADONKI, Claude. Energy-Aware Deep Learning on GPUs through Parameter Sharing and Mixed Precision Training. In: WORKSHOP ON LIGHTWEIGHT EFFICIENT DEEP LEARNING IN HPC ENVIRONMENTS (LEANDL-HPC) - INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 37., 2025, Bonito/MS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 116-123.
