Impact of GPU Architecture and VRAM on Image Generation: A Study of Energy Efficiency in Heterogeneous Edge Nodes
Abstract
The rapid evolution of generative artificial intelligence has substantially increased the computational demands of image synthesis models, traditionally restricting their execution to centralized cloud infrastructures. In response to concerns about data privacy, energy consumption, cost, and dependency on hyperscale providers, this work investigates the feasibility of executing state-of-the-art generative image models at the network edge. We present a quantitative performance evaluation of two representative model families, Stable Diffusion XL (SDXL) and Z-Image Turbo, executed on heterogeneous hardware, including high-end consumer GPUs, mobile-class devices, and legacy workstation GPUs from NVIDIA and AMD. The analysis focuses on latency, power consumption, resource utilization, and the impact of software stack optimizations, such as attention mechanisms and backend frameworks, under realistic hardware constraints. Results show that software-level optimizations are the primary factors determining inference viability, often outweighing raw computational throughput. While modern GPUs benefit from optimized attention mechanisms and improved energy efficiency, legacy and lower-power devices remain viable when combined with optimized runtimes and model compression techniques. These findings demonstrate that contemporary generative workloads can be effectively supported by decentralized edge infrastructures, providing practical insights for the design of energy-efficient, heterogeneous local AI systems.
