Understanding GPU Energy and Environmental Impact – Part II
Nour Rteil | Dec. 8, 2025
In part I of this blog series, we explored the embodied and operational power trends of GPUs, observing how their power consumption has scaled over time. Specifically, we noted an approximate x1.4/year increase in power (a CAGR of about 41%) over the last four years, reflecting growing computational demands and hardware complexity. However, power consumption alone doesn’t tell the full story: to truly understand the evolution of GPUs, we must also examine their performance improvements and the resulting energy efficiency gains.
In this second part, we dive deeper into performance trends, particularly focusing on floating-point operations (FP32 and FP16), and assess how these advancements translate into performance-per-watt metrics. This analysis is especially important given the rapid acceleration of AI workloads, which rely heavily on GPUs for both training and inference.
Performance Trends
GPUs traditionally rely on FP32 (single-precision floating-point) for most graphics, scientific computing, and general-purpose compute. However, the rise of deep learning has made FP16 (half-precision floating-point) increasingly important due to its faster computation and reduced memory bandwidth requirements.
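As a minimal illustration of the memory-bandwidth argument (a NumPy sketch with an arbitrary tensor size, not drawn from the dataset analyzed here), halving the precision halves the bytes that must be stored and moved per element:

```python
import numpy as np

# Hypothetical activation tensor: a batch of 1024 vectors with 4096 features each.
shape = (1024, 4096)

fp32_tensor = np.zeros(shape, dtype=np.float32)  # 4 bytes per element
fp16_tensor = np.zeros(shape, dtype=np.float16)  # 2 bytes per element

print(f"FP32 footprint: {fp32_tensor.nbytes / 1e6:.1f} MB")  # ~16.8 MB
print(f"FP16 footprint: {fp16_tensor.nbytes / 1e6:.1f} MB")  # ~8.4 MB
```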
For this analysis, we evaluated 116 datacenter GPUs, excluding consumer-grade GPUs used in mobile devices and PCs. The figures below show FP32 and FP16 performance across release years.
We observe a clear increase in performance, particularly over the period from 2021 to 2025:
- FP32 performance has been increasing at a rate of roughly x1.38 per year (CAGR = 38%).
- FP16 performance is improving even faster, with a growth factor of about x1.57 per year (CAGR = 57%).
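For readers who want to reproduce these growth figures, the annual growth factor (and the equivalent CAGR) follows directly from two endpoint values; the throughput numbers below are hypothetical, chosen only to illustrate the formula:

```python
def annual_growth(start_value: float, end_value: float, years: int) -> tuple[float, float]:
    """Return the per-year growth factor and the equivalent CAGR."""
    factor = (end_value / start_value) ** (1 / years)
    return factor, factor - 1

# Hypothetical example: FP16 throughput rising from 300 to 1,800 TFLOPS over four years.
factor, cagr = annual_growth(300, 1_800, 4)
print(f"Growth factor: x{factor:.2f}/year (CAGR = {cagr:.0%})")  # x1.57/year (CAGR = 57%)
```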
Energy Efficiency Trends
Performance improvements are only part of the picture. To measure real progress, we need to understand how efficiently GPUs convert power into computational work, measured in GFLOPS per watt (GFLOPS/W).
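As a concrete reading of the metric (a minimal sketch with illustrative, not spec-sheet, numbers), GFLOPS/W is simply peak throughput divided by the card's rated power:

```python
def gflops_per_watt(peak_tflops: float, tdp_watts: float) -> float:
    """Theoretical peak efficiency: convert TFLOPS to GFLOPS, then divide by TDP."""
    return (peak_tflops * 1_000) / tdp_watts

# Illustrative numbers only: a GPU with 60 TFLOPS of FP32 peak and a 700 W TDP.
print(f"{gflops_per_watt(60, 700):.0f} GFLOPS/W")  # ~86 GFLOPS/W
```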
The figures below show the FP32 and FP16 energy efficiency across release years, highlighting the most popular GPUs used for AI training.
When looking at the FP32 energy efficiency, we notice that:
- 2022 was an extreme outlier with more than double the efficiency of the previous year.
- The three years following the peak (2023, 2024, and 2025) have all seen significant declines, as shown in the table below.
On the other hand, FP16 energy efficiency shows more volatility than FP32. Despite the dips, the overall trend from 2021 to 2025 is a near 2x improvement over four years (roughly 19% per year on average), which implies significant performance-per-watt gains.
This trend suggests a strategic trade-off: developers and manufacturers are prioritizing the raw speed and throughput needed for AI and machine learning workloads over strict power-efficiency targets.
The GFLOPS/W metric provides a theoretical peak of raw computational energy efficiency. However, in real-world scenarios, particularly for AI, other benchmarks offer a more practical measure. Notably, standardized AI performance suites like MLPerf Training measure end-to-end efficiency on representative AI tasks, capturing the total system energy draw from memory, interconnects, and other components, and thereby providing a more complete picture of real-world energy efficiency.
Unfortunately, the published MLPerf Training results are still too sparse to construct a robust long-term trend. However, notable patterns emerge, especially for models like Llama 2 70B LoRA.
We notice improved performance (lower time-to-train) across years; however, this also comes with increased TDPs. Apart from Nvidia’s Blackwell-generation systems (GB200, GB300), newer GPU generations show declining real-world efficiency.
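One way to turn MLPerf-style results into a rough real-world efficiency figure (a simplified sketch: it assumes the accelerators draw their full TDP for the whole run and ignores CPUs, memory, and cooling, so it understates total system energy) is to multiply time-to-train by aggregate accelerator power:

```python
def estimated_training_energy_kwh(time_to_train_min: float, num_gpus: int, tdp_watts: float) -> float:
    """Rough energy estimate: time-to-train multiplied by aggregate accelerator power, in kWh."""
    hours = time_to_train_min / 60
    return num_gpus * tdp_watts * hours / 1_000

# Hypothetical submission: 8 GPUs at 700 W TDP finishing the benchmark in 25 minutes.
print(f"{estimated_training_energy_kwh(25, 8, 700):.1f} kWh")  # ~2.3 kWh
```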
Conclusion
To summarize:
- Over the last four years, GPU performance has grown by roughly 1.38x (FP32) to 1.57x (FP16) per year relative to the previous generation.
- However, GPU power consumption has grown by about 1.4x per year.
=> The net effect on energy efficiency is therefore mixed, sometimes positive and sometimes negative (see the quick calculation after this list):
- FP32 efficiency: a spike in 2022, followed by a sustained decline.
- FP16 efficiency: both growth and declines, but generally on an upward trajectory.
- MLPerf Llama 2 70B LoRA efficiency: with the exception of GB200 and GB300, newer GPU generations show a decline in energy efficiency.
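The mixed outcome follows directly from the two growth rates above: dividing the annual performance factor by the annual power factor gives the net annual change in peak efficiency.

```python
power_growth = 1.4                           # ~x1.4/year power increase (from Part I)
perf_growth = {"FP32": 1.38, "FP16": 1.57}   # annual performance growth factors

for precision, factor in perf_growth.items():
    net = factor / power_growth
    trend = "improving" if net > 1 else "roughly flat or declining"
    print(f"{precision}: net efficiency factor x{net:.2f}/year ({trend})")
# FP32: x0.99/year (roughly flat or declining); FP16: x1.12/year (improving)
```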
Meanwhile, AI workloads, particularly deep learning training and inference, are estimated to be growing at an exponential rate. This explosive demand means that even with continuous efficiency improvements, GPU power consumption and overall system energy demands will continue to rise sharply.
In essence, the current rate of GPU efficiency improvements may still struggle to keep pace with the unprecedented growth of AI computational needs. This underscores the importance of continued innovation in hardware architecture, software and model optimizations, as well as alternative approaches to traditional GPUs, such as custom silicon, specialized AI accelerators and other ASIC-based solutions.
For example, Google’s Tensor Processing Units (TPUs) are designed to handle deep learning workloads. Unlike general-purpose GPUs, their specialized features, such as the matrix multiply unit (MXU) and proprietary interconnect topology, make them well suited to running AI training and inference. Google has begun mass-deploying its seventh-generation Ironwood TPU (TPU v7), offering up to 4,614 TFLOPS of peak performance and dramatically greater energy efficiency compared with earlier TPU generations.
Likewise, Amazon Web Services (AWS) has unveiled its third-generation in-house AI chip, Trainium 3, which the company claims is significantly faster (x4.4 more compute performance) and more efficient (x4) than its predecessor, Trainium 2, and offers substantial savings in training cost for enterprise AI workloads.
As innovation moves forward in both hardware and software, the future of AI computing will be just as powerful and flexible as the challenges it’s designed to solve.