Unless I'm mistaken, this is about the power usage profile across the APU (CPU and GPU) and the aggressive ramping of power at higher clocks for little gain. When you're running at fixed clocks at the higher end of the curve, you'll be pulling far more power for a minimal increase in performance, and if you scale back a little bit of performance you actually save a lot of power that can be sent to the CPU or GPU, whichever needs it more.
This is the Mark Cerny quote:
Whatever the fixed-clock of the CPU-side of the Sony's 2Ghz GPU profile was, going variable fixed it. This is the advantage.
Right, so the dynamic power equation is
P = CV^2 f a
Where f and V are directly proportional (need more voltage as frequency increases to maintain stability of charge)
V is squared
C is the capacitance and geometry of the gates, this doesn't change
f = frequency
a = activity change - some may consider this workload, more on this in a second.
So the reason we see power efficiency nosedive as frequency continues to go up is that power increases roughly with the cube of frequency,
i.e. 2x the frequency needs 2x the voltage, and voltage is squared, which is 4x. So you have 8x the power for 2x the frequency.
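The cubic relationship can be sketched in a few lines of Python. This is only a napkin-math model: it assumes voltage scales in direct proportion to frequency (real voltage/frequency curves are not that clean) and that C stays fixed. The function name is made up for illustration.

```python
def relative_dynamic_power(freq_ratio, activity_ratio=1.0):
    """Dynamic power relative to a baseline, from P = C * V^2 * f * a.

    Assumes V scales in direct proportion to f and C is unchanged,
    so the ratio reduces to activity_ratio * freq_ratio**3.
    """
    voltage_ratio = freq_ratio  # assumption: V tracks f linearly
    return activity_ratio * voltage_ratio ** 2 * freq_ratio


# Doubling the clock under this model costs 8x the power.
print(relative_dynamic_power(2.0))  # 8.0
```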
Eventually you'll hit the thermal limits of the chip, either because of parametric yield or because cooling hits a hard wall (watts per cm^2), and you cannot proceed any further.
The cooling requirements get very steep as power goes up; eventually the silicon's power density per cm^2 starts to pass that of a nuclear reactor. So cooling becomes a hard wall as well.
So let's do some interesting math.
Let's take a PS5 at 2230Mhz, and denote its power as Ps5
And another PS5 at 1825Mhz (XSX speeds), and denote its power as Psx
1825Mhz is about 82% of 2230. Call it 4/5 to keep the math simple.
So let's take f as 2230Mhz, with V varied proportionally.
Psx = C (4/5 V)^2 (4/5 f) a
Or, if we remove the constants, Psx = (4/5)^3 = 64/125 of Ps5. That's roughly 50% less power.
So by downclocking by 405Mhz, we extract a power savings of roughly 50% on the same chip.
Inversely, going upwards it is 5/4 the clock speed at the cost of 5/4 the voltage, squared.
Cool, so let's calculate what that means in power for the inverse
(5/4)^3 = 125/64, so 95% more power, nearly 100% more power to go from 1825 to 2230Mhz.
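The same comparison can be run with the exact clock ratio instead of the rounded 4/5. This is a rough sketch under the same assumptions as the formula above (voltage proportional to frequency, same C, same activity); the names are illustrative only.

```python
PS5_MHZ = 2230
XSX_MHZ = 1825


def cubic_power_ratio(f_new, f_old):
    """Power at f_new relative to f_old, assuming P is proportional to f^3."""
    return (f_new / f_old) ** 3


down = cubic_power_ratio(XSX_MHZ, PS5_MHZ)  # dropping from 2230 to 1825
up = cubic_power_ratio(PS5_MHZ, XSX_MHZ)    # climbing from 1825 to 2230

print(f"clock ratio:                    {XSX_MHZ / PS5_MHZ:.3f}")
print(f"power at 1825 relative to 2230: {down:.3f}")
print(f"power at 2230 relative to 1825: {up:.3f}")
```

With the exact ratio, the model gives roughly 45% less power going down and roughly 82% more going up, so the rounded 50% / 95% figures slightly overstate the swing, but the shape of the argument is the same.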
Let P = power draw, and T = time to execute. Let PS5 be b and PSX be a
Pb = 2Pa
Tb = 4/5Ta
So PS5 draws 2x the power of PSX, but finishes the same work in 4/5 of the time. Energy for the job is P x T, so that works out to about 8/5 the energy.
Let's try to use DVFS to rescale the power requirements of PS5 so that it draws the same amount of power as PSX, and see if it's still going to run faster.
I'm not going to show the math as it will take a while. But using DVFS to try to match the power level of PS5 to PSX gives a ratio of 0.92, which is less than 1, meaning that with DVFS it's still slower.
So this is evidence that the boost is not a neutral position and it's heavily overclocked with respect to its own power curve.
So back to the original formula.
P = C V^2 f a.
The big thing is RDNA2 improves performance per watt by up to 50%. This is getting people to napkin math 50% better clocks but it doesn't work like that.
Using the dynamic power formula, f and V are proportionally locked such that P is proportional to f^3,
leaving C (the capacitance and geometry of the gates) and a as the variables that can change. Since we know they aren't changing the gates, that leaves activity level,
or the amount of switching from 0s -> 1s and 1s -> 0s.
If AMD has found ways to use less power in their chip for certain activities, it can score up to a 50% power improvement, so for those specific tasks the factor becomes (1/2)a
But for other tasks that light up everything, it will likely still be a. So the improvement will vary with activity, between 1/2 and 1.0 of the original.
Which brings us back to what Cerny was talking about. The idea of locking power output as part of PS5 leaves voltage, frequency and C locked, leaving (a).
Developers must find algorithms that will use less power, or better put, less power per core. So can we parallelize the algorithms over more cores such that each core is doing fewer operations (thus less power) and therefore not going over the power budget?
You see, wide has huge power implications if you can parallelize your work. This is why we have multicore processors.
Using the dynamic power function again:
A core at 4Ghz vs a core at 1Ghz
- The 4Ghz core will complete its work 4x faster than the 1Ghz core.
- But it draws 64x the power, because of the cubic relationship between power and frequency.
- But if you make 4 cores at 1Ghz, you can complete the work in the same amount of time (assuming the work parallelizes cleanly), at only 4/64 of the power of the single 4Ghz core. So you're looking at 1/16 of the power.
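The wide-versus-fast comparison above works out like this in Python. It is a napkin-math sketch: it assumes P is proportional to f^3 per core and that the work splits perfectly across cores, which real workloads rarely do. The helper name is made up for illustration.

```python
def core_power(freq_ghz, base_ghz=1.0):
    """Per-core power relative to a core at base_ghz, assuming P ~ f^3."""
    return (freq_ghz / base_ghz) ** 3


one_fast = 1 * core_power(4.0)   # one core at 4 GHz
four_slow = 4 * core_power(1.0)  # four cores at 1 GHz, same total throughput

print(one_fast)              # 64.0
print(four_slow)             # 4.0
print(four_slow / one_fast)  # 0.0625, i.e. 1/16 of the power
```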
This is why as we approached a Megahertz ceiling we started going multi-core. It saves tons of power.
And in the same way, if we write algorithms that go parallel instead of single threaded, we will also save tons of power, which will allow the clock rates to stay high on PS5.
TLDR: the clock speed is very high. Even accounting for DVFS, it is still burning more power to keep that clockrate than the chip should provide (within the architecture). It did not fix it. It just allowed them to market a higher number.
This power curve could affect parametric yield on chips, could affect the amount of cooling required, and it will likely have to pull additional power from the CPU to maintain its clockrate, as per their original statement that their design could not hit 2Ghz fixed.