Adaptive clock and power rates also have other problems. They can make production a bit more complicated: every GPU & CPU combination must reach its frequencies at the desired power target. And at those high GPU frequencies I really doubt that many chips can make it, unless those peak clocks are only really short bursts.
Every unit in the SOC needs to meet its performance target under the design's max transient power limit and the TDP. The parameters of the power delivery system and cooler set limits on which silicon is acceptable, but the lower DVFS points tend to scale less aggressively than the points at the edge of the safety margin. Cerny made a reference to clock/power points intended to match the thermal density of the GPU and CPU sections, although I'm not clear on why that was emphasized, given there seems to be no evidence of other AMD products needing it, and they can experience more significant swings than the PS5's described method can.
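To put the acceptance idea in concrete terms, here's a minimal sketch: a die only passes if every required clock/voltage point stays under the limit set by the power delivery and cooler. All limits, operating points, and the C*V^2*f dynamic-power model are illustrative assumptions, not PS5 figures.

```python
# Minimal sketch of silicon acceptance under fixed platform limits.
# All numbers and the C*V^2*f model are invented, not PS5 figures.

TRANSIENT_LIMIT_W = 250.0  # assumed max transient the power delivery allows

# Required operating points: (clock in GHz, switched capacitance in nF).
# nF * V^2 * GHz conveniently works out to watts.
REQUIRED_POINTS = [(1.0, 40.0), (1.6, 42.0), (2.23, 45.0)]

def point_power(freq_ghz: float, cap_nf: float, vdd: float) -> float:
    """Rough dynamic power in watts: C * V^2 * f."""
    return cap_nf * vdd**2 * freq_ghz

def chip_is_acceptable(vdd_needed_per_point: list[float]) -> bool:
    """A die passes only if every required point fits under the limit.
    vdd_needed_per_point: the voltage this particular die needs per point."""
    for (freq, cap), vdd in zip(REQUIRED_POINTS, vdd_needed_per_point):
        if point_power(freq, cap, vdd) > TRANSIENT_LIMIT_W:
            return False  # this die needs too much voltage at this point
    return True

print(chip_is_acceptable([1.00, 1.05, 1.15]))  # True: ~133 W at the top point
print(chip_is_acceptable([1.00, 1.10, 1.60]))  # False: ~257 W at the top point
```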
On the other hand, such a method could be simpler than AMD's usual production method, where the validation suites test many more DVFS points and transition combinations than the PS5's design requires. Whatever the PS5's DVFS points are, the described system is consistent with using AMD's standard DVFS in a less challenging way than other consumer products do.
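As a back-of-the-envelope illustration of why a small fixed set of points shrinks the validation matrix, with all counts invented:

```python
# Rough count of validation cases: each DVFS point under each load,
# plus every ordered point-to-point transition under each load.
# The point/load counts are invented for illustration.

def validation_cases(n_points: int, n_loads: int) -> int:
    transitions = n_points * (n_points - 1)  # ordered transition pairs
    return n_points * n_loads + transitions * n_loads

# A desktop-style part with many P-states vs a console with a few fixed points:
print(validation_cases(n_points=16, n_loads=20))  # 5120 cases
print(validation_cases(n_points=4, n_loads=20))   # 320 cases
```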
That is where I see the problem. You can increase the load on the GPU (which says nothing about how stressed it really is), but not every calculation will max everything out. Different calculations produce different internal load, even when the GPU can't take on any more work. One game might create 100% usage of the GPU at 80W (just as an example), while another maxes the GPU out at only 80% but already draws 150W. This is what makes production much more complicated: more or less every CPU & GPU combination must be run through most possible load tests, and each time it must reach the same (fixed) frequencies with a fixed power target.
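To make the utilization-vs-power point concrete, here's a toy sketch: power is modeled as a weighted sum of per-unit activity, so a "100% busy" GPU that mostly waits on memory can draw far less than an "80% busy" one hammering its ALUs. The unit weights and activity factors are invented, not measurements.

```python
# Toy model: estimated power as a weighted sum of per-unit activity.
# Weights (watts at full activity of that unit) are invented.

UNIT_WEIGHTS = {"alu": 60.0, "tex": 30.0, "rops": 20.0, "mem": 40.0}

def estimated_power(activity: dict[str, float]) -> float:
    """Weighted sum of per-unit activity factors in the range 0.0-1.0."""
    return sum(UNIT_WEIGHTS[unit] * a for unit, a in activity.items())

# Game A: reads as 100% "GPU usage" but mostly waits on memory.
game_a = {"alu": 0.35, "tex": 0.20, "rops": 0.30, "mem": 0.90}
# Game B: only 80% "GPU usage" but keeps ALUs and memory busy at once.
game_b = {"alu": 0.95, "tex": 0.70, "rops": 0.60, "mem": 0.90}

print(estimated_power(game_a))  # 69.0 W despite "100% usage"
print(estimated_power(game_b))  # 126.0 W at "80% usage"
```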
The validation process for the PS5 seems more complex than it was for the PS4. However, in terms of manufacturing it looks to me like it's within the limits of what AMD does routinely since there's a version of this DVFS in virtually every chip it makes.
The system itself uses a model that is conservative in what it calculates as worst-case power, but the dynamic estimate is still significantly closer to reality than the prior generation's design-time guard banding. The PS5's estimates should be more conservative than AMD's usual ones, since every chip needs to meet the platform's model SOC standard, whereas AMD's many product bins and high-clocking SKUs can tweak parameters and make assumptions about silicon quality that the console cannot.
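A sketch of that contrast, under invented numbers: design-time guard banding has to assume the worst workload at all times, while a model-SOC-style dynamic estimate only slows down when the current workload's estimate actually approaches the budget. The f^3 power scaling here is a simplifying assumption.

```python
# Static guard band vs dynamic estimate, with invented numbers.
# Power is assumed to scale roughly with f^3 (voltage tracking frequency).

POWER_BUDGET_W = 180.0
CLOCKS_GHZ = [2.23, 2.0, 1.8, 1.6]   # candidate clocks, highest first
WORST_CASE_W_AT_MAX = 230.0          # assumed power-virus draw at max clock

def power_at(clock_ghz: float, draw_at_max_w: float) -> float:
    return draw_at_max_w * (clock_ghz / CLOCKS_GHZ[0]) ** 3

def static_guardband_clock() -> float:
    """Design time: one clock that is safe even for the worst workload."""
    return next(c for c in CLOCKS_GHZ
                if power_at(c, WORST_CASE_W_AT_MAX) <= POWER_BUDGET_W)

def dynamic_clock(current_draw_at_max_w: float) -> float:
    """Run time: clamp only as far as this workload's estimate requires."""
    return next(c for c in CLOCKS_GHZ
                if power_at(c, current_draw_at_max_w) <= POWER_BUDGET_W)

print(static_guardband_clock())  # 2.0 GHz always, even for light loads
print(dynamic_clock(160.0))      # a typical game keeps 2.23 GHz
print(dynamic_clock(230.0))      # a power virus drops to 2.0 GHz
```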
I doubt that many chips will make it through that binning process.
It's an apparently single-binned console SOC being built in the millions; for practical purposes, it is very important that most do. The CPU portion sits significantly below the design max of the Zen 2 core, so I think that element is unlikely to be an obstacle. The GPU max clock is unusually high relative to prior GPU generations, but it seems reasonable that a pipeline specifically tailored for a higher clock target can hit a max clock modestly higher than the peak clocks of some RDNA products, especially since it doesn't need to be sustained.
Whether taking the GPU clocks to this level will be the winning design philosophy remains to be seen, but it seems to me that it should at least be producible.
True, and we don't have much on this other than this statement from Digital Foundry's Road to PS5 analysis piece:
An internal monitor analyses workloads on both CPU and GPU and adjusts frequencies to match. While it's true that every piece of silicon has slightly different temperature and power characteristics, the monitor bases its determinations on the behaviour of what Cerny calls a 'model SoC' (system on chip) - a standard reference point for every PlayStation 5 that will be produced.
This is why I picked up on the workload/activity distinction: whatever this internal monitor is predicated on, it is workload rather than activity. What does this mean? Is there logic in the PS5 profiling GPU/CPU/API workloads in real time to adjust power distribution? ¯\_(ツ)_/¯
AMD's DVFS has been described in other products as using activity monitors for functional elements of the pipeline. Later proposals and patents also included things like small blocks of redundant processing hardware that serve as representative stand-ins for the behavior of the most demanding silicon, such as dummy ALUs and registers running operations intended to give a worst-case figure for electrical and thermal performance. On top of that there's a significant number of thermal sensors and current monitors.
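My guess at how those ingredients could combine, strictly as a sketch: scale the activity-counter estimate by a worst-case factor calibrated from the replica circuits, so the figure errs high. Every name and number here is invented.

```python
# Pessimistic power estimate: activity counters scaled by a worst-case
# factor standing in for the dummy/replica circuits. All values invented.

REPLICA_WORST_CASE_FACTOR = 1.15  # assumed margin of worst silicon over the model

def estimate_worst_case_power(counters: dict[str, int],
                              cost_w_per_event: dict[str, float]) -> float:
    """Worst-case-biased power estimate from raw activity counters."""
    base = sum(n * cost_w_per_event[name] for name, n in counters.items())
    return base * REPLICA_WORST_CASE_FACTOR

counters = {"alu_ops": 900, "tex_fetches": 300, "dram_bursts": 400}
costs = {"alu_ops": 0.05, "tex_fetches": 0.04, "dram_bursts": 0.08}
print(estimate_worst_case_power(counters, costs))  # ~102 W, biased high
```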
The on-die voltage management and Vdroop protection indicate the hardware can detect and manage current and voltage changes at the nanosecond-to-microsecond scale. The activity monitors and thermal estimates gauge power consumption and die temperatures over microseconds up to a millisecond, going by the power management described for various GPUs and Zen.
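Schematically, that suggests two nested loops on very different timescales; this sketch is entirely illustrative, with invented intervals, thresholds, and clock range.

```python
# Two-timescale control, schematically: a fast loop reacting to voltage
# droop within nanoseconds (e.g. by clock stretching) and a slower loop
# integrating power and temperature over microseconds to a millisecond.
# Thresholds, steps, and the clock range are invented.

def fast_droop_response(vdd_sample: float, clock_ghz: float) -> float:
    """Nanosecond scale: stretch the clock immediately if the supply dips."""
    return clock_ghz * 0.97 if vdd_sample < 0.95 else clock_ghz

def slow_power_step(avg_power_w: float, temp_c: float, target_ghz: float) -> float:
    """Microsecond-to-millisecond scale: adjust the DVFS target from
    integrated power and temperature readings."""
    if avg_power_w > 180.0 or temp_c > 95.0:
        return max(1.6, target_ghz - 0.05)  # step the target down
    return min(2.23, target_ghz + 0.05)     # recover toward max

print(fast_droop_response(vdd_sample=0.93, clock_ghz=2.23))       # ~2.16 GHz
print(slow_power_step(avg_power_w=190.0, temp_c=80.0, target_ghz=2.23))  # ~2.18 GHz
```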
I think AMD has described token-based power trading between chips or chip regions before, which may feed into what SmartShift relies on for determining how much slack is left in the power budget.
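One guess at how that slack trading could work, strictly as a sketch and not a description of SmartShift internals: a fixed SoC budget where watts the CPU doesn't claim are lent to the GPU. Budget and floor values are invented.

```python
# Fixed SoC budget split between CPU and GPU; watts the CPU doesn't
# claim are lent to the GPU. Budget and floor values are invented.

SOC_BUDGET_W = 200.0

def split_budget(cpu_demand_w: float, gpu_demand_w: float,
                 cpu_floor_w: float = 30.0) -> tuple[float, float]:
    """Grant the CPU its demand (above a floor), lend the slack to the GPU."""
    cpu_grant = min(max(cpu_demand_w, cpu_floor_w), SOC_BUDGET_W)
    gpu_grant = min(gpu_demand_w, SOC_BUDGET_W - cpu_grant)
    return cpu_grant, gpu_grant

print(split_budget(cpu_demand_w=40.0, gpu_demand_w=170.0))  # (40.0, 160.0)
print(split_budget(cpu_demand_w=80.0, gpu_demand_w=170.0))  # (80.0, 120.0)
```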
What the PS5 appears to be doing is taking all of this DVFS hardware, backing away from the highest CPU clock ranges, and picking a more conservative and fixed set of figures for the per-chip power model.