Current Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

Adaptive clock and power rates also have other problems: they can make production a bit more complicated. Every GPU & CPU combination must reach its frequencies at the desired power target, and at those high GPU frequencies I really doubt that many chips can make it.
Apparently they can, given Sony are aiming to ship 10m consoles by early 2021. So evidently not a problem. :nope:

There is also a problem with power distribution. The harder developers optimize for the GPU and the more intensively they use it, the higher the GPU power draw is. This, on the other hand, will reduce the power available to the CPU. I really doubt that the GPU will be optimally used and at the same time reach the high frequencies.
And yet we've heard zero noise, rumour or suggestion of problems developing for PS5, only gushing happiness at the new architecture. Many developers are used to developing for variable-clock hardware across literally hundreds or thousands of varying performance profiles, because that has been the reality of PC CPUs and GPUs for years.

For the sake of discussion we should call it activity level and not load. Load tends to imply weight or work; increasing load implies a lot of compute is being done. There are ways to light up a lot of activity on the GPU without actually doing much work. Copying stuff around tends to be a pretty bad offender and does absolutely no computation.
Mark Cerny said workload; activity is different. The whole GPU may be active but the workload may be light because of lots of use of 32-bit, 64-bit or 128-bit instructions and data. Equally, parts of the GPU may be inactive but the workload may be heavy because of a lot of use of 256-bit instructions and data - which was a scenario Mark Cerny mentioned. The workload determines the power draw, not the GPU's level of activity. It's a subtle, if arguably near-semantic, difference. :yes:
 
Dynamic presets to be determined by the developer would have only served to underline how much of a hassle this paradigm actually is. Much easier to code against fixed budgets for the GPU and the CPU.
I don't agree with you or anyone who thinks this, simply because Microsoft has already done it with the CPU presets (more threads / higher frequency), and everyone is fine with it, to the point that nobody talks about it and it has slipped out of mind.
Maybe the next-next generation will allow *choosing* between balanced mode and sending all the dilithium energy to the GPU.
The only reason we are still talking about Sony's implementation is that it's still not clear how it will auto-manage the frequency balancing.
 
Mark Cerny said workload; activity is different. The whole GPU may be active but the workload may be light because of lots of use of 32-bit, 64-bit or 128-bit instructions and data. Equally, parts of the GPU may be inactive but the workload may be heavy because of a lot of use of 256-bit instructions and data - which was a scenario Mark Cerny mentioned. The workload determines the power draw, not the GPU's level of activity. It's a subtle, if arguably near-semantic, difference. :yes:
Activity level for a specific chip is how many transistors are flipping states, i.e. 1s to 0s and vice versa. When bits flip there is much more power draw. Usually this should be associated with workload, but not always. Wider instructions flip way more bits with fewer instructions, which is why you're seeing so much more power draw: you go from adding 2x32 bits to adding 8x32 bits in a single shot across all cores. You're going to get massive activity across more transistors across more cores. That's an easy way to see how activity level scales on a CPU.
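To put a toy number on that, here's a quick Python sketch (purely illustrative, nothing to do with any real power monitor; all constants are made up) that counts how many bits toggle between successive values on a 256-bit datapath and turns that into a relative activity figure:

Code:
# Toy illustration: switching activity = bits that toggle between consecutive cycles.
import random

WIDTH = 256  # datapath width in bits, e.g. one 8x32 SIMD add per cycle

def toggled_bits(prev, curr):
    """Count the bits that flipped state between two consecutive values."""
    return bin(prev ^ curr).count("1")

def activity_factor(values):
    """Average fraction of the datapath toggling per cycle."""
    flips = sum(toggled_bits(a, b) for a, b in zip(values, values[1:]))
    return flips / (WIDTH * (len(values) - 1))

# "Narrow" traffic: only the low 32 bits ever change (scalar-ish work).
narrow = [random.getrandbits(32) for _ in range(1000)]
# "Wide" traffic: all 256 bits change every cycle (8x32 SIMD-ish work).
wide = [random.getrandbits(256) for _ in range(1000)]

print("activity, narrow:", activity_factor(narrow))  # ~0.06
print("activity, wide:  ", activity_factor(wide))    # ~0.5

Random 32-bit traffic toggles about 16 bits per cycle against the 256-bit bus, while random 256-bit traffic toggles about 128, so the wide case lands roughly 8x higher. That's the same intuition as going from 2x32 to 8x32 adds.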
 
Activity level for a specific chip is how many transistors are flipping states, i.e. 1s to 0s and vice versa. When bits flip there is much more power draw. Usually this should be associated with workload, but not always.
Before I write a page and a half here, let me ask you a question. Do you believe that all FinFET transistors across the APU die are equal in terms of use and power draw?
 
Before I write a page and a half here, let me ask you a question. Do you believe that all FinFET transistors across the APU die are equal in terms of use and power draw?
Certainly not in terms of use; the transistors in a chip are not used equally all over. Some will definitely be used all the time, as they serve the functions most commonly required of the chip, and others less (functions called less often, operations called less often). As for power draw, no, generally speaking they should operate within a close tolerance of each other, but you're going to get some differences when spread over billions of transistors.

I'm not referring to idle power states when referring to activity.
 
As for power draw, no, generally speaking they should operate within a close tolerance of each other, but you're going to get some differences when spread over billions of transistors.
This isn't the case. Ignoring FinFET memory transistors, there are different types of FinFET logic gates: some can be optimised for performance (and use considerably more power) and others for lower leakage (and use considerably less power). This is why state flips are not useful for determining power draw.
 
This isn't the case. Ignoring FinFET memory transistors, there are different types of FinFET logic gates: some can be optimised for performance (and use considerably more power) and others for lower leakage (and use considerably less power). This is why state flips are not useful for determining power draw.
But from a simplistic point of view it's what we need for the dynamic power equations. It would be fairly challenging to talk broadly about chips without benchmarking.
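For the record, the simplistic relation I have in mind is the textbook CMOS one,

P_dyn ≈ α · C_eff · V^2 · f

where α is the activity factor (the fraction of the switched capacitance that actually toggles each cycle), C_eff is the effective capacitance, V the supply voltage and f the clock. Your point, as I read it, is that C_eff per toggling node isn't uniform across the die, so counting flips alone doesn't pin down power; fair, but at a fixed voltage and frequency the activity term is still the knob that moves.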
 
But from a simplistic point of view it's what we need for the dynamic power equations. It would be fairly challenging to talk broadly about chips without benchmarking.
True, and we don't have much on this other than this statement from Digital Foundry's Road to PS5 analysis piece:

An internal monitor analyses workloads on both CPU and GPU and adjusts frequencies to match. While it's true that every piece of silicon has slightly different temperature and power characteristics, the monitor bases its determinations on the behaviour of what Cerny calls a 'model SoC' (system on chip) - a standard reference point for every PlayStation 5 that will be produced.​

This is why I picked up on the workload/activity thing: whatever this internal monitor is predicated on, it is workload rather than activity. What does this mean? Is there logic in PS5 profiling GPU/CPU/API workloads in realtime to make adjustments to power distribution? ¯\_(ツ)_/¯

I guess you do need a smart system if you are going to steal power from the CPU or GPU; you have to understand the consequences, otherwise it could be problematic.
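Purely as a mental model, here's roughly how I picture such a monitor, as a Python sketch. To be clear, this is not Sony's implementation and every counter name, coefficient and budget below is invented; the idea is just to turn per-interval workload counters into a power estimate using coefficients calibrated against the 'model SoC', then walk the clocks down until the estimate fits the shared budget:

Code:
# Speculative sketch of a workload-driven power manager, loosely inspired by the
# Road to PS5 description. Coefficients, budgets and counter names are invented.

MODEL_SOC_ENERGY = {            # energy-per-event coefficients from the "model SoC"
    "cpu_simd256_ops": 2.0,     # wide 256-bit ops are the expensive ones
    "cpu_scalar_ops": 0.4,
    "gpu_alu_ops": 1.0,
    "gpu_mem_transactions": 1.5,
}
TOTAL_BUDGET_W = 200.0          # invented combined CPU+GPU budget
F_STEP = 0.01                   # walk clocks down in 1% steps

def estimate_power(counters, cpu_f, gpu_f):
    """Power estimate for one interval. Scaling the clock scales the event rate
    (and thus power) roughly linearly in this toy; real silicon also drops
    voltage with frequency, so the real saving per step is bigger."""
    cpu = (counters["cpu_simd256_ops"] * MODEL_SOC_ENERGY["cpu_simd256_ops"] +
           counters["cpu_scalar_ops"]  * MODEL_SOC_ENERGY["cpu_scalar_ops"]) * cpu_f
    gpu = (counters["gpu_alu_ops"]          * MODEL_SOC_ENERGY["gpu_alu_ops"] +
           counters["gpu_mem_transactions"] * MODEL_SOC_ENERGY["gpu_mem_transactions"]) * gpu_f
    return (cpu + gpu) / 1e9    # arbitrary scaling to "watts"

def pick_frequencies(counters, cpu_f=1.0, gpu_f=1.0):
    """Walk clocks down (GPU first here, arbitrarily) until the estimate fits."""
    while estimate_power(counters, cpu_f, gpu_f) > TOTAL_BUDGET_W:
        if gpu_f > 0.9:         # never drop more than ~10% in this toy model
            gpu_f -= F_STEP
        elif cpu_f > 0.9:
            cpu_f -= F_STEP
        else:
            break               # still over budget: real hardware has other safeguards
    return cpu_f, gpu_f

# One heavy interval: lots of wide SIMD on the CPU plus a busy GPU.
counters = {"cpu_simd256_ops": 10e9, "cpu_scalar_ops": 12.5e9,
            "gpu_alu_ops": 160e9, "gpu_mem_transactions": 20e9}
cpu_f, gpu_f = pick_frequencies(counters)
print(round(cpu_f, 2), round(gpu_f, 2))   # e.g. 1.0 0.92 - the GPU sheds a few percent

Whether the real monitor works anything like this is exactly the open question, but it at least shows why calibrating against a fixed model SoC gives every console identical behaviour regardless of the quality of its individual silicon.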
 
True, and we don't have much on this other than this statement from Digital Foundry's Road to PS5 analysis piece:

An internal monitor analyses workloads on both CPU and GPU and adjusts frequencies to match. While it's true that every piece of silicon has slightly different temperature and power characteristics, the monitor bases its determinations on the behaviour of what Cerny calls a 'model SoC' (system on chip) - a standard reference point for every PlayStation 5 that will be produced.​

This is why I picked up on the workload/activity thing: whatever this internal monitor is predicated on, it is workload rather than activity. What does this mean? Is there logic in PS5 profiling GPU/CPU/API workloads in realtime to make adjustments to power distribution? ¯\_(ツ)_/¯
Fair enough, I see your POV. It's likely monitoring the instructions that are coming in, and I guess they would in turn know the power levels for each type of instruction.
 
Adaptive clock and power rates also have other problems: they can make production a bit more complicated. Every GPU & CPU combination must reach its frequencies at the desired power target, and at those high GPU frequencies I really doubt that many chips can make it, unless those peak clocks really are only short bursts.
Every unit in the SOC needs to meet its performance target under the design's max transient power limit and the TDP. The parameters of the power delivery system and cooler would give limits to the acceptability of silicon, but the lower points would tend to be less extreme in their scaling than the points at the edge of the safety margin. Cerny made a reference to clock/power points intended to match the thermal density of the GPU and CPU sections, although I'm not clear on why that was emphasized given there seems to be no evidence of any other AMD products needing that, and they can experience more significant swings than the PS5's described method can.
On the other hand, such a method could be simpler than the usual AMD production method, where the validation suites would be testing many more DVFS points and transition combinations than the PS5's design requires. Whatever the PS5 DVFS points are, the described system is consistent with using AMD's standard DVFS in a less challenging way than other consumer products.


That is where I see the problem. You can increase the load on the GPU (which doesn't say how stressed it really is), but not every calculation will max everything out. Different calculations lead to different internal load, even when the GPU can't do anything more. One game might create 100% usage of the GPU at 80W (just as an example), while another game might max the GPU out at only 80% but already use 150W of power. This is what makes it much more complicated for production. More or less every CPU & GPU combination must be tested with the most demanding load tests possible, and each time it must reach the same (fixed) frequencies within a fixed power target.
The validation process for the PS5 seems more complex than it was for the PS4. However, in terms of manufacturing it looks to me like it's within the limits of what AMD does routinely since there's a version of this DVFS in virtually every chip it makes.
The system itself is using a model that is conservative in terms of what it calculates as a worst-case output, but the dynamic estimate is significantly closer to reality than the prior generation's design-time guard banding. The estimates the PS5 uses should be more conservative since every chip needs to meet the platform's model SOC standards, whereas AMD's many product bins and high-clocking SKUs can tweak parameters and make assumptions about silicon quality the console cannot.

I doubt that many chips will make it through that binning process.
It's an apparently single-binned console SOC being built in the millions. For practical purposes, it is very important that most do. The CPU portion is significantly below the design max of the Zen 2 core, so I think that element is unlikely to be an obstacle. The GPU max clock is unusually high for prior GPU generations, but it seems reasonable that a pipeline specifically tailored for a higher clock target can hit a max clock that is modestly higher than the peak clocks of some RDNA products, especially since it doesn't need to be sustained.
Whether taking the GPU clocks to this level will be the winning design philosophy remains to be seen, but it seems to me that it should at least be producible.

True, and we don't have much on this other than this statement from Digital Foundry's Road to PS5 analysis piece:

An internal monitor analyses workloads on both CPU and GPU and adjusts frequencies to match. While it's true that every piece of silicon has slightly different temperature and power characteristics, the monitor bases its determinations on the behaviour of what Cerny calls a 'model SoC' (system on chip) - a standard reference point for every PlayStation 5 that will be produced.​

This is why I picked up on the workload/activity thing: whatever this internal monitor is predicated on, it is workload rather than activity. What does this mean? Is there logic in PS5 profiling GPU/CPU/API workloads in realtime to make adjustments to power distribution? ¯\_(ツ)_/¯
AMD's DVFS has been described in other products as using activity monitors for functional elements of the pipeline. Later proposals and patents also included things like small blocks of redundant processing hardware that served as representative elements for the behavior of the most demanding silicon, such as dummy ALUs and registers running operations intended to give a worst-case figure for electrical and thermal performance. Then there's a significant number of thermal sensors and current monitors.
The on-die voltage management and Vdroop protection indicate the hardware can manage and detect current and voltage changes at the microsecond or nanosecond scale. The activity monitors and thermal estimates work to gauge power consumption and die temperatures at microseconds up to a millisecond range, going by the power management described for various GPUs and Zen.
I think AMD has described token-based power trading between chips or chip regions before, which may go into what SmartShift relies upon for determining how much slack is left in the power budget.

What the PS5 appears to be doing is taking all of this DVFS hardware, backing away from the highest CPU clock ranges, and picking a more conservative and fixed set of figures for the per-chip power model.
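For anyone wondering what 'token-based power trading' could look like in practice, here's a made-up toy in Python (not AMD's actual scheme; the token size and the policy are invented): the shared budget is carved into fixed-size tokens, and a block that isn't spending its share lends tokens to the one asking for more.

Code:
# Toy sketch of token-based power budgeting between two blocks (CPU and GPU).
# Not AMD's real SmartShift protocol; granularity and policy are invented.

TOKEN_WATTS = 5           # each token represents 5 W of budget (invented)
TOTAL_TOKENS = 40         # 200 W combined budget

class PowerArbiter:
    def __init__(self):
        # Start from a static half/half split of the tokens.
        self.allocation = {"cpu": TOTAL_TOKENS // 2, "gpu": TOTAL_TOKENS // 2}

    def rebalance(self, demand_watts):
        """Move spare tokens from the block under-using its share to the block
        asking for more, without ever exceeding the total budget."""
        demand_tokens = {k: -(-int(w) // TOKEN_WATTS) for k, w in demand_watts.items()}  # ceil
        spare = {k: max(0, self.allocation[k] - demand_tokens[k]) for k in self.allocation}
        for needy in self.allocation:
            other = "gpu" if needy == "cpu" else "cpu"
            shortfall = max(0, demand_tokens[needy] - self.allocation[needy])
            transfer = min(shortfall, spare[other])
            self.allocation[needy] += transfer
            self.allocation[other] -= transfer
            spare[other] -= transfer
        return {k: v * TOKEN_WATTS for k, v in self.allocation.items()}

arbiter = PowerArbiter()
# CPU only asks for ~40 W while the GPU wants ~150 W of a 200 W envelope.
print(arbiter.rebalance({"cpu": 40, "gpu": 150}))   # {'cpu': 50, 'gpu': 150}

With the CPU idling, the arbiter hands most of the envelope to the GPU, which is the gist of the shifting behaviour being described; the interesting part in real hardware is how quickly and how safely those tokens can be handed back when the CPU suddenly needs them.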
 
I don't agree with you or anyone who thinks this, simply because Microsoft has already done it with the CPU presets (more threads / higher frequency), and everyone is fine with it, to the point that nobody talks about it and it has slipped out of mind.
Maybe the next-next generation will allow *choosing* between balanced mode and sending all the dilithium energy to the GPU.
The only reason we are still talking about Sony's implementation is that it's still not clear how it will auto-manage the frequency balancing.
The 8-thread vs 16-thread preset is chosen on a per-title basis and anyway still does not imply a common power envelope with the GPU. So, not for the first time, I've got no inkling of what you are talking about.
 
The only figures that Cerny threw out were that reducing power consumption by 10% took a couple of percent reduction in clockspeed. At 3.5GHz (CPU) that is 70MHz, and at 2.3GHz (GPU) that is 46MHz. Is 10% the cap? :???:
What really confuses me is what he said that they were not able to keep the GPU stable at 2.0GHz using traditional method. So if clock speeds are only reduced 50MHz the system is still at 2.25GHz and well over the point that they couldn’t keep stable.


Is anyone able to explain a reason to this?

https://www.ign.com/articles/deathloop-devs-on-ps5-features-time-loop-gameplay

DeathLoop: 4K 60fps, they use some raytracing on PS5 and DualSense features. Not bad.
I didn’t see a mention of resolution there, where does 4k 60 come from?
 
I didn’t see a mention of resolution there, where does 4k 60 come from?

https://bethesda.net/en/article/7u3fdVVW7wfC5fhNyoeU2n/deathloop-gameplay-reveal-and-next-gen-details

From a Bethesda blog post just after the PS5 reveal event:

DEATHLOOP is a uniquely Arkane take on the first-person shooter genre, and it is being developed for a new generation of hardware. DEATHLOOP will launch on console exclusively for PlayStation 5 this holiday season and will run at 4K/60FPS at launch. DEATHLOOP will also be launching on PC at the same time.
 
What really confuses me is what he said that they were not able to keep the GPU stable at 2.0GHz using traditional method. So if clock speeds are only reduced 50MHz the system is still at 2.25GHz and well over the point that they couldn’t keep stable.


Is anyone able to explain a reason to this?
Cerny said that they weren't able to obtain 2.0 GHz with fixed clocks, not that the clocks weren't stable.
The challenge with fixed clocks is that the frequency is never allowed to go down, so as the activity level continues to increase, the power draw must increase to match it. That means the system must be able to survive worst-case scenarios with respect to activity levels. You may also encounter yield issues when you set a very high fixed clock, because all your chips must be able to withstand the torture of running high power at high frequencies, and your cooling and power delivery must be sized for it.

So, looking at that entire system, they were unable to achieve 2.0 GHz.

Variable clocks allow them to step around those issues: if the activity level spikes the power draw too high, the system can temporarily drop the frequency and the chip will still be able to continue. You no longer need to worry as much about the absolute worst-case torture test, because the system can continually downclock and stay within the parameters of the cooling and power delivery.

The setup they chose could still reduce yield because of the fixed power draw and the requirement that all chips must be able to hit the 2230MHz mark and hold it as per their workload rules. But as others have suggested, Sony wouldn't have chosen a clockspeed that they could not produce in fairly decent quantities.
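To visualise 'temporarily drop the frequency and keep going', here's a trivial toy governor in Python (the activity trace, the power model and every constant are invented): the clock only dips during the interval whose activity would push the power estimate past the cap, and climbs straight back to maximum once the spike has passed.

Code:
# Toy per-interval clock governor: cap power by briefly dipping the clock.
# Activity trace and all constants are invented for illustration only.

POWER_CAP_W = 180.0
F_MAX_MHZ = 2230.0
F_STEP_MHZ = 10.0

def estimated_power(activity, f_mhz):
    """Crude model: power grows with activity and super-linearly with clock,
    because the voltage needed for higher clocks makes the top of the curve steep."""
    return activity * (f_mhz / F_MAX_MHZ) ** 3 * 200.0

def run(trace):
    f = F_MAX_MHZ
    for activity in trace:
        # Step down while the estimate exceeds the cap, step back up otherwise.
        while estimated_power(activity, f) > POWER_CAP_W and f > 2000.0:
            f -= F_STEP_MHZ
        while estimated_power(activity, f + F_STEP_MHZ) <= POWER_CAP_W and f < F_MAX_MHZ:
            f += F_STEP_MHZ
        yield f

# Mostly ordinary intervals with one worst-case spike in the middle.
trace = [0.80, 0.85, 0.82, 1.00, 0.84, 0.80]
print(list(run(trace)))   # the clock dips only for the spike, then returns to 2230

In this made-up trace the clock sits at 2230MHz for the ordinary intervals and dips about 80MHz for the one worst-case spike, which is the kind of small, brief excursion being described, as opposed to a fixed clock that would have to be set low enough to survive that spike permanently.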
 
Cerny said that they weren't able to obtain 2.0 GHz with fixed clocks, not that the clocks weren't stable.
The challenge with fixed clocks is that the frequency is never allowed to go down, so as the activity level continues to increase, the power draw must increase to match it. That means the system must be able to survive worst-case scenarios with respect to activity levels. You may also encounter yield issues when you set a very high fixed clock, because all your chips must be able to withstand the torture of running high power at high frequencies, and your cooling and power delivery must be sized for it.

So, looking at that entire system, they were unable to achieve 2.0 GHz.

Variable clocks allow them to step around those issues: if the activity level spikes the power draw too high, the system can temporarily drop the frequency and the chip will still be able to continue. You no longer need to worry as much about the absolute worst-case torture test, because the system can continually downclock and stay within the parameters of the cooling and power delivery.

The setup they chose could still reduce yield because of the fixed power draw and the requirement that all chips must be able to hit the 2230MHz mark and hold it as per their workload rules. But as others have suggested, Sony wouldn't have chosen a clockspeed that they could not produce in fairly decent quantities.
OK, so with a fixed clock it would always be at 2.0GHz and that doesn't work, but with their power shift it can go from 2.23GHz to (a single-digit percentage drop, max of 9%) 2.03GHz? In that case it still means that the GPU is operating the entire time at a frequency higher than 2.0GHz, unless the GPU is allowed to swing much more and scale with workloads?


Also, while they don't want ambient temperatures to affect the performance of the chip, it must still have some type of thermal protection in case it can't get enough airflow or something. It would probably just shut down with an error rather than ignore the chip's temperature entirely and possibly damage it.
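Trying a rough back-of-the-envelope myself (textbook scaling only, nothing Sony-specific beyond the figures Cerny gave): dynamic power goes roughly as

P_dyn ∝ f · V^2

and since the voltage needed rises roughly with frequency near the top of the curve, power scales roughly with f^3. Dropping from 2.23GHz to 2.0GHz is about a 10% frequency cut, so that rule of thumb alone would save about 1 - (2.0/2.23)^3 ≈ 28% of the power; Cerny's '10% of the power for a couple of percent of the clock' suggests the very top of the curve is even steeper than cubic, i.e. the last few tens of MHz are disproportionately expensive. That would be consistent with a fixed clock sized for the worst case not reaching even 2.0GHz, while a variable clock that merely dips a few percent below 2.23GHz on spikes fits the same budget, which I guess answers my own question about how it stays above 2.0GHz essentially the whole time.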
 