Current Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

Adaptive clock and power rates also have other problems: they can make production a bit more complicated. Every GPU & CPU combination must reach its frequencies at the desired power target, and at those high GPU frequencies I really doubt that many chips can make it.
Apparently they can, given Sony are aiming to ship 10m consoles by early 2021. So evidently not a problem. :nope:

There is also a problem with power distribution. The harder developers optimize for the GPU and the more intensively they use it, the higher the GPU power draw is. This, on the other hand, will reduce the power available to the CPU. I really doubt that the GPU will be optimally used and at the same time reach the high frequencies.
And yet we've heard zero noise, rumour or suggestion of problems developing for PS5, only gushing happiness at the new architecture. Many developers are used to developing for variable-clock hardware across literally hundreds or thousands of varying performance profiles, because that has been the reality of PC CPUs and GPUs for years.

For the sake of discussion we should call it activity level and not load. Load tends to imply weight or work; increasing load implies a lot of compute is being done. There are ways to light up a lot of activity on the GPU without actually doing much work. Copying stuff around tends to be a pretty bad offender and does absolutely no computation.
Mark Cerny said workload; activity is different. The whole GPU may be active but the workload may be light because of lots of use of 32-bit, 64-bit or 128-bit instructions and data. Equally, parts of the GPU may be inactive but the workload may be heavy because of a lot of use of 256-bit instructions and data - which was a scenario Mark Cerny mentioned. The workload determines the power draw, not the GPU's level of activity. It's a subtle, if arguably near-semantic, difference. :yes:
 
Dynamic presets to be determined by the developer would have only served to underline how much of a hassle this paradigm actually is. Much easier to code against fixed budgets for the GPU and the CPU.
I don't agree with you or anyone who thinks this, simply because Microsoft has already done it with the CPU presets (more threads / higher frequency), and everyone is fine with it, to the point that nobody talks about it and it has slipped out of mind.
Maybe the next-next generation will allow *choosing* between balanced mode and sending all the dilithium energy to the GPU.
The only reason we are still talking about Sony's implementation is that it's still not clear how it will auto-manage the frequency balancing.
 
Mark Cerny said workload; activity is different. The whole GPU may be active but the workload may be light because of lots of use of 32-bit, 64-bit or 128-bit instructions and data. Equally, parts of the GPU may be inactive but the workload may be heavy because of a lot of use of 256-bit instructions and data - which was a scenario Mark Cerny mentioned. The workload determines the power draw, not the GPU's level of activity. It's a subtle, if arguably near-semantic, difference. :yes:
Activity level for a specific chip is how many transistors are flipping states, i.e. 1s to 0s and vice versa. When bits flip there is much more power draw. Usually this should be associated with workload, but not always. Wider instructions flip way more bits with fewer instructions, which is why you're seeing so much more power draw: you go from adding 2x32 bits to adding 8x32 bits in a single shot across all cores. You're going to get massive activity across more transistors across more cores. That's an easy way to see how activity level scales on a CPU.
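To put a toy number on that, here's a quick Python sketch (purely illustrative, nothing to do with any real power monitor; all constants are made up) that counts how many bits toggle between successive values on a 256-bit datapath and turns that into a relative activity figure:

Code:
# Toy illustration: switching activity = bits that toggle between consecutive cycles.
import random

WIDTH = 256  # datapath width in bits, e.g. one 8x32 SIMD add per cycle

def toggled_bits(prev, curr):
    """Count the bits that flipped state between two consecutive values."""
    return bin(prev ^ curr).count("1")

def activity_factor(values):
    """Average fraction of the datapath toggling per cycle."""
    flips = sum(toggled_bits(a, b) for a, b in zip(values, values[1:]))
    return flips / (WIDTH * (len(values) - 1))

# "Narrow" traffic: only the low 32 bits ever change (scalar-ish work).
narrow = [random.getrandbits(32) for _ in range(1000)]
# "Wide" traffic: all 256 bits change every cycle (8x32 SIMD-ish work).
wide = [random.getrandbits(256) for _ in range(1000)]

print("activity, narrow:", activity_factor(narrow))  # ~0.06
print("activity, wide:  ", activity_factor(wide))    # ~0.5

Random 32-bit traffic toggles about 16 bits per cycle against the 256-bit bus, while random 256-bit traffic toggles about 128, so the wide case lands roughly 8x higher. That's the same intuition as going from 2x32 to 8x32 adds.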
 
Activity level for a specific chip is how many transistors are flipping states, i.e. 1s to 0s and vice versa. When bits flip there is much more power draw. Usually this should be associated with workload, but not always.
Before I write a page and a half here, let me ask you a question. Do you believe that all FinFET transistors across the APU die are equal in terms of use and power draw?
 
Before I write a page and a half here, let me ask you a question. Do you believe that all FinFET transistors across the APU die are equal in terms of use and power draw?
Certainly not in terms of use; the transistors in a chip are not used equally all over. Some will definitely be used all the time, as they serve the functions most commonly required of the chip, and others less (functions called less often, operations called less often). As for power draw, no, generally speaking they should operate within a close tolerance of each other, but you're going to get some differences when spread over billions of transistors.

I'm not referring to idle power states when referring to activity.
 
As for power draw, no, generally speaking they should operate within a close tolerance of each other, but you're going to get some differences when spread over billions of transistors.
This isn't the case. Ignoring FinFET memory transistors, there are different types of FinFET logic gates: some can be optimised for performance (and use considerably more power) and others for lower leakage (and use considerably less power). This is why state flips are not useful for determining power draw.
 
This isn't the case. Ignoring FinFET memory transistors, there are different types of FinFET logic gates: some can be optimised for performance (and use considerably more power) and others for lower leakage (and use considerably less power). This is why state flips are not useful for determining power draw.
But from a simplistic point of view it's what we need for the dynamic power equations. It would be fairly challenging to talk broadly about chips without benchmarking.
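For the record, the simplistic relation I have in mind is the textbook CMOS one,

P_dyn ≈ α · C_eff · V^2 · f

where α is the activity factor (the fraction of the switched capacitance that actually toggles each cycle), C_eff is the effective capacitance, V the supply voltage and f the clock. Your point, as I read it, is that C_eff per toggling node isn't uniform across the die, so counting flips alone doesn't pin down power; fair, but at a fixed voltage and frequency the activity term is still the knob that moves.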
 
But from a simplistic point of view it's what we need for the dynamic power equations. It would be fairly challenging to talk broadly about chips without benchmarking.
True, and we don't have much on this other than this statement from Digital Foundry's Road to PS5 analysis piece:

An internal monitor analyses workloads on both CPU and GPU and adjusts frequencies to match. While it's true that every piece of silicon has slightly different temperature and power characteristics, the monitor bases its determinations on the behaviour of what Cerny calls a 'model SoC' (system on chip) - a standard reference point for every PlayStation 5 that will be produced.​

This is why I picked up on the workload/activity thing: whatever this internal monitor is predicated on, it is workload rather than activity. What does this mean? Is there logic in PS5 profiling GPU/CPU/API workloads in realtime to make adjustments to power distribution? ¯\_(ツ)_/¯

I guess you do need a smart system if you are going to steal power from the CPU or GPU; you have to understand the consequences, otherwise it could be problematic.
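Purely as a mental model, here's roughly how I picture such a monitor, as a Python sketch. To be clear, this is not Sony's implementation and every counter name, coefficient and budget below is invented; the idea is just to turn per-interval workload counters into a power estimate using coefficients calibrated against the 'model SoC', then walk the clocks down until the estimate fits the shared budget:

Code:
# Speculative sketch of a workload-driven power manager, loosely inspired by the
# Road to PS5 description. Coefficients, budgets and counter names are invented.

MODEL_SOC_ENERGY = {            # energy-per-event coefficients from the "model SoC"
    "cpu_simd256_ops": 2.0,     # wide 256-bit ops are the expensive ones
    "cpu_scalar_ops": 0.4,
    "gpu_alu_ops": 1.0,
    "gpu_mem_transactions": 1.5,
}
TOTAL_BUDGET_W = 200.0          # invented combined CPU+GPU budget
F_STEP = 0.01                   # walk clocks down in 1% steps

def estimate_power(counters, cpu_f, gpu_f):
    """Power estimate for one interval. Scaling the clock scales the event rate
    (and thus power) roughly linearly in this toy; real silicon also drops
    voltage with frequency, so the real saving per step is bigger."""
    cpu = (counters["cpu_simd256_ops"] * MODEL_SOC_ENERGY["cpu_simd256_ops"] +
           counters["cpu_scalar_ops"]  * MODEL_SOC_ENERGY["cpu_scalar_ops"]) * cpu_f
    gpu = (counters["gpu_alu_ops"]          * MODEL_SOC_ENERGY["gpu_alu_ops"] +
           counters["gpu_mem_transactions"] * MODEL_SOC_ENERGY["gpu_mem_transactions"]) * gpu_f
    return (cpu + gpu) / 1e9    # arbitrary scaling to "watts"

def pick_frequencies(counters, cpu_f=1.0, gpu_f=1.0):
    """Walk clocks down (GPU first here, arbitrarily) until the estimate fits."""
    while estimate_power(counters, cpu_f, gpu_f) > TOTAL_BUDGET_W:
        if gpu_f > 0.9:         # never drop more than ~10% in this toy model
            gpu_f -= F_STEP
        elif cpu_f > 0.9:
            cpu_f -= F_STEP
        else:
            break               # still over budget: real hardware has other safeguards
    return cpu_f, gpu_f

# One heavy interval: lots of wide SIMD on the CPU plus a busy GPU.
counters = {"cpu_simd256_ops": 10e9, "cpu_scalar_ops": 12.5e9,
            "gpu_alu_ops": 160e9, "gpu_mem_transactions": 20e9}
cpu_f, gpu_f = pick_frequencies(counters)
print(round(cpu_f, 2), round(gpu_f, 2))   # e.g. 1.0 0.92 - the GPU sheds a few percent

Whether the real monitor works anything like this is exactly the open question, but it at least shows why calibrating against a fixed model SoC gives every console identical behaviour regardless of the quality of its individual silicon.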
 
True, and we don't have much on this other than this statement from Digital Foundry's Road to PS5 analysis piece:

An internal monitor analyses workloads on both CPU and GPU and adjusts frequencies to match. While it's true that every piece of silicon has slightly different temperature and power characteristics, the monitor bases its determinations on the behaviour of what Cerny calls a 'model SoC' (system on chip) - a standard reference point for every PlayStation 5 that will be produced.​

This is why I picked up on the workload/activity thing: whatever this internal monitor is predicated on, it is workload rather than activity. What does this mean? Is there logic in PS5 profiling GPU/CPU/API workloads in realtime to make adjustments to power distribution? ¯\_(ツ)_/¯
Fair enough, I see your POV. It's likely monitoring the instructions that are coming in, and I guess they would in turn know the power levels for each type of instruction.
 
Adaptive clock and power rates also have other problems: they can make production a bit more complicated. Every GPU & CPU combination must reach its frequencies at the desired power target, and at those high GPU frequencies I really doubt that many chips can make it, unless those peak clocks really are only short bursts.
Every unit in the SOC needs to meet its performance target under the design's max transient power limit and the TDP. The parameters of the power delivery system and cooler would give limits to the acceptability of silicon, but the lower points would tend to be less extreme in their scaling than the points at the edge of the safety margin. Cerny made a reference to clock/power points intended to match the thermal density of the GPU and CPU sections, although I'm not clear on why that was emphasized given there seems to be no evidence of any other AMD products needing that, and they can experience more significant swings than the PS5's described method can.
On the other hand, such a method could be simpler than the usual AMD production method, where the validation suites would be testing many more DVFS points and transition combinations than the PS5's design requires. Whatever the PS5 DVFS points are, the described system is consistent with using AMD's standard DVFS in a less challenging way than other consumer products.


That is where I see the problem. You can increase the load on the GPU (which doesn't say how stressed it really is), but not every calculation will max everything out. Different calculations lead to different internal load, even when the GPU can't do anything more. One game might create 100% usage of the GPU at 80W (just as an example), while another game might max the GPU out at only 80% but already use 150W of power. This is what makes it much more complicated for production. More or less every CPU & GPU combination must be tested with the most demanding load tests possible, and each time it must reach the same (fixed) frequencies within a fixed power target.
The validation process for the PS5 seems more complex than it was for the PS4. However, in terms of manufacturing it looks to me like it's within the limits of what AMD does routinely since there's a version of this DVFS in virtually every chip it makes.
The system itself is using a model that is conservative in terms of what it calculates as a worst-case output, but the dynamic estimate is significantly closer to reality than the prior generation's design-time guard banding. The estimates the PS5 uses should be more conservative since every chip needs to meet the platform's model SOC standards, whereas AMD's many product bins and high-clocking SKUs can tweak parameters and make assumptions about silicon quality the console cannot.

I doubt that many chips will make it through that binning process.
It's an apparently single-binned console SOC being built in the millions. For practical purposes, it is very important that most do. The CPU portion is significantly below the design max of the Zen 2 core, so I think that element is unlikely to be an obstacle. The GPU max clock is unusually high for prior GPU generations, but it seems reasonable that a pipeline specifically tailored for a higher clock target can hit a max clock that is modestly higher than the peak clocks of some RDNA products, especially since it doesn't need to be sustained.
Whether taking the GPU clocks to this level will be the winning design philosophy remains to be seen, but it seems to me that it should at least be producible.

True, and we don't have much on this other than this statement from Digital Foundry's Road to PS5 analysis piece:

An internal monitor analyses workloads on both CPU and GPU and adjusts frequencies to match. While it's true that every piece of silicon has slightly different temperature and power characteristics, the monitor bases its determinations on the behaviour of what Cerny calls a 'model SoC' (system on chip) - a standard reference point for every PlayStation 5 that will be produced.​

This is why I picked up on the workload/activity thing: whatever this internal monitor is predicated on, it is workload rather than activity. What does this mean? Is there logic in PS5 profiling GPU/CPU/API workloads in realtime to make adjustments to power distribution? ¯\_(ツ)_/¯
AMD's DVFS has been described in other products as using activity monitors for functional elements of the pipeline. Later proposals and patents also included things like small blocks of redundant processing hardware that served as representative elements for the behavior of the most demanding silicon, such as dummy ALUs and registers running operations intended to give a worst-case figure for electrical and thermal performance. Then there's a significant number of thermal sensors and current monitors.
The on-die voltage management and Vdroop protection indicate the hardware can manage and detect current and voltage changes at the microsecond or nanosecond scale. The activity monitors and thermal estimates work to gauge power consumption and die temperatures at microseconds up to a millisecond range, going by the power management described for various GPUs and Zen.
I think AMD has described token-based power trading between chips or chip regions before, which may go into what SmartShift relies upon for determining how much slack is left in the power budget.

What the PS5 appears to be doing is taking all of this DVFS hardware, backing away from the highest CPU clock ranges, and picking a more conservative and fixed set of figures for the per-chip power model.
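For anyone wondering what 'token-based power trading' could look like in practice, here's a made-up toy in Python (not AMD's actual scheme; the token size and the policy are invented): the shared budget is carved into fixed-size tokens, and a block that isn't spending its share lends tokens to the one asking for more.

Code:
# Toy sketch of token-based power budgeting between two blocks (CPU and GPU).
# Not AMD's real SmartShift protocol; granularity and policy are invented.

TOKEN_WATTS = 5           # each token represents 5 W of budget (invented)
TOTAL_TOKENS = 40         # 200 W combined budget

class PowerArbiter:
    def __init__(self):
        # Start from a static half/half split of the tokens.
        self.allocation = {"cpu": TOTAL_TOKENS // 2, "gpu": TOTAL_TOKENS // 2}

    def rebalance(self, demand_watts):
        """Move spare tokens from the block under-using its share to the block
        asking for more, without ever exceeding the total budget."""
        demand_tokens = {k: -(-int(w) // TOKEN_WATTS) for k, w in demand_watts.items()}  # ceil
        spare = {k: max(0, self.allocation[k] - demand_tokens[k]) for k in self.allocation}
        for needy in self.allocation:
            other = "gpu" if needy == "cpu" else "cpu"
            shortfall = max(0, demand_tokens[needy] - self.allocation[needy])
            transfer = min(shortfall, spare[other])
            self.allocation[needy] += transfer
            self.allocation[other] -= transfer
            spare[other] -= transfer
        return {k: v * TOKEN_WATTS for k, v in self.allocation.items()}

arbiter = PowerArbiter()
# CPU only asks for ~40 W while the GPU wants ~150 W of a 200 W envelope.
print(arbiter.rebalance({"cpu": 40, "gpu": 150}))   # {'cpu': 50, 'gpu': 150}

With the CPU idling, the arbiter hands most of the envelope to the GPU, which is the gist of the shifting behaviour being described; the interesting part in real hardware is how quickly and how safely those tokens can be handed back when the CPU suddenly needs them.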
 
I don't agree with you or anyone who thinks this, simply because Microsoft has already done it with the CPU presets (more threads / higher frequency), and everyone is fine with it, to the point that nobody talks about it and it has slipped out of mind.
Maybe the next-next generation will allow *choosing* between balanced mode and sending all the dilithium energy to the GPU.
The only reason we are still talking about Sony's implementation is that it's still not clear how it will auto-manage the frequency balancing.
The 8-thread vs 16-thread preset is chosen on a per-title basis and anyway still does not imply a common power envelope with the GPU. So, not for the first time, I've got no inkling of what you are talking about.
 
The only figures that Cerny threw out were that reducing power consumption by 10% took a couple of percent reduction in clockspeed. At 3.5GHz (CPU) that is 70MHz, and at 2.3GHz (GPU) that is 46MHz. Is 10% the cap? :???:
What really confuses me is what he said that they were not able to keep the GPU stable at 2.0GHz using traditional method. So if clock speeds are only reduced 50MHz the system is still at 2.25GHz and well over the point that they couldn’t keep stable.


Is anyone able to explain a reason to this?

https://www.ign.com/articles/deathloop-devs-on-ps5-features-time-loop-gameplay

DeathLoop: 4K 60fps, they use some raytracing on PS5 and DualSense features. Not bad.
I didn’t see a mention of resolution there, where does 4k 60 come from?
 
I didn’t see a mention of resolution there, where does 4k 60 come from?

https://bethesda.net/en/article/7u3fdVVW7wfC5fhNyoeU2n/deathloop-gameplay-reveal-and-next-gen-details

From a Bethesda blog post just after the PS5 reveal event:

DEATHLOOP is a uniquely Arkane take on the first-person shooter genre, and it is being developed for a new generation of hardware. DEATHLOOP will launch on console exclusively for PlayStation 5 this holiday season and will run at 4K/60FPS at launch. DEATHLOOP will also be launching on PC at the same time.
 
What really confuses me is what he said that they were not able to keep the GPU stable at 2.0GHz using traditional method. So if clock speeds are only reduced 50MHz the system is still at 2.25GHz and well over the point that they couldn’t keep stable.


Is anyone able to explain a reason to this?
Cerny said that they weren't able to obtain 2.0 GHz with fixed clocks, not that the clocks weren't stable.
The challenge with fixed clocks is that the frequency is never allowed to go down, so as the activity level continues to increase, the power draw must increase to match it. That means the system must be able to survive worst-case scenarios with respect to activity levels. You may also encounter yield issues when you set a very high fixed clock, because all your chips must be able to withstand the torture of running high power at high frequencies, and your cooling and power delivery must be sized for it.

So, looking at that entire system, they were unable to achieve 2.0 GHz.

Variable clocks allow them to step around those issues: if the activity level spikes the power draw too high, the system can temporarily drop the frequency and the chip will still be able to continue. You no longer need to worry as much about the absolute worst-case torture test, because the system can continually downclock and stay within the parameters of the cooling and power delivery.

The setup they chose could still reduce yield because of the fixed power draw and the requirement that all chips must be able to hit the 2230MHz mark and hold it as per their workload rules. But as others have suggested, Sony wouldn't have chosen a clockspeed that they could not produce in fairly decent quantities.
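To visualise 'temporarily drop the frequency and keep going', here's a trivial toy governor in Python (the activity trace, the power model and every constant are invented): the clock only dips during the interval whose activity would push the power estimate past the cap, and climbs straight back to maximum once the spike has passed.

Code:
# Toy per-interval clock governor: cap power by briefly dipping the clock.
# Activity trace and all constants are invented for illustration only.

POWER_CAP_W = 180.0
F_MAX_MHZ = 2230.0
F_STEP_MHZ = 10.0

def estimated_power(activity, f_mhz):
    """Crude model: power grows with activity and super-linearly with clock,
    because the voltage needed for higher clocks makes the top of the curve steep."""
    return activity * (f_mhz / F_MAX_MHZ) ** 3 * 200.0

def run(trace):
    f = F_MAX_MHZ
    for activity in trace:
        # Step down while the estimate exceeds the cap, step back up otherwise.
        while estimated_power(activity, f) > POWER_CAP_W and f > 2000.0:
            f -= F_STEP_MHZ
        while estimated_power(activity, f + F_STEP_MHZ) <= POWER_CAP_W and f < F_MAX_MHZ:
            f += F_STEP_MHZ
        yield f

# Mostly ordinary intervals with one worst-case spike in the middle.
trace = [0.80, 0.85, 0.82, 1.00, 0.84, 0.80]
print(list(run(trace)))   # the clock dips only for the spike, then returns to 2230

In this made-up trace the clock sits at 2230MHz for the ordinary intervals and dips about 80MHz for the one worst-case spike, which is the kind of small, brief excursion being described, as opposed to a fixed clock that would have to be set low enough to survive that spike permanently.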
 
Cerny said that they weren't able to obtain 2.0 GHz with fixed clocks, not that the clocks weren't stable.
The challenge with fixed clocks is that the frequency is never allowed to go down, so as the activity level continues to increase, the power draw must increase to match it. That means the system must be able to survive worst-case scenarios with respect to activity levels. You may also encounter yield issues when you set a very high fixed clock, because all your chips must be able to withstand the torture of running high power at high frequencies, and your cooling and power delivery must be sized for it.

So, looking at that entire system, they were unable to achieve 2.0 GHz.

Variable clocks allow them to step around those issues: if the activity level spikes the power draw too high, the system can temporarily drop the frequency and the chip will still be able to continue. You no longer need to worry as much about the absolute worst-case torture test, because the system can continually downclock and stay within the parameters of the cooling and power delivery.

The setup they chose could still reduce yield because of the fixed power draw and the requirement that all chips must be able to hit the 2230MHz mark and hold it as per their workload rules. But as others have suggested, Sony wouldn't have chosen a clockspeed that they could not produce in fairly decent quantities.
OK, so with a fixed clock it would always be at 2.0GHz and that doesn't work, but with their power shift it can go from 2.23GHz to (a single-digit percentage drop, max of 9%) 2.03GHz? In that case it still means that the GPU is operating the entire time at a frequency higher than 2.0GHz, unless the GPU is allowed to swing much more and scale with workloads?


Also, while they don't want ambient temperatures to affect the performance of the chip, it must still have some type of thermal protection in case it can't get enough airflow or something. It would probably just shut down with an error rather than ignore the chip's temperature entirely and possibly damage it.
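Trying a rough back-of-the-envelope myself (textbook scaling only, nothing Sony-specific beyond the figures Cerny gave): dynamic power goes roughly as

P_dyn ∝ f · V^2

and since the voltage needed rises roughly with frequency near the top of the curve, power scales roughly with f^3. Dropping from 2.23GHz to 2.0GHz is about a 10% frequency cut, so that rule of thumb alone would save about 1 - (2.0/2.23)^3 ≈ 28% of the power; Cerny's '10% of the power for a couple of percent of the clock' suggests the very top of the curve is even steeper than cubic, i.e. the last few tens of MHz are disproportionately expensive. That would be consistent with a fixed clock sized for the worst case not reaching even 2.0GHz, while a variable clock that merely dips a few percent below 2.23GHz on spikes fits the same budget, which I guess answers my own question about how it stays above 2.0GHz essentially the whole time.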
 