Current Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

Status
Not open for further replies.
I'm sorry. I'm not getting it.
Regardless of what things mean in your analogy, what makes you think PS5 has no power for full frequency full activity, but Xbox does?

Cerny explicitly stated that there will be conditions where the PS5 cannot maintain full frequency due to power. He only said that he expected it to be "at or close to" full frequency "most of the time".

MS operates with fixed clocks, but on the GPU that puts them a long way down the frequency/power curve from Sony's peak boost. They are two different approaches.

As an example of when you might not want to maintain full clocks on the CPU due to power draw, Cerny talked about AVX. Last night MS confirmed in a Q&A after their Hot Chips presentation that they can maintain max frequency (3.8 GHz with 1T/C) even when using AVX.
 
MS has one clock difference: the CPU clock is slightly different depending on whether SMT is on or off.

It would be pretty difficult to load an 8-core/16-thread CPU to max power consumption all the time in a game. Similarly, it's pretty difficult to fully power-load a GPU, unless you are running FurMark. The move for Sony's first-party studios might be to interleave work, so that when the GPU is not fully utilized they run power-heavy CPU algorithms, and vice versa. That shouldn't be any more difficult than async compute and filling in the bubbles with compute work. Sony's approach puts more pressure on the software side, but that has been a long time coming: transistors are no longer becoming cheaper and faster the way they used to, which will force efficiency improvements to keep gains coming at a faster pace.
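The interleaving idea above can be sketched as a toy frame scheduler. Everything here is hypothetical: the 200 W combined budget, the job names, and the per-job watt estimates are illustrative numbers, not real PS5 figures.

```python
# Hypothetical sketch: slot power-heavy CPU jobs into frames where the GPU
# is under-utilized, so a shared CPU+GPU power budget is never exceeded.
# All numbers and names are illustrative, not real PS5 figures.

FRAME_BUDGET_W = 200  # assumed combined CPU+GPU power cap, in watts

def schedule_frame(gpu_load_w, cpu_jobs):
    """Pick as many deferred CPU jobs as fit in the headroom left by the GPU.

    gpu_load_w: estimated GPU power draw this frame (watts)
    cpu_jobs:   list of (job_name, estimated_watts)
    Returns the job names scheduled this frame.
    """
    headroom = FRAME_BUDGET_W - gpu_load_w
    scheduled = []
    for name, watts in cpu_jobs:
        if watts <= headroom:
            scheduled.append(name)
            headroom -= watts
    return scheduled

# A GPU-light frame leaves room for heavy CPU work...
print(schedule_frame(120, [("physics_batch", 50), ("ai_pathing", 25)]))
# ...while a GPU-heavy frame defers it.
print(schedule_frame(185, [("physics_batch", 50), ("ai_pathing", 25)]))
```

A real engine would of course estimate loads rather than know them up front; the point is only that the scheduling problem resembles async compute's "fill the bubbles" pattern.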
 
Last edited:
I'm sorry. I'm not getting it.
Regardless of what things mean in your analogy, what makes you think PS5 has no power for full frequency full activity, but Xbox does?
It's alright, I don't know anything either. It's no problem; teaching a topic is how I find out whether I really understand it (and how much or how little).
Firstly, my analogy isn't a comparison of which one is better, it's just how the two behave differently.

Secondly, because that is the difference between fixed clocks and variable clocks.
With fixed clocks, you move further away from peak performance to find a specific clock speed that can be sustained regardless of whatever load is thrown at it.
With variable clocks, you can reach a very high peak, but once heavy load comes into play the clock must drop to accommodate it.

An analogy for this: PS5 is a road with no speed limits; the only limit is the car itself. When there is only one car on that empty road, it can go as fast as it possibly can. With a few cars on the road, most of the time they are still going at their fastest speed, but every once in a while they need to slow down to avoid crashing into each other. Once there are too many cars, there is heavy congestion and all the cars must slow down; when the congestion clears, they can speed back up. At the point where there are simply too many cars, traffic crawls like a jam. A crash should never occur unless one of the cars is more poorly made than the rest and simply dies from a defect.

The analogy for Xbox is a multilane highway. It's very wide, but there is a speed limit. No car can exceed the limit, but the highway is wide enough to accommodate a lot of cars without anyone reducing speed. Should a day arise when there are simply too many cars, the cars will crash (the Xbox will shut down).

So let's take a look at how the example plays out against the in-game clock speed graph cited earlier. Before you look at it, note that this is a PC graph and will differ from how the PS5 would handle this.
PCs are bound by power limits and thermal limits. The power limit is based on the absolute amount a chip can handle; there is a limit to how much power you can feed silicon before it is destroyed, and no amount of cooling will save it.
Beyond that, a PC is allowed to give the chip as much power as it can physically handle. This power limit determines how much room the GPU has to work with: either running really fast and doing little work, or running slower and doing lots of work. The thermal limit determines how much the chip must be slowed if it keeps getting hotter; if cooling is insufficient, the clock must drop to avoid damage. (As a chip heats up it also needs more power to hold a given frequency, but that's a side topic, so I won't cover it.)
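The two throttle mechanisms can be sketched as a toy clock governor. The thresholds and step size here are made up for illustration, not real GPU firmware values; the point is that the power limit reacts instantly to load, while the thermal limit only bites as heat accumulates.

```python
# Minimal sketch of a clock governor with a power limit and a thermal limit.
# Illustrative thresholds, not real GPU firmware values.

POWER_LIMIT_W = 220
TEMP_LIMIT_C = 95
STEP_MHZ = 10

def next_clock(clock_mhz, draw_w, temp_c, max_mhz=1970):
    """One governor tick: back off under either limit, otherwise climb back."""
    if draw_w > POWER_LIMIT_W or temp_c > TEMP_LIMIT_C:
        return clock_mhz - STEP_MHZ
    return min(clock_mhz + STEP_MHZ, max_mhz)

print(next_clock(1970, 230, 70))  # heavy load trips the power limit -> 1960
print(next_clock(1960, 180, 70))  # load clears -> climbs back toward 1970
print(next_clock(1900, 100, 98))  # overheating also forces a step down
```

Note the asymmetry: power spikes appear and vanish frame to frame, so the governor oscillates near the peak, which is exactly the "band near the top" shape discussed below.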

So this Radeon RX 5700 XT THICC II Ultra is highly overclocked (given the maximum juice possible on the best-yield silicon) combined with heavy cooling, to maintain the maximum clock speed under as many loads as possible. As you can see in the graph below, even when bound only by temperature, the clock speed is quite variable. This comes from workload constantly arriving and draining away, and the instantaneous power draw of the GPU rising as the workload increases. That is why we see dips in frequency: power is pulling the card in different directions, so even though it is not a thermal issue, the frequency must fluctuate with load. In our analogy, when there are no cars on the road we see peaks above 1970 MHz, but once congestion comes into play it sits around 1930 MHz. Every once in a while we see deadlocked, stalled traffic; these are the outliers, dropping as low as 1870 MHz. Again, this is not thermal. Thermals build slowly over time; you would see the graph drift downwards towards the end. What this is, is a burst of heavy load on the GPU. I can't tell you why it went so low, only that it is normal behaviour for a variable setup to hit something very taxing and, once it clears, go back to the top.

As for the language about "most of the time it's near or at the top", this graph represents that quite well. The graph is concentrated in a small band between 1930 and 1970 MHz, and as you can see, that band is near the top. But even with unlimited power and the best cooling there are still swings in frequency and major dips where 100 MHz is lost.
[image: Iw8HhDy.jpg, clock frequency graph]
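"Most of the time near the top" is really a claim about what fraction of clock samples land inside the top band. A quick illustration with made-up sample data (not read off the actual graph):

```python
# Illustrative only: what fraction of clock samples sit in the 1930-1970 MHz
# "most of the time" band. The sample values are invented, not measured.

samples = [1972, 1965, 1950, 1941, 1935, 1968, 1930, 1895, 1870, 1955]
in_band = [s for s in samples if 1930 <= s <= 1970]
print(f"{100 * len(in_band) / len(samples):.0f}% of samples in the top band")
```

With real telemetry you would do the same thing over thousands of samples; the outliers (1895, 1870 here) are the "stalled traffic" moments from the analogy.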


I largely expect the PS5 to look somewhat like this, except that, as consumer-grade hardware, it has power limits and thermal limits to keep costs down. To save money they must select less-than-ideal silicon, less-than-overkill cooling, and less than the maximum power (and the GPU must also share power with the CPU, among other things). This means their variable clock rate must accommodate a larger variety of chips. So I expect the concentration band above to be wider than what we see here, and of course there will still be outliers that drop the frequency much lower.

There is nothing wrong with this method, it is the modern approach to maximizing the performance out of our silicon and that has never been at the heart of the debate here.

What I've been trying to get across is that the graph will not look like this:
[image: lJeRmU4.jpg, near-flat clock frequency graph]


This would not be possible. This is just about as close to fixed frequency as you can get.
There is no real advantage to fixed frequencies except how consistent the power is at every moment. They are easier to optimize for when pushing the limits of performance, but the drawback is that you leave a lot of performance on the table that could have been extracted with a variable clocking method. The way to optimize for consistent performance with variable clocks is to aim at the bottom of the "most of the time" band, and you keep all the gains from unlocked frequencies. So, e.g., a fixed clock would have to be 1830 MHz in the graph above, and developers would optimize to the edge of 1830. With a variable clock rate you can optimize around 1920 MHz; the game will operate above that, and any dips won't impact frame timing. You still gained an extra 90-100 MHz over the fixed-frequency setup.
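The arithmetic of that trade-off, using the example figures from the graph (these are the post's illustrative numbers, not measurements):

```python
# Fixed-vs-variable clock trade-off with the example numbers above.

fixed_clock_mhz = 1830  # worst-case clock a fixed setup must guarantee
band_floor_mhz = 1920   # bottom of the "most of the time" band
gain_mhz = band_floor_mhz - fixed_clock_mhz
gain_pct = 100 * gain_mhz / fixed_clock_mhz

print(f"Targeting the band floor gains {gain_mhz} MHz "
      f"(~{gain_pct:.1f}%) over the fixed clock.")
```

Dips above the band floor cost nothing to frame timing, because the optimization target was the floor, not the peak.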

I have no doubt that the variable clocking method Sony uses pushes their chip further. What is being contested is whether it looks like the top graph or the bottom graph. It is likely to look like the top graph, but with a much wider band. If it could instead hold like the bottom graph, you would run into power and cooling issues.

A 10% swing in clock frequency to accommodate varying workloads is a reasonable number to choose. So the PS5 operating between 2.23 and 2.0 GHz is quite a reasonable and generous margin; most cards' boost behaves like this. I just wanted to address why "most of the time and near the top" is not likely to mean sitting between 2200 and 2230 MHz.
 

But the problem with such an approach is that you get high average framerates while the 1% lows get worse. It would be really great to know how deep the PS5 GPU can downclock itself. As a developer you would then optimize for the most common case, somewhere between the lowest and the best point.

Btw, I don't think the Xbox GPU will be underutilized. At least once VRR is a common feature, you can squeeze out every bit of performance you can get from the GPU. The same applies to the PS5.
With what we know, I would guess that DF will find many more frame drops in PS5 titles because of overoptimistic developer assumptions :)
 
No I'm not. Even Cerny is referencing the same thing:
But nobody knows Sony's model, which could easily account for the differences in transistor types in their chips and the varying power draw.

Unless you have some material for me to read so I can understand your point better, I have really never understood any of your points. I've largely not responded to them because I don't know what I'm responding to or where to even begin. You've never cited a resource for me to read.

TSMC has reference documentation for hardware partners illustrating the relative differences in transistor types across all of their commercially available lines. I assumed it was taken as read that transistors are not all equal, i.e. you know that a gate and a memory transistor are completely different in terms of both material and power draw?
 
So does this mean that the XSX has a higher chance of hitting thermal issues and will consume more power? The PS5 seems built so that every game will play the same regardless of temperature, it's a far more elegant design than the monolithic design of the XSX.
 
I'm really wondering how the PS5 will perform during the Summer with 20 celsius or more background temperature than usual.
 
I'm really wondering how the PS5 will perform during the Summer with 20 celsius or more background temperature than usual.
Doesn't really matter. The downclocking is based on power draw, not temperature. Basically they have a combined maximum power budget for the CPU and GPU. For example, the GPU's max draw could be 160 W and the CPU's 65 W, but the combined power must not exceed 200 W. This is different from clock speed: something clocked at maximum can still use less power at 50% utilization than something at two-thirds the clock at 100% utilization. So it is possible for the PS5 to run at max clock most of the time; but when stressed, something must give.
In this example, Sony would already have a cooling system that works for a 200 W chip at any given ambient temperature.
*Also, there can be power-draw variance even at 100% utilization. When running a power virus (Prime95, FurMark, etc.), the CPU/GPU is utilized at 100%; but I can also run something like Blender or Corona Renderer (for 3D rendering) at 100% CPU utilization, and with Prime95 the CPU can be 5-10 °C hotter than with Corona. Not every 100% utilization is the same.
Sony does it like that so every console has the same experience regardless of ambient temperature. You can play a PS5 at the North Pole and it will perform the same as one in Egypt.
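The shared-budget behaviour described above can be sketched like this. The 160 W / 65 W / 200 W figures are the example numbers from the post, not confirmed PS5 values, and the "trim the GPU first" policy is a guess for illustration.

```python
# Sketch of a shared CPU+GPU power budget: each unit has its own ceiling,
# but the sum must also stay under a combined cap, so when both are
# stressed, something must give. Example watt figures, not real PS5 numbers.

COMBINED_CAP_W = 200
GPU_MAX_W, CPU_MAX_W = 160, 65

def allocate(gpu_request_w, cpu_request_w):
    gpu_w = min(gpu_request_w, GPU_MAX_W)
    cpu_w = min(cpu_request_w, CPU_MAX_W)
    over = gpu_w + cpu_w - COMBINED_CAP_W
    if over > 0:
        gpu_w -= over  # assumed policy: trim the GPU to honor the cap
    return gpu_w, cpu_w

print(allocate(160, 30))  # light CPU load: GPU gets its full 160 W
print(allocate(160, 65))  # both maxed: 225 W requested, GPU trimmed to 135 W
```

Because the budget is defined in watts rather than degrees, ambient temperature never enters the equation, which is the point being made about the North Pole and Egypt.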
 
Last edited:
TSMC has reference documentation for hardware partners illustrating the relative differences in transistor types across all of their commercially available lines. I assumed it was taken as read that transistors are not all equal, i.e. you know that a gate and a memory transistor are completely different in terms of both material and power draw?
That's covered by C in the equation, where C is the capacitance of the transistors being switched.

When you're comparing the same chip against itself, you can cross C off the equation.
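The equation in question is the CMOS dynamic power relation, P = C·V²·f. The cancellation argument in code form (the 10% figures below are illustrative, not measured):

```python
# Dynamic power: P = C * V^2 * f. Comparing the same chip against itself,
# C is a constant and cancels, leaving only the voltage and frequency ratios.

def relative_power(f_ratio, v_ratio):
    """Power ratio between two operating points of the same chip."""
    return v_ratio ** 2 * f_ratio

# If a 10% higher clock needs roughly 10% more voltage, power grows
# roughly with the cube of frequency:
print(f"{relative_power(1.10, 1.10):.3f}x power")
```

This is why the thread keeps calling the frequency/power curve "cubic": the last few percent of clock are disproportionately expensive in watts.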
 
I like the analogies, but can’t we also summarize the two clock philosophies thusly:

MS: Developers, developers, developers!
Sony: Developers! Git gud.

Or, XSX is a stationary log that developers have to cross, and PS5 is ... rolling slightly? Varying width? Either way, we’re crossing the generational divide.

So does this mean that the XSX has a higher chance of hitting thermal issues and will consume more power? The PS5 seems built so that every game will play the same regardless of temperature, it's a far more elegant design than the monolithic design of the XSX.
It might draw more power because it's capable of more TFLOPs, but it's also clocked lower. I doubt either will hit thermal limits within reason (same conditions: ambient temp, vents and HSF not clogged with dust, not in direct sunlight). Fan noise might differ.
 
So does this mean that the XSX has a higher chance of hitting thermal issues and will consume more power? The PS5 seems built so that every game will play the same regardless of temperature, it's a far more elegant design than the monolithic design of the XSX.
The architecture of the XSX lends it to a critical path to failure (shutdown), whereas the critical path to failure for the PS5 would be more defect-based (it should never draw enough power to cause a shutdown, since power is capped).
The chances of hitting thermal issues, and the power consumption, ultimately come down to build quality. And build quality is a function of cost.
 
They already said that there is additional audio hardware, as in every Xbox before. But don't expect wonders from audio hardware. I really don't know why Cerny made such a fuss about it. Audio acceleration is really, really old, and there is not that much more you can do to make it better, because it depends heavily on your audio setup. Too many sound sources means too much noise, so there is a limit to what you can do with it.
But even though consoles have had audio hardware on board at least since the XB360 and PS3, most of the time teams use the CPU for it. What games do with audio is not very CPU-intensive.
How does the XSX compare to the PS5 in terms of audio hardware?
 
That's covered by C in the equation, where C is the capacitance of the transistors being switched. When you're comparing the same chip against itself, you can cross C off the equation.

Equations are fine to generalise physics but not when it comes to a hundred million variables expressed as near constants.
 
They're still better than having a discussion based solely on phrases that are extremely vague.

As an engineer, I strongly disagree. The reason modern engineering equations run to pages and pages is to accommodate all the complexities of increasingly varied models; that length is necessary. Here we've fallen back on extremely basic equations to express very complex semiconductor designs, which is grossly misleading.
 
Equations are fine to generalise physics but not when it comes to a hundred million variables expressed as near constants.
We aren't looking for an exact number here; we're looking for the general shape of things.
In any polynomial equation, we look at the term with the highest power, because that term has the greatest effect on the shape.
This lets us understand what is happening and talk about it in a general sense without getting lost in the details.

In the same way, we can calculate the velocity of an object that has fallen from 100 m. Simple physics equations won't account for air drag, terminal velocity, and all sorts of other factors affecting the speed.
But they still give a fairly good idea of what to expect. I have not used this equation to prove exact numbers, only to give a general idea of where these processors could operate at a given frequency. I did not tell you the wattage, only the relative wattage. If the only thing you do is run a chip at higher and lower frequencies with no other changes and measure the outputs, over time you will generate the voltage/frequency graph. It is likely to still be cubic even if it does not follow the cubic formula perfectly.
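The falling-object example above, worked out: the drag-free formula v = √(2gh) ignores air resistance and terminal velocity, yet still gives a useful ballpark.

```python
# Ideal free-fall impact speed from 100 m, ignoring air drag.
import math

g = 9.81   # gravitational acceleration, m/s^2
h = 100.0  # drop height, metres

v = math.sqrt(2 * g * h)
print(f"Ideal impact speed from {h:.0f} m: {v:.1f} m/s")
```

The real speed would be somewhat lower because of drag, but the simple model tells you it's tens of metres per second, not single digits; the same "right shape, approximate numbers" logic applies to the dynamic power equation.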
 
As an engineer, I strongly disagree. The reason modern engineering equations run to pages and pages is to accommodate all the complexities of increasingly varied models; that length is necessary. Here we've fallen back on extremely basic equations to express very complex semiconductor designs, which is grossly misleading.
I would disagree with that sentiment. All of our classical physics equations are still largely useful for everyday life despite the growing field of quantum physics. Many electrical engineers will never hear of supersymmetry, but can easily wire up massive conduits and figure out which cables are needed for which amperages.

We are not doing research and development here. Ohm's law still applies.
Science is built on science. HPC computing courses still teach the dynamic power equation to give a general ballpark of voltage requirements for CMOS chips.
 