Mark Cerny said:
Another issue involving the GPU involves size and frequency. How big do we make the GPU, and what frequency do we run it at? This is a balancing act. The chip has a cost, and there is a cost for whatever we use to supply that chip with power and to cool it. In general I like running the GPU at a higher frequency. Let me show you why.
*shows an example of a hypothetical ps4 pro level with 36 or 48 cus at the same TF, faster chip have all sections of the chip faster, etc...*
It's easier to fully use 36CU in parallel than 48CU; when triangles are small it's much harder to fill all those CUs with useful work. So there's a lot to be said for faster, assuming you can handle the resulting power and heat issues, which frankly we haven't always done the best job at. Part of the reason for that is, historically, our process for setting CPU and GPU frequencies has relied on heavy duty guesswork with regards to how much electrical power the chip will consume and how much heat will be produced as a result inside the console.
Power consumption varies A LOT from game to game. When I play GoW on PS4 Pro I know the power consumption is high just by the fan noise, but power isn't simply about engine quality, it's about the minutiae of what's being displayed and how. It's counter-intuitive, but processing dense geometry typically consumes less power than processing simple geometry, which is, I suspect, why Horizon's map screen makes my PS4 Pro heat up so much.
Our process on previous consoles has been to try to guess what the maximum power over the entire console lifetime might be. Which is to say, the worst case scene in the worst case game, and prepare a cooling solution which we think will be quiet at that power level. If we get it right, fan noise is minimal. If we get it wrong, the console will be quite loud for the highest power games, and there's a chance it might overheat or shut down if we misestimated power too badly.
PS5 is especially challenging because the CPU supports native 256-bit instructions that consume a lot of power. These are great here and there but presumably only minimally used... Or are they? If we plan for major 256-bit instruction usage, we need to set the CPU clock substantially lower, or noticeably increase the size of the power supply and fan. So after long discussions we decided to go in a very different direction for PS5.
*blah blah about gcn vs rdna cu sizes*
We went with a variable clock strategy for PS5, which is to say we continuously run the GPU and CPU in boost mode. We supply a generous amount of electrical power, and increase the frequency until it reaches the capability of the power and cooling solution. It's a completely different paradigm. Rather than running at constant frequency and letting power vary based on workload, we run essentially at constant power and let the frequency vary based on workload.
We then tackle the engineering challenge of building a cost-effective, high-performance cooling solution designed for that specific power level. It's a simpler problem because there are no more unknowns. No need to guess what power consumption the worst case game might have. As for the details of our cooling solution, we're saving them for the teardown. I think you'll be quite happy with what the engineering team came up with.
So how fast can we run the GPU and CPU with this strategy?
The simplest approach would be to look at the temperature of the silicon die and throttle frequencies on that basis. But that won't work; it fails to create a consistent PS5 experience. It wouldn't do to run a console slower simply because it was in a hot room.
So rather than look at the temperature, we look at the activities the CPU and GPU are performing, and set the frequencies on that basis, which makes everything deterministic and repeatable. While we're at it, we also use AMD's SmartShift tech and send any unused power from the CPU to the GPU so we can squeeze out a few more pixels.
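The activity-based scheme described above can be sketched as a toy model: a fixed power budget, a power estimate that depends only on workload activity and clock, and a deterministic rule that picks the highest clock fitting the budget. The numbers, the cubic power model, and `PEAK_ACTIVITY` here are all hypothetical illustration values, not Sony's actual implementation.

```python
# Toy model of activity-based frequency selection. All values below are
# hypothetical illustration numbers, not Sony's implementation.
F_MAX_GHZ = 2.23      # clock cap
PEAK_ACTIVITY = 0.85  # assumed activity level that exactly fills the
                      # power budget at the maximum clock

def clock_for(activity):
    """Deterministic clock choice: depends only on workload activity,
    never on temperature, so every console behaves identically."""
    if activity <= PEAK_ACTIVITY:
        return F_MAX_GHZ  # typical workloads stay at the cap
    # Heavier workloads downclock just enough to fit the budget,
    # assuming power scales roughly with the cube of frequency.
    return F_MAX_GHZ * (PEAK_ACTIVITY / activity) ** (1 / 3)

print(clock_for(0.6))             # 2.23 (at the cap)
print(round(clock_for(1.0), 2))   # ~2.11, a modest worst-case downclock
```

Note how a hot room changes nothing in this model: the same workload always yields the same frequency.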
The benefits of this strategy are quite large. Running a GPU at 2GHz was looking like an unreachable target with the old fixed frequency strategy. With this new paradigm we're able to run way over that; in fact we have to cap the GPU at 2.23GHz so we can guarantee that the on-chip logic operates properly. 36CU at 2.23GHz is 10.3TF, and we expect the GPU to spend most of its time at or close to that frequency and performance. Similarly, running the CPU at 3GHz was causing headaches with the old strategy. But now we can run it as high as 3.5GHz. In fact, it spends most of its time at that frequency.
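The 10.3TF figure follows from standard shader arithmetic, assuming the usual RDNA-style layout of 64 shader lanes per CU and 2 FLOPs per lane per cycle (one fused multiply-add):

```python
# Back-of-envelope check of the 10.3TF figure. The lanes-per-CU and
# FLOPs-per-cycle values assume a standard RDNA-style compute unit.
CUS = 36              # compute units
LANES_PER_CU = 64     # shader ALUs per CU
FLOPS_PER_CYCLE = 2   # one fused multiply-add = 2 floating-point ops
FREQ_GHZ = 2.23       # capped GPU clock

tflops = CUS * LANES_PER_CU * FLOPS_PER_CYCLE * FREQ_GHZ / 1000
print(f"{tflops:.2f} TFLOPS")  # 10.28, rounded to 10.3 in the talk
```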
That doesn't mean ALL games will be running at 2.23GHz and 3.5GHz. When that worst case game arrives, it will run at a lower clock speed, but not too much lower. To reduce power by 10% it only takes a couple percent reduction in frequency. So I would expect any downclock to be pretty minor.
All things considered, the change to a variable frequency approach will show significant gains for PlayStation gamers.