Current Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

Status
Not open for further replies.
Power consumption =/= frequency. Cerny explained this during the presentation; that's why both the CPU and GPU can hit their max frequencies at the same time. I don't think developers are going to think in terms of "frequency" but rather "how much power does this particular effect cost me?"

Just to add a bit to this, Cerny also specifically said that geometry-intensive frames consume relatively little power, so we could assume that games using very dense geometry (e.g. the Unreal Engine 5 demo) should be the ones pushing closest to the 2.23GHz max GPU clock.
It's at ~33m48s in the Road to PS5 presentation.
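To make the activity-vs-clock relationship concrete, here's a back-of-envelope sketch. All numbers (the power budget, the V/f curve, the activity factors) are invented for illustration, not real PS5 figures; the only real idea is that dynamic power scales roughly with activity × V² × f, and higher clocks need higher voltage, so under a fixed power budget a lower-activity workload can sustain a higher clock.

```python
# Toy model: find the highest clock whose estimated power fits a fixed budget.
# Assumes a made-up linear voltage/frequency curve: V = 0.6 + 0.25 * f.

def sustained_clock(activity, power_budget, f_max=2.23, f_min=1.8, step=0.001):
    """Highest clock (GHz) whose estimated dynamic power fits the budget."""
    f = f_max
    while f > f_min:
        v = 0.6 + 0.25 * f             # hypothetical V/f curve
        power = activity * v * v * f   # activity * V^2 * f, arbitrary units
        if power <= power_budget:
            return round(f, 3)
        f -= step
    return f_min

# A geometry-dense frame (low switching activity) vs a simple-geometry,
# ALU-saturating frame (high activity), under the same power budget:
dense = sustained_clock(activity=70.0, power_budget=220.0)
simple = sustained_clock(activity=85.0, power_budget=220.0)
print(dense, simple)  # the lower-activity frame sustains the higher clock
```

With these made-up constants, the low-activity frame holds the 2.23GHz ceiling while the high-activity frame has to drop a notch or two, which is the shape of the behaviour Cerny described.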


Mark Cerny said:
It's counter-intuitive, but processing dense geometry typically consumes less power than processing simple geometry. Which is, I suspect, why processing Horizon's map screen with its low triangle count makes my PS4 Pro heat up so much.
 
I think Cerny's quote is specifically about the standard 3D queue for rendering. Compute shaders are very efficient at leveraging all the hardware resources to do work, so saturation should be very good. In that case, UE5 should be very taxing on any GPU.
 
And? He gave an example of whole-GPU power consumption (his Pro heating up so much) and stated it was a practical test of this claim. Remember that he is talking about the number of polygons here, not the quality of lighting / textures.
 

It will be interesting to see what the cost of adding ray tracing effects (be they hardware or software) to polygon-dense scenes is. Lots of tiny polygons might make it harder to efficiently bunch rays together and make the most of acceleration structures in cache. Async compute and post-process effects like ML super-sampling might also not reduce workload as polygon density increases; they might even get more expensive.

The map scene is a good example to highlight the point he's making, but the reality of next-gen games is probably a lot more complex. Most developers probably aren't going to want to stall the CUs (the big silicon consumer of GPUs) just to get a slight bump in geometry processing. That wouldn't seem to be an ideal use of any of these systems.
 
The implication is that higher geometry processing would lead to lower GPU utilization due to stalling, keeping in mind that the Geometry stages are extremely early in the pipeline.

One would expect power consumption to increase while Async Compute operates, for instance.

It's not unlike how capping the framerate would lead to lower GPU power consumption if it typically completes the work in a much shorter time otherwise.
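The framerate-cap comparison can be put in numbers with a simple duty-cycle model. The wattages and frame times below are invented for illustration: if the GPU finishes a 30 fps frame early, it idles for the rest of the 33.3 ms budget, so average power falls with the busy fraction of the frame.

```python
# Toy duty-cycle model of a capped vs uncapped GPU. All watt figures are
# hypothetical; the point is only that idle time pulls the average down.

def average_power(busy_ms, frame_budget_ms, active_w=200.0, idle_w=40.0):
    """Average power over the frame: busy fraction at active_w, rest at idle_w."""
    duty = min(busy_ms / frame_budget_ms, 1.0)
    return duty * active_w + (1.0 - duty) * idle_w

capped   = average_power(busy_ms=20.0, frame_budget_ms=1000.0 / 30.0)  # 30 fps cap
uncapped = average_power(busy_ms=20.0, frame_budget_ms=20.0)           # runs flat out
print(round(capped, 1), uncapped)  # capped average sits well below the uncapped 200 W
```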

IIRC, Ray tracing is practically a separate stage of the pipeline, so it's not necessarily increasing simultaneous/concurrent utilization of various GPU components.

edit:

That said, the way MS describes their RT, you'd think it was compute/ALU + TEX/RT units. *shrug*
 
And? He gave an example of whole-GPU power consumption (his Pro heating up so much) and stated it was a practical test of this claim. Remember that he is talking about the number of polygons here, not the quality of lighting / textures.
Board power consumed is based on the number of transistors flipping bits. In this case, the typical standard 3D pipeline uses the CUs as its workspace, but the work is required to follow a specific path and is still limited to the restrictions of each shader type, despite the CUs serving as unified shader hardware. Typically speaking, the more complex the graphics (or even the geometry), the more likely you'll hit both saturation and stalling issues on the standard 3D graphics queue.

You don't run into these issues with compute shaders as much; they are able to take full advantage of all the available CUs and schedule the work to be done. Combined with async compute, you can give your CUs a very thorough workout. If you're getting a lot of action on the CU side of things (which represents the largest part of the GPU in terms of die size) and you're not leveraging the fixed-function hardware as much (the ROPs and primitive generation/discard, which represent a small amount of die space), then you're likely to use a lot more power.

So while he's right that more geometry would generally cause things to slow down on the standard pipeline, if you are doing something like what Unreal is doing, you should see the opposite effect: you're asking the CUs to do the heavy lifting of doing in software what the fixed-function pipeline does in hardware. You're going to see way more transistor activity as a result, not less.

So I wasn't saying Cerny was wrong, at least with respect to how typical graphics rendering works, but if you're going the software rasterization route, it's going to use way more power.
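The argument above can be sketched as a die-area-weighted activity sum. The area shares and per-block activity factors below are illustrative guesses, not measured figures: pushing rasterization onto the CUs (as a Nanite-style compute rasterizer does) raises activity in the biggest block, so the weighted total climbs even though the fixed-function units go quiet.

```python
# Toy model: total switching activity as each block's activity weighted by
# its share of die area. All shares and activity levels are hypothetical.

def die_activity(block_activity, block_area_share):
    """Area-weighted sum of per-block activity factors (0.0 to 1.0 each)."""
    return sum(a * s for a, s in zip(block_activity, block_area_share))

#                 CUs, fixed-function geometry/ROPs, everything else
area_share     = [0.55, 0.10, 0.35]

fixed_function = die_activity([0.60, 0.90, 0.50], area_share)  # classic 3D queue
compute_raster = die_activity([0.95, 0.10, 0.50], area_share)  # CU-driven raster
print(fixed_function, compute_raster)  # the compute-raster case comes out higher
```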
 
The implication is that higher geometry processing would lead to lower GPU utilization due to stalling, keeping in mind that the Geometry stages are extremely early in the pipeline.

One would expect power consumption to increase while Async Compute operates, for instance.

It's not unlike how capping the framerate would lead to lower GPU power consumption if it typically completes the work in a much shorter time otherwise.

Yep, but if you're building your engine so it stalls much of the silicon, that's kind of like... the opposite of... most things. Opportunistic boosting is a no-lose thing, even if it's based on a conservative, yield-based prediction stored in microcode.

"Gonna waste a lot of the silicon so I can boost, WOOOOO!" I mean, what is that?

IIRC, Ray tracing is practically a separate stage of the pipeline, so it's not necessarily increasing simultaneous/concurrent utilization of various GPU components.

edit:

That said, the way MS describes their RT, you'd think it was compute/ALU + TEX/RT units. *shrug*

It could be more effective as a separate stage of the pipeline, I'll grant you that. But if you want the accuracy of the ray tracing to match the geometry used for the rasterisation, you're still going to be left with a buffer of some kind holding a bunch of different vectors (multiple sets if transparency is involved). The more vectors, the more complex it gets, even from the first bounce.

However you shake it, I just can't see more geometry making accurate hybrid rendering less costly.
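A crude sanity check on that point: BVH traversal depth grows roughly with log2 of the triangle count, so even the best case gets more expensive per ray as geometry density climbs, and incoherent rays touching many tiny leaves only make it worse. The leaf size and coherence factor below are arbitrary assumptions.

```python
from math import log2

def traversal_steps(triangles, leaf_size=4, coherence=1.0):
    """Rough estimate of BVH node visits per ray. coherence < 1.0 models
    cache-friendly bundled rays; > 1.0 models divergent rays thrashing cache."""
    depth = log2(max(triangles / leaf_size, 1.0))
    return coherence * depth

low  = traversal_steps(100_000)       # a typical current-gen mesh
high = traversal_steps(100_000_000)   # UE5-style dense geometry
print(round(low, 1), round(high, 1))  # cost per ray grows, never shrinks
```

Log growth is gentle, but it only ever goes up with triangle count, which is the opposite direction from the rasterizer's "dense geometry is cheap on power" behaviour.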
 
The implication is that higher geometry processing would lead to lower GPU utilization due to stalling, keeping in mind that the Geometry stages are extremely early in the pipeline.
Due to stalling what, the CUs? So in low-geometry scenes you're stalling the geometry stages. Meaning you're not "stalling more", you're just "stalling somewhere else".
His implication is simply that you get better IQ by stalling the CUs more and the geometry stages less.

Most developers probably aren't going to want to be stalling CUs (the big silicon consumer of GPUs) so they can get a slight bump in geometry processing.
The CUs may not be the big silicon consumer of the PS5's SoC. They should still constitute "the largest area on the die consisting of similar units", but the proportion is at least not as big as it was with the PS4 Pro, or maybe even the PS4. We're now looking at probably 40 CUs total, which isn't that wide; there's a block with "a large EDRAM pool" dedicated to I/O, and they changed from the paltry Jaguar cores to much larger Zen 2 cores.
 
Due to stalling what, the CUs?
Yes.

So in low-geometry scenes you're stalling the geometry stages. Meaning you're not "stalling more", you're just "stalling somewhere else".
His implication is simply that you get better IQ by stalling the CUs more and the geometry stages less.
With low geometry, you're just completing the geometry stage faster and moving on to the CUs. Whether the CPU / GPU front end feeds the geometry stages after that will be up to how the renderer is architected, I think. Compute Units make up the majority of the die, so it's natural to expect power consumption to increase if it's not stalling on the geometry. With the clock vs power relation, you'd expect that even more so.

Stalling the pipeline means lower concurrent utilization, so I'm not sure that's best for image quality when folks are harping on about async compute precisely to spread as much work across the GPU as a whole, reducing render frame time or packing in more work per second.

What he's saying is that you can get lower power consumption with higher geometry, but that doesn't necessarily mean the performance is there if you're stalling on a couple of stages of the pipeline (less time to do the shader work), which affects the whole frame time.
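The stall argument reduces to a simple utilization model. Assuming the worst case where the geometry and shading phases fully serialize (a deliberate simplification, and the millisecond figures are made up), a geometry-bound frame leaves the CUs idle for a larger share of the frame, so CU utilization and power both drop even as frame time goes up.

```python
# Toy model: share of the frame the CUs are busy, assuming the geometry
# and shading phases overlap poorly (worst case: fully serialized).

def cu_utilization(geometry_ms, shading_ms):
    """Fraction of total frame time spent in the CU-heavy shading phase."""
    frame_ms = geometry_ms + shading_ms
    return shading_ms / frame_ms

geometry_heavy = cu_utilization(geometry_ms=10.0, shading_ms=6.0)
geometry_light = cu_utilization(geometry_ms=1.0,  shading_ms=6.0)
print(round(geometry_heavy, 2), round(geometry_light, 2))
```

Real renderers overlap these stages far better than this, but the direction of the effect is the same: the more time spent waiting on the front end, the lower the concurrent CU load.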
 
But he mentioned very specifically Horizon's map, which had very low geometry yet drew a lot of power.


From what I recall, it's really not a very shader-intensive scene either. And this isn't a PC with V-Sync off; the output is locked at 30 FPS in this game, IIRC.
It's a "simple" scene with high power consumption, and the only reason stated for that is the low geometry.
 
That sounds like a failure in how the engine is handling the scene.

Or maybe Cerny wasn't trying to fool people into anything and geometry-intensive scenes do indeed cost less power for a certain IQ-level.
 
Do all games with menus or maps ramp up the power consumption?

No one is saying that higher geometry loads wouldn't lead to a decrease in power. The idea is that the geometry is stalling the rendering pipeline such that the workload isn't hitting the whole GPU hard simultaneously.
 
Mark Cerny said:
It's counter-intuitive, but processing dense geometry typically consumes less power than processing simple geometry. Which is, I suspect, why processing Horizon's map screen with its low triangle count makes my PS4 Pro heat up so much.

This is the quote.
 
No one here is refuting that. And Al is agreeing with you as to why that could be the case. But using UE5 as an example is not an apples-to-apples comparison here: UE5 does its geometry generation and discard through compute shaders and not through the fixed-function pipeline that _all_ other games use, so Cerny's musings on power consumption with respect to geometry density don't apply in that case.
 
The CUs may not be the big silicon consumer of the PS5's SoC. I mean it should still constitute "the largest area in the die consisted of similar units", but it's at least not as big in proportion as it was with the PS4 Pro or maybe even the PS4. We're now looking at probably 40 CUs total which isn't that wide, there's a block with "a large EDRAM pool" dedicated to I/O and they changed from the paltry Jaguar to much larger Zen2 cores.

This is true - relative to last gen the CUs appear to take up less space on the die. It's still a big old chunk of hardware to be under-utilising, though.

I do think there will be lots of tasks that could keep CUs busy even in geometry dense scenes, especially going forward. The aforementioned Nanite and Lumen for example would seem to be contenders.
 
We've been here before, transistors are not all equal. :nope: Please report to the Re-conditioning team for "re-education" :runaway:
It doesn't matter whether transistors are optimized for their role or not; power usage is still based on transistors flipping bits. Just because you've got different transistors doesn't mean flipping bits will suddenly be free.

That's a different statement from looking at the general activity of a whole processor.
 
No change is free, but when one path is 800% more power-efficient than another, you can't use transistor state changes as a metric.
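The "transistors are not all equal" point in numbers: total switching power is the sum of toggles × energy per toggle, and energy per toggle can differ a lot between, say, a lean fixed-function path and a general-purpose ALU path. The 8x factor below just mirrors the "800%" claim above and is not a measured figure.

```python
# Toy comparison: same toggle count, different energy per toggle.
# The 8x ratio is hypothetical, echoing the efficiency claim above.

def switching_power(toggles, energy_per_toggle):
    """Total switching power in arbitrary units."""
    return toggles * energy_per_toggle

fixed_path   = switching_power(toggles=1e9, energy_per_toggle=1.0)
general_path = switching_power(toggles=1e9, energy_per_toggle=8.0)
print(general_path / fixed_path)  # identical toggle count, 8x the power
```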
 