Digital Foundry Article Technical Discussion [2021]

iroboto · Feb 14, 2021

Nesh said:
Cerny said something along the lines of filling up and exploiting fully the CU rather than spread to many and have them underutilized.
It seems like the clock boost is supposedly going to help in that regard. Maybe he thinks the CU count with that clock boost will help push the performance better? Hence the gap will not be as big as we expect?
Also I wonder how CU count scales up, considering so many GPUs out there with different CU counts. Games won't be fully optimized for those with the highest count but for an average out there. Unless the games are intelligent enough to scale efficiently and properly.

If you want to take full advantage of a GPU, you do a lot of embarrassingly parallel work. When we look at the compute shader queue, work is divided into blocks of threads, and each CU/SM can only hold so many blocks. So more CUs = more blocks that can be issued at once. Typically on the compute side I work with NVIDIA, where IIRC a block holds about 1000 threads and each SM can handle a couple of blocks. Because of the way the threads are serialized and the shared memory between the CUDA cores, you can assign work that shares data and get extremely good utilization out of your ALUs. Huge amounts really.

So in this case, having more CUs is a much greater advantage than having a high clock speed, because ultimately more work can be done in parallel, and latency is handled by the amount of thread switching a CU can do. The CUs all need to wait for memory to provide the next piece of work, so tearing through your compute jobs faster doesn't necessarily improve performance. A large number of cores that can each hold a lot of threads stays saturated while it waits for the next bit of memory to arrive, which is ideal considering how latent memory can be. High parallelism thrives on maximum throughput, provided you've got the bandwidth to feed it. The more work you can give it, the more can be done in parallel and the less it stalls. Ultimately the work done per unit time is going to be higher on a wider processor if the cores are fully utilized, not to mention that it's significantly more energy efficient.

tl;dr: programmers don't need to account for scaling to more CUs. The bandwidth needs to scale with the number of CUs. Programmers need to make sure they code in a way that keeps those CUs well fed: be clever about synchronizing threads, etc.
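
A rough way to picture "bandwidth needs to scale with the number of CUs" is a roofline-style estimate, where attainable throughput is the smaller of the compute peak and what memory can feed. This is only a sketch: the function name, the 64 lanes per CU, the 2 FLOPs per lane per clock, and the example GPU numbers are illustrative assumptions, not figures from the post.

```python
def attainable_tflops(cus, clock_ghz, bandwidth_gbs, flops_per_byte,
                      lanes_per_cu=64, flops_per_lane=2):
    """Roofline-style estimate in TFLOPS: min(compute roof, memory roof).

    lanes_per_cu and flops_per_lane are assumed values (64 ALU lanes,
    one fused multiply-add per lane per clock).
    """
    compute_roof = cus * lanes_per_cu * flops_per_lane * clock_ghz / 1000.0
    # flops_per_byte is the arithmetic intensity of the workload (assumed).
    memory_roof = bandwidth_gbs * flops_per_byte / 1000.0
    return min(compute_roof, memory_roof)


# Hypothetical GPU: 36 CUs at 2.0 GHz with 448 GB/s, workload at 10 FLOPs/byte.
base = attainable_tflops(36, 2.0, 448, 10)
# Doubling the CUs alone does not help once memory is the limiter...
wider_same_bw = attainable_tflops(72, 2.0, 448, 10)
# ...but doubling CUs and bandwidth together does.
wider_more_bw = attainable_tflops(72, 2.0, 896, 10)
```

In this toy case the wider GPU only pulls ahead when the bandwidth grows with it, which is the point of the tl;dr above.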

see colon · Feb 14, 2021

Globalisateur said:
But PS4 had 40% more compute and more than 100% more main ram bandwidth while XB1 had roughly no advantages on its own overall so the situation with PS4 was very different than now.

Once the Xbox One S was launched, it had like a ~15% clock advantage, though, right? This would have been on CPU and GPU. Considering many games run at 900p on Xbox One and 1080p on PS4, you would think rendering 40% more pixels with 40% more compute would produce a few edge cases where Xbox One S's CPU advantage or its narrower and faster (clocked) GPU would pull ahead, assuming developers leveraged ESRAM to mitigate the bandwidth discrepancy.
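
For reference, the "40% more pixels" figure can be checked with quick arithmetic. The sketch below assumes 900p means 1600x900 and 1080p means 1920x1080; the helper names are made up for illustration.

```python
def pixel_count(width, height):
    """Number of pixels in one frame."""
    return width * height


def extra_pixels_percent(low_res, high_res):
    """Percentage of additional pixels in high_res relative to low_res."""
    low = pixel_count(*low_res)
    high = pixel_count(*high_res)
    return 100.0 * (high - low) / low


# Assumed resolutions: 900p = 1600x900, 1080p = 1920x1080.
extra = extra_pixels_percent((1600, 900), (1920, 1080))
print(extra)  # 44.0
```

So under those assumptions 1080p is about 44% more pixels than 900p, close to the round 40% used above.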

snc · Feb 14, 2021

cwjs said:
No -- i predict it will converge on the tflop/bandwidth advantage over time. Which is like ~25%, right? The thesis is: If a game takes advantage of both cards' architecture equally, the difference in performance will align closely to the cards' specs. This is a pretty reasonable claim!

Yeah, it's reasonable, but CPU and I/O will still have an impact on fps, and I don't think that will change much in the future.
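
To put rough numbers on the "tflop/bandwidth advantage" being discussed, here is a small sketch. The spec values are assumptions taken from the publicly stated figures (PS5: 36 CUs at up to 2.23 GHz, 448 GB/s; Series X: 52 CUs at 1.825 GHz, 560 GB/s on its faster memory pool), and the 64 lanes x 2 FLOPs per clock per CU is the usual peak-rate convention, not a measured number.

```python
def peak_tflops(cus, clock_ghz, lanes_per_cu=64, flops_per_lane_per_clock=2):
    """Theoretical peak FP32 rate in TFLOPS (assumed per-CU convention)."""
    return cus * lanes_per_cu * flops_per_lane_per_clock * clock_ghz / 1000.0


# Assumed public specs; the PS5 clock is a variable "up to" figure.
ps5 = peak_tflops(36, 2.23)
xsx = peak_tflops(52, 1.825)

compute_ratio = xsx / ps5      # peak compute advantage
bandwidth_ratio = 560 / 448    # advantage on the faster memory pool only
```

Under these assumptions the bandwidth ratio comes out at 1.25 and the peak compute ratio at roughly 1.18, so the ~25% figure lines up with bandwidth more than with TFLOPs.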

PSman1700 · Feb 14, 2021

cwjs said:
Xbox just has the more powerful gpu)

On top of much higher bandwidth throughput, a somewhat faster-clocked CPU and no contention between CPU/GPU when things get hammered tight, which, down the line, will happen.
Everything basically went wide(r), or both. Be it RDNA2 GPUs, NV, Xbox etc., except Sony.
Guess that utilisation for wider GPUs will indeed happen, especially considering ray tracing (and reconstruction, if we also start using CUs for that).

iroboto · Feb 14, 2021

snc said:
Yeah, it's reasonable, but CPU and I/O will still have an impact on fps, and I don't think that will change much in the future.

imo, I/O shouldn't have any impact on fps unless it's tied to rendering, and no one should be tying 5GB/s of bandwidth to rendering. I should be careful with the choice of words here, because someone will undoubtedly showcase a stutter or frame rate drops due to I/O involvement. But that is likely a sign of other issues plaguing their texture streaming system, or of a complete lack of available memory, such that the pools are so small that relying on I/O to offload/unload is the only plausible scenario left.

As for the CPU being the bottleneck for the GPU, this is also unlikely. The CPU likely accounts for no more than 5% of the bottleneck in total render time over the course of a large benchmark. Most of the time it's significantly less, unless your goal is to maximize framerate into the high 100+ range.

You're unlikely to be CPU bottlenecked if you're also approaching an I/O bottleneck, since that would be an admission of a memory bottleneck. And since the largest footprint in memory is GPU-related items, you're back at a GPU dependency.
To be CPU bottlenecked, you need low resolution and high frame rates. Or one hell of a complex world, with hundreds upon thousands of little interactive objects all doing things within the world at once. But once again, there is hardware support for this: AVX, AVX2.

Otherwise, there is likely very little possibility that you'll get a CPU bottleneck at higher resolution with high fidelity, unless you're back at Jaguar cores, and those were very biased setups.
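
The "low resolution and high frame rates" point can be sketched with a toy frame-time model. It assumes CPU and GPU work overlap, so the slower of the two sets the frame time, and that GPU time scales linearly with pixel count; both are simplifications, and the millisecond figures are invented for illustration.

```python
def frame_rate(cpu_ms, gpu_ms_at_1080p, pixel_scale):
    """Return (fps, limiter) for a pipelined CPU/GPU frame.

    pixel_scale is the pixel count relative to 1080p (4.0 for 4K, 0.25
    for a quarter of the pixels). Linear GPU scaling is an assumption.
    """
    gpu_ms = gpu_ms_at_1080p * pixel_scale
    frame_ms = max(cpu_ms, gpu_ms)  # the slower stage sets the pace
    limiter = "CPU" if cpu_ms >= gpu_ms else "GPU"
    return 1000.0 / frame_ms, limiter


# Hypothetical game: 6 ms of CPU work, 8 ms of GPU work at 1080p.
at_4k = frame_rate(6.0, 8.0, 4.0)        # GPU-bound at high resolution
at_low_res = frame_rate(6.0, 8.0, 0.25)  # CPU-bound at low resolution
```

In this model the CPU only becomes the limiter once the resolution drops far enough that the GPU finishes first, which is when frame rates are already well above 100.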

snc · Feb 14, 2021

iroboto said:
imo, I/O shouldn't have any impact on fps unless it's tied to rendering, and no one should be tying 5GB/s of bandwidth to rendering. I should be careful with the choice of words here, because someone will undoubtedly showcase a stutter or frame rate drops due to I/O involvement. But that is likely a sign of other issues plaguing their texture streaming system, or of a complete lack of available memory, such that the pools are so small that relying on I/O to offload/unload is the only plausible scenario left.

As for the CPU being the bottleneck for the GPU, this is also unlikely. The CPU likely accounts for no more than 5% of the bottleneck in total render time over the course of a large benchmark. Most of the time it's significantly less, unless your goal is to maximize framerate into the high 100+ range.

You're unlikely to be CPU bottlenecked if you're also approaching an I/O bottleneck, since that would be an admission of a memory bottleneck. And since the largest footprint in memory is GPU-related items, you're back at a GPU dependency.
To be CPU bottlenecked, you need low resolution and high frame rates. Or one hell of a complex world, with hundreds upon thousands of little interactive objects all doing things within the world at once. But once again, there is hardware support for this: AVX, AVX2.

Otherwise, there is likely very little possibility that you'll get a CPU bottleneck at higher resolution with high fidelity, unless you're back at Jaguar cores, and those were very biased setups.

There is a difference between being CPU bottlenecked and the CPU having an impact on fps (especially on minimum frame rates, which are often analyzed).

I/O shouldn't have much impact on PS4/XOne era games, but we are talking about potential future differences.

iroboto · Feb 14, 2021

snc said:
There is a difference between being CPU bottlenecked and the CPU having an impact on fps. I/O shouldn't have much impact on PS4/XOne era games, but we are talking about potential future differences.

To me they mean the same as something has to be the bottleneck to the frame rate; that some part of the pipeline must ultimately be responsible for the total frame time.
But sure, I think I know what you're trying to get at.

I think you're referring to dips/stuttering versus the actual bottleneck of the frame rate, identifying the potential for a raw 'burst' of CPU or IO that can impact the render time momentarily.

I expect most of that type of bursty/stuttering behaviour to happen at the beginning of this generation and less so at the end. Mainly because the games are cross-gen, so they are designed around a slower CPU. They rely on the hardware brute-forcing the frame rates and I/O to obtain high frames. I think as last gen falls off, the optimization around the CPU side of things will change dramatically, so we shouldn't get that bursty behaviour.

AbsoluteBeginner · Feb 14, 2021

see colon said:
Once the Xbox One S was launched, it had like a ~15% clock advantage, though, right? This would have been on CPU and GPU. Considering many games run at 900p on Xbox One and 1080p on PS4, you would think rendering 40% more pixels with 40% more compute would produce a few edge cases where Xbox One S's CPU advantage or its narrower and faster (clocked) GPU would pull ahead, assuming developers leveraged ESRAM to mitigate the bandwidth discrepancy.

Xbox had 32 ROPs, PS4 had 64 ROPs, so that advantage in clock is not the same as this gen.

scently · Feb 14, 2021

AbsoluteBeginner said:
Xbox had 32 ROPs, PS4 had 64 ROPs, so that advantage in clock is not the same as this gen.

Actually, X1S has 16 ROPs, PS4 has 32, X1X has 32, and the PS4Pro has 64 ROPs.

AbsoluteBeginner · Feb 14, 2021

scently said:
Actually, X1S has 16 ROPs, PS4 has 32, X1X has 32, and the PS4Pro has 64 ROPs.

Ah, you are right, I mixed it up. X1X had half but considerably higher clocks, so it made up for it.

iroboto · Feb 14, 2021

AbsoluteBeginner said:
Ah, you are right, I mixed it up. X1X had half but considerably higher clocks, so it made up for it.

Nah, it never made it up with clocks.

It had a truckload more bandwidth. ROPs are very easily bandwidth limited.
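
A back-of-envelope sketch of why ROPs hit the bandwidth wall so easily: peak fill rate is ROPs x clock, and every pixel written costs bytes. The clocks and bandwidths below are assumptions from the publicly stated specs (X1X: 32 ROPs at 1.172 GHz, 326 GB/s; PS4 Pro: 64 ROPs at 0.911 GHz, 218 GB/s), with 4 bytes per pixel for a plain 32-bit colour write. It ignores colour compression, caches and blending reads, so treat it as an upper-bound illustration only.

```python
def rop_write_bandwidth_gbs(rops, clock_ghz, bytes_per_pixel=4):
    """GB/s needed to sustain peak ROP output with uncompressed writes."""
    gpixels_per_s = rops * clock_ghz  # peak fill rate in Gpixels/s
    return gpixels_per_s * bytes_per_pixel


# Assumed public specs for both consoles.
x1x_needed = rop_write_bandwidth_gbs(32, 1.172)   # about 150 GB/s
pro_needed = rop_write_bandwidth_gbs(64, 0.911)   # about 233 GB/s

X1X_BANDWIDTH_GBS = 326
PRO_BANDWIDTH_GBS = 218

x1x_can_feed_rops = x1x_needed <= X1X_BANDWIDTH_GBS
pro_can_feed_rops = pro_needed <= PRO_BANDWIDTH_GBS
```

With those numbers the X1X has headroom to feed its 32 ROPs, while the Pro's 64 ROPs could in principle ask for more than its memory can deliver even before anything else touches the bus.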

scently · Feb 14, 2021

iroboto said:
Nah, it never made it up with clocks. It had a truckload more bandwidth. ROPs are very easily bandwidth limited.

Which is why I have been bemused by the speculation that PS5 is outperforming XSX because of the speed of the frontend, even though, as far as I know, ROP performance is still bound by available bandwidth. Whatever the case is with the XSX, I doubt it has anything to do with its pixel/clk.

AbsoluteBeginner · Feb 14, 2021

iroboto said:
Nah, it never made it up with clocks. It had a truckload more bandwidth. ROPs are very easily bandwidth limited.

Not quite. If it had the clock speed of the Pro and 10% more CUs, you would still feel that.

Pro was BW limited way before it was ROP limited anyway, but that is because they effectively doubled the PS4 GPU and only bumped BW by 20%.

BRiT · Feb 14, 2021

Doom Eternal Switch: The Making Of An 'Impossible' Port - id Software/Panic Button Interview

DigitalFoundry said:
It's the interview we've been waiting years to do! Just how is it possible to get games like Doom Eternal ported over to Nintendo Switch? What does it take to make an 'impossible port' like this a reality? John Linneman talks with Panic Button Lead Engineer Travis Archer and id Software's Lead Engine Programmer Billy Khan to talk all things Switch and id Tech!

ChuckeRearmed · Feb 14, 2021

id Tech 7 seems to be a very interesting engine. I wonder what it could be if it were as open as Unreal Engine. I presume Unreal Engine is better?

BRiT · Feb 14, 2021

Depends on what you're looking at/for. Support and flexibility may be better on UE because that's been a requirement of their business model for some time now.

cheapchips · Feb 14, 2021

Really fond of these DF talks to the Devs videos. Needs more shots of dev tools in action though. *

* I don't know why I need this!

Rootax · Feb 15, 2021

I'm really impressed by how much they're culling/discarding on Switch... And I wonder how much compute power is required to do that.

Remij · Feb 15, 2021

Love the DF interview videos. You can just tell John and the others are happy when they get to do more content like this. Love seeing them connect more with folks in the industry.

snc · Feb 16, 2021
