Digital Foundry Article Technical Discussion [2024]

Status
Not open for further replies.
1:50:13 Supporter Q4: Could developers run a game’s logic at high rates to improve responsiveness, while keeping frame-rate untouched?
Just to add some detail to the DF answer, the question is a perfect example of where folks get confused between "latency/responsiveness" and throughput ("frame rate" here).

The latency or "responsiveness" of a game is purely the end-to-end time between making an input and seeing (or hearing/experiencing in any way, but generally seeing in this context) the output. Decoupling the game logic/input/simulation from rendering is indeed very common, but it is not done primarily for latency reasons. Indeed, decoupling it too much adds a bit of complexity to various late-latched input sampling techniques (e.g. NVIDIA Reflex or equivalent). The primary purpose of running these simulations at higher rates is usually physics stability, especially in things like racing games, where high speeds mean you are at great risk of undersampling important effects and interactions if you run the simulation too infrequently.
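
For anyone who hasn't seen the decoupling written down, here is a rough Python sketch of the usual fixed-timestep pattern: a simulation stepping at a fixed rate underneath rendered frames of varying length. Everything in it (`Car`, `simulate_step`, `run_frames`, the 200 Hz step, the frame times) is made up for illustration and not taken from any particular engine. Times are kept as integer microseconds to avoid floating-point drift in the accumulator.

```python
from dataclasses import dataclass


@dataclass
class Car:
    position_m: float = 0.0
    velocity_mps: float = 80.0  # hypothetical racing-car speed


def simulate_step(car, dt_us):
    """Advance the toy simulation by one fixed step of dt_us microseconds."""
    car.position_m += car.velocity_mps * (dt_us / 1_000_000)


def run_frames(frame_times_us, sim_dt_us=5_000):
    """Run a fixed-rate simulation underneath variable-length frames.

    Returns the final car state and the number of simulation steps
    that were executed during each rendered frame.
    """
    car = Car()
    accumulator_us = 0
    steps_per_frame = []
    for frame_time_us in frame_times_us:
        accumulator_us += frame_time_us
        steps = 0
        # Consume the elapsed time in fixed-size simulation steps.
        while accumulator_us >= sim_dt_us:
            simulate_step(car, sim_dt_us)
            accumulator_us -= sim_dt_us
            steps += 1
        steps_per_frame.append(steps)
        # A real engine would render here, typically interpolating
        # between the last two simulation states.
    return car, steps_per_frame


# Two ~60 fps frames followed by one ~30 fps frame.
car, steps = run_frames([16_667, 16_667, 33_333])
```

Note that the slow frame simply runs more simulation steps. The physics stays well sampled, but the player still only sees the result once that frame is actually displayed.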

Knowing a game's (stable) frame rate gives a lower bound on latency (i.e. if it takes this long to draw, the end-to-end latency has to be at least that), but it gives no upper bound, as the other contributors could be arbitrarily long. Thus, while reducing the time to draw a frame will generally reduce the end-to-end latency as well as increase the frame rate, thinking of the two as the same will lead to confusion like that in the question.

tldr: this is yet another example where frame rates are confusing; thinking about frame times is correct here.
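
To put rough numbers on the frame time point, here is a small Python sketch. The helper names and the per-stage latencies for the two games are invented purely for illustration; only the frame time arithmetic itself is exact.

```python
def frame_time_ms(fps):
    """Convert a frame rate into the time budget of one frame."""
    return 1000.0 / fps


def end_to_end_latency_ms(frame_ms, other_stages_ms):
    """Frame time is a lower bound; every other stage only adds to it."""
    return frame_ms + sum(other_stages_ms)


# The same "+30 fps" or "+60 fps" means very different time savings.
saving_30_to_60 = frame_time_ms(30) - frame_time_ms(60)    # about 16.7 ms
saving_60_to_120 = frame_time_ms(60) - frame_time_ms(120)  # about 8.3 ms

# Two hypothetical games, both at a stable 60 fps, with different
# (made-up) input sampling and display costs in milliseconds.
game_a = end_to_end_latency_ms(frame_time_ms(60), [5.0, 10.0])
game_b = end_to_end_latency_ms(frame_time_ms(60), [20.0, 40.0])
```

Both games report the same frame rate, yet under these assumed stage costs the second has well over twice the end-to-end latency of the first.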
 
This finally puts to rest a lot of partisan nonsense about how PS5 eschewed wide shader arrays because Cerny smart, how PS5 eschewed VRS because Cerny smart, that PS5 didn't support int4 and int8 because Cerny smart, that PS5 has a special Geometry Engine that's super custom unique (designed by Cerny, he smart).

Cerny is very smart, but PS5 is the way it is because that was the best Sony could do at the time. Now that they have access to tier 2 VRS, full Mesh Shader equivalence, AI acceleration instructions etc they've got it all. They probably have Sampler Feedback too. And now that the best way to push compute further is to go wider and - if anything - a little slower on balance they are doing that too.

There's a reason that the PS5 Pro is looking similar to the Series X at a high level - it's because they're both derived from the same line of technology, and they both face the same pressures on die area and power, and they both have very smart people deciding what to do with what's available.

Bit of a bummer that PS5 Pro doesn't have any Infinity Cache, but understandable given that it eats up die area. Being 2 ~ 3x faster at RT in the absence of any IC is cool though, and makes me quietly optimistic for RDNA4 and any possible AMD based handheld console.

@Dictator do you know how many ROPs the PS5 Pro has? Is it still 64?
 
PS5 always looked to me like a console designed by bean counters, trying to minimise costs as much as possible. From the variable CPU / GPU clock speeds to the ridiculous 825GB of storage, everything screamed cost optimization.
 
Both are like that. I think Sony was intending to push out PS5 a year before XSX which would have killed them but I think the lack of RT hardware and the compute differential made them delay a year to figure out how to compete. In hindsight I don’t think it would have mattered.
 
A 5.5 GB/s SSD was very expensive in 2020, and the DualSense should also cost a lot more than other controllers. It's reasonable to reduce the cost of the GPU and CPU.


It would be interesting to know: if game developers got to make the design choices for next-gen consoles, what would they choose?

A console with a better controller and faster SSD? Or a console with a faster GPU and CPU?
 
Can we necrobump the Bandwidth per CU discussions? Or Wide is bad and narrow is good?

DF reports 16 WGP per shader engine, well beyond XSX’s 14, with marginally more bandwidth on a smaller bus (256-bit vs 320-bit) and 4MB L2 vs 5MB L2 on XSX.

Hopefully this puts an end to this silly metric.

Are we validated in saying that PS5 was not RDNA2? We can see now that they take full VRS and Mesh Shaders. I assume it’s fully DX12U compliant and then some now.
How is that a silly metric? If anything, the 45% performance bump in raster tells you that having too many WGPs per shader engine harms performance. They probably didn't go for fewer WGPs per SE while increasing frequency because of thermal and binning reasons.

Getting a third SE in there would increase complexity and cost, so they didn't do it even if it would be ideal for performance.
 
I honestly truly love DF Directs. So nice to look forward to after a long day of work.
 
PS5 Pro is more like a 4070-equivalent with GPU at 2.35 GHz.

"4070" is a good enough analogy for its probable performance. Enough for almost every 30fps mode to run at 60 if it isn't CPU capped, which is probably how all the Sony first party titles are going to get updated.

It'll also be interesting to see if there's other/upcoming games that get RT upgrades, it's relatively easy to crank up RT as a setting in an engine.

Also, I'd stick with "$499 for the discless SKU" as a price prediction. Let's see how much anyone cares about a more powerful console, that's what I want to know.
 
How is that a silly metric? If anything, the 45% performance bump in raster tells you that having too many WGPs per shader engine harms performance. They probably didn't go for fewer WGPs per SE while increasing frequency because of thermal and binning reasons.

Getting a third SE in there would increase complexity and cost, so they didn't do it even if it would be ideal for performance.
Because increasing the ratio doesn’t result in more performance. As long as you aren’t bandwidth bound, you are compute bound. To do processing you need compute. When people started using this metric to explain why PS5 was outperforming XSX, they were inadvertently implying that XSX was bandwidth bound.

If that is the case, PS5 Pro is significantly more bandwidth bound.
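
For reference, the ratio being argued about is easy to compute. This Python sketch uses publicly reported headline figures that are not from this thread, so treat them as assumptions: 448 GB/s and 36 CUs for PS5, 560 GB/s (fast 10GB pool only) and 52 CUs for XSX, and 576 GB/s and 60 CUs for PS5 Pro. The function name is made up.

```python
# Assumed headline specs: (memory bandwidth in GB/s, active CUs).
# These are reported figures, not values stated in this thread.
CONSOLES = {
    "PS5": (448.0, 36),
    "XSX": (560.0, 52),      # fast 10GB pool only
    "PS5 Pro": (576.0, 60),
}


def bandwidth_per_cu(specs):
    """Return GB/s of memory bandwidth available per active CU."""
    return {name: bw / cus for name, (bw, cus) in specs.items()}


ratios = bandwidth_per_cu(CONSOLES)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} GB/s per CU")
```

Under those assumed figures, PS5 Pro ends up with the lowest bandwidth per CU of the three. The sketch ignores clocks and cache hit rates, so it says nothing by itself about whether any of these machines is actually bandwidth bound.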
 
Do we know that the cache hierarchies in PS5/Pro & XSX have similar bandwidths?
 
Are you saying that it's a silly metric because Cerny is making the same "mistake" so now this approach has been validated? Unless there are some secret reasons as to why the series x doesn't perform better than the PS5 in so many games, that is the reason for the way it is, so it's not silly.
 
We can argue about narrow and fast or wide and slow all day and never come to an agreement as there will always be situations where one is superior to the other. Surely the only metric that counts is performance (or results on screen) for number of transistors used. Do we have accurate numbers for PS5 and Xbox series X? Google searching seems to be a random number generator.
 
This finally puts to rest a lot of partisan nonsense about how PS5 eschewed wide shader arrays because Cerny smart, how PS5 eschewed VRS because Cerny smart, that PS5 didn't support int4 and int8 because Cerny smart, that PS5 has a special Geometry Engine that's super custom unique (designed by Cerny, he smart).

Cerny is very smart, but PS5 is the way it is because that was the best Sony could do at the time. Now that they have access to tier 2 VRS, full Mesh Shader equivalence, AI acceleration instructions etc they've got it all. They probably have Sampler Feedback too. And now that the best way to push compute further is to go wider and - if anything - a little slower on balance they are doing that too.

There's a reason that the PS5 Pro is looking similar to the Series X at a high level - it's because they're both derived from the same line of technology, and they both face the same pressures on die area and power, and they both have very smart people deciding what to do with what's available.

Bit of a bummer that PS5 Pro doesn't have any Infinity Cache, but understandable given that it eats up die area. Being 2 ~ 3x faster at RT in the absence of any IC is cool though, and makes me quietly optimistic for RDNA4 and any possible AMD based handheld console.

@Dictator do you know how many ROPs the PS5 Pro has? Is it still 64?
I am actually not sure about that one, let me take a look and write in here if I find anything.
 
XSX's problem was always mainly not having enough L1 cache, which AFAIK is still only the same amount as PS5. MS never disclosed those specs, and there is a reason for that (they are not the best specs of the XSX hardware). Supposedly PS5 Pro will have twice as much (100% more) L1 cache as XSX for just 15% more CUs to feed. Don't worry, Cerny has done his homework for PS5 Pro too.
 
Interesting point. AMD went from 128KB L1 per 5 WGPs in RDNA 1&2 to 256KB per 4 WGPs in RDNA 3. PS5 is at 128KB per 5 WGPs going to 256KB per 8 WGPs in the PS5 Pro.

Why should we assume that MS allocated a relatively stingy 128KB per 7 WGPs? That would be pretty inconsistent with what everyone else has done.
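
Working the per-WGP numbers out from the figures quoted above, as a quick Python sketch. The XSX row is only the hypothetical 128KB-per-7-WGP case being questioned, not a confirmed spec, and the table and function names are made up.

```python
# (L1 size in KB, WGPs sharing it), taken from the figures in this post.
L1_CONFIGS = {
    "RDNA 1&2": (128, 5),
    "RDNA 3": (256, 4),
    "PS5": (128, 5),
    "PS5 Pro": (256, 8),
    "XSX (hypothetical)": (128, 7),  # the unconfirmed case being questioned
}


def l1_kb_per_wgp(configs):
    """Return KB of L1 cache available per WGP for each configuration."""
    return {name: kb / wgps for name, (kb, wgps) in configs.items()}


per_wgp = l1_kb_per_wgp(L1_CONFIGS)
for name, kb in per_wgp.items():
    print(f"{name}: {kb:.1f} KB of L1 per WGP")
```

On these figures PS5 Pro lands at 32KB per WGP, above PS5's 25.6KB but half of RDNA 3's 64KB, while the hypothetical XSX case would be the lowest of the lot at roughly 18.3KB.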
 
Are you saying that it's a silly metric because Cerny is making the same "mistake" so now this approach has been validated? Unless there are some secret reasons as to why the series x doesn't perform better than the PS5 in so many games, that is the reason for the way it is, so it's not silly.
No it has nothing to do with Cerny.

I’m saying it’s never been a valid way to judge the performance of a GPU. People were just using it to compare consoles and point out that in this one particular metric PS5 was superior.
But bigger GPUs with more bandwidth have always performed better and had worse bandwidth ratios. It’s never not been the case except in situations where the GPU is bandwidth starved. Until we know that bandwidth is the bottleneck, you can keep upping the compute profile.
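
The compute bound vs bandwidth bound argument is essentially the roofline model, which can be sketched in a few lines of Python. The function name and all the numbers below are illustrative assumptions, not measurements of any console.

```python
def attainable_tflops(peak_tflops, bandwidth_gbs, flops_per_byte):
    """Simple roofline: performance is capped by compute or by memory.

    flops_per_byte is the workload's arithmetic intensity, i.e. how much
    math it does for each byte it pulls from memory.
    """
    bandwidth_limit_tflops = bandwidth_gbs * flops_per_byte / 1000.0
    return min(peak_tflops, bandwidth_limit_tflops)


# Hypothetical GPUs sharing the same 500 GB/s of bandwidth.
narrow = dict(peak_tflops=10.0, bandwidth_gbs=500.0)
wide = dict(peak_tflops=16.0, bandwidth_gbs=500.0)

# Math-heavy workload: not bandwidth bound, so more compute helps.
heavy_narrow = attainable_tflops(flops_per_byte=40.0, **narrow)
heavy_wide = attainable_tflops(flops_per_byte=40.0, **wide)

# Memory-heavy workload: bandwidth bound, so extra compute is wasted.
light_narrow = attainable_tflops(flops_per_byte=10.0, **narrow)
light_wide = attainable_tflops(flops_per_byte=10.0, **wide)
```

In this toy model the wider GPU has the worse bandwidth-per-compute ratio but is never slower than the narrow one. It only fails to pull ahead when the workload is already limited by bandwidth.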
 