Sony PlayStation 5 Pro

He meant the splitting of CUs between PSSR and rendering. Assuming PSSR is done on the shader core, there wouldn’t be a rationing of cores.

Isn't that exactly how it would function, though? If the additional 24 CUs are, at present, being used like DLSS tensor cores, what's to stop using e.g. 44 for rendering and 16 for lower-quality PSSR?
 
Usually, GPU designs don't have variance from CU to CU; they are all identical to each other.

What they are most probably doing is using the new WMMA INT8 RDNA4 feature that's distributed across all compute units.
 

But if we were looking at all 60 CUs being used for rendering in conjunction with them all being used for WMMA INT8, why have we generally just seen slightly improved versions of the base console's 60 fps mode PSSR'd to 4K?

It suggests a ~36 CU rendering budget is still in place and the remaining 24 are effectively used for PSSR.

I understand that all of the CUs will have the same capabilities, and I know it's not normal for GPU functionality to be segmented, but tensor cores run in parallel with the rest of the GPU, which is why I think we may be looking at a portion of the CUs being used exclusively for PSSR.

I don't think any segmentation will be set in stone, which is why I think there's scope for some games to have a 40 CU rendering budget with a 20 CU PSSR budget, or 50/10, etc.
 
I would venture that PSSR being run on the shader cores is precisely why we see so little improvement outside of the quality of reconstruction. PSSR is probably quite heavy without dedicated cores like Nvidia has. The 60 CUs would be used for rendering as well as PSSR. I would think it's likely that the PSSR calculations would be run on all the cores at once given how potentially bandwidth-heavy it is.

I doubt Sony will reveal the hardware specifics at all so we will probably have to wait for someone to dissect the hardware and give us a breakdown on whether or not there are dedicated cores.
 
But if we were looking at all 60 CUs being used for rendering in conjunction with them all being used for WMMA INT8, why have we generally just seen slightly improved versions of the base console's 60 fps mode PSSR'd to 4K?

It suggests a ~36 CU rendering budget is still in place and the remaining 24 are effectively used for PSSR.

I understand that all of the CUs will have the same capabilities, and I know it's not normal for GPU functionality to be segmented, but tensor cores run in parallel with the rest of the GPU, which is why I think we may be looking at a portion of the CUs being used exclusively for PSSR.

I don't think any segmentation will be set in stone, which is why I think there's scope for some games to have a 40 CU rendering budget with a 20 CU PSSR budget, or 50/10, etc.
No. PSSR takes 2 ms of the 16.6 ms frame time to resolve a 1080p image to 4K, and it uses all CUs (and the new WMMA INT8 from RDNA4) for that.

On that subject, if I am not mistaken, Tensor Cores also share resources with shaders on Nvidia cards, am I right?
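
For a rough sense of scale, here's a minimal sketch of the frame budget that 2 ms figure implies, assuming a 60 fps target and taking the leaked number at face value:

```python
# Rough frame-budget arithmetic, assuming the leaked ~2 ms PSSR cost at 60 fps.
target_fps = 60
frame_budget_ms = 1000 / target_fps   # ~16.67 ms per frame at 60 fps
pssr_cost_ms = 2.0                    # leaked cost to upscale 1080p -> 4K

render_budget_ms = frame_budget_ms - pssr_cost_ms
print(f"Frame budget:       {frame_budget_ms:.2f} ms")
print(f"PSSR cost:          {pssr_cost_ms:.2f} ms ({pssr_cost_ms / frame_budget_ms:.0%} of the frame)")
print(f"Left for rendering: {render_budget_ms:.2f} ms")
```

So roughly an eighth of every frame would go to upscaling, with about 14.7 ms left for everything else.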
 
That figure is from a patent dating back several years, right? Things may have changed, and costs can also vary from game to game.

Yes, tensor cores utilize bandwidth and I would imagine registers as well.
 

You mean the 2ms per frame of PSSR? It comes from the leak MLID did:

[Image: slide from MLID's leaked PS5 Pro video]


No idea if the data is correct, but Sony took his video down when he posted it. It also had a screenshot comparing Ratchet using PSSR vs FSR.
 
Yeah, reading this leak again, I'm now nearly 99% positive it's using the CUs for PSSR.
Just below the 300 TOPS figure it lists 67 TFLOPS of 16-bit FP. Divide that in half and you get ~33 TFLOPS FP32 with dual issue; divide in half again to take out dual issue and you're at ~16 TFLOPS FP32 single issue.

That behaviour seems to align with the numbers, and getting to ~300 TOPS is really just 2x for dropping to 8-bit and 2x for sparsity.

This also explains why we didn't see a massive jump in base resolution: 2 ms of frame time is still required by the GPU to perform PSSR, and ~16 TF plus whatever dual issue contributes isn't enough to really drag it far away from the PS5, for instance, once that 2 ms is taken away for upscaling. This seems to correlate well with the results so far.
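
A back-of-the-envelope version of that arithmetic, taking the leaked 67 TFLOPS FP16 figure at face value (all values approximate):

```python
# Back-of-the-envelope math from the leaked figures (all values approximate).
fp16_tflops = 67.0                       # FP16 throughput from the leaked slide

fp32_dual_issue = fp16_tflops / 2        # ~33.5 TFLOPS FP32 with dual issue
fp32_single_issue = fp32_dual_issue / 2  # ~16.8 TFLOPS FP32 single issue

int8_tops = fp16_tflops * 2              # 2x for dropping from FP16 to 8-bit
int8_sparse_tops = int8_tops * 2         # 2x again for structured sparsity

print(f"FP32 (dual issue):   ~{fp32_dual_issue:.1f} TFLOPS")
print(f"FP32 (single issue): ~{fp32_single_issue:.1f} TFLOPS")
print(f"INT8 with sparsity:  ~{int8_sparse_tops:.0f} TOPS")
```

That lands at roughly 16.8 TF single issue and ~268 TOPS with sparsity, in the same ballpark as the slide's ~300 TOPS headline.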
 
@iroboto I watched a bit of the first video @snc posted; they added more RT effects to the F1 game. I'm curious, then, how much PSSR affects performance overall besides the supposed 2 ms per frame. It seems devs are still able to add more graphics features even if the internal resolution has to be lower so it can then be upscaled with PSSR, including its “cost”.
 
It's likely a result of a lack of available bandwidth and not necessarily compute.
NN silicon requires large caches to go significantly faster, and that's typical of Tensor Core style silicon (a 4070 has 36 MB of L2, while the PS5 has 4 MB and the XSX 5 MB; the 5 Pro likely has 4 MB of L2 as they kept their 256-bit bus). If that cache isn't available, because building that type of cache into the CUs would be costly, then you'd be hitting memory a lot more, reducing the bandwidth available for both the CPU and GPU. So once you have a higher frame rate, combined with more trips to memory via RT and PSSR, what may be happening is a reduction in resolution in order for there to be sufficient bandwidth to keep the frame rate up.

I don't think it's a compute issue here for the 5 Pro. There's plenty of compute available; bandwidth is another issue, as it's only marginally more than the PS5's and more or less the same as the XSX's.
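
For reference, here's a quick sketch of how those peak-bandwidth figures fall out of bus width × memory data rate; the GDDR6 speeds below are the publicly reported ones, so treat them as assumptions:

```python
# Peak bandwidth (GB/s) = bus width in bits / 8 * per-pin data rate in Gbps.
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    return bus_width_bits / 8 * data_rate_gbps

consoles = {
    "PS5":              (256, 14.0),  # 256-bit GDDR6 @ 14 Gbps -> 448 GB/s
    "PS5 Pro":          (256, 18.0),  # 256-bit GDDR6 @ 18 Gbps -> 576 GB/s
    "XSX (fast 10 GB)": (320, 14.0),  # 320-bit GDDR6 @ 14 Gbps -> 560 GB/s
}

for name, (bus_bits, rate_gbps) in consoles.items():
    print(f"{name}: ~{peak_bandwidth_gbs(bus_bits, rate_gbps):.0f} GB/s")
```

The Pro's extra bandwidth over the base PS5 comes entirely from faster GDDR6 on the same 256-bit bus, and it sits right around the XSX's fast pool.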
 
So Cerny's "custom hardware for machine learning" is just some tweak to the CUs? I'd consider that false advertising.
I mean, the ‘custom silicon’ they used for the Tempest audio processing engine is just tweaked RDNA2 CUs.

A lot of people seemed convinced that Sony was going to deviate entirely from their modern console design back to ‘bespoke silicon’, but the whole point of Cerny’s push was to abandon that after the PS3’s relative failure and focus on simpler designs like the PS4 and PS5.
 

True. But there is (or was ¯\_(ツ)_/¯) still scope for custom silicon that's industry standard, à la tensor cores.

It's likely a result of a lack of available bandwidth and not necessarily compute.
NN silicon requires large caches to go significantly faster, and that's typical of Tensor Core style silicon (a 4070 has 36 MB of L2, while the PS5 has 4 MB and the XSX 5 MB; the 5 Pro likely has 4 MB of L2 as they kept their 256-bit bus). If that cache isn't available, because building that type of cache into the CUs would be costly, then you'd be hitting memory a lot more, reducing the bandwidth available for both the CPU and GPU. So once you have a higher frame rate, combined with more trips to memory via RT and PSSR, what may be happening is a reduction in resolution in order for there to be sufficient bandwidth to keep the frame rate up.

I don't think it's a compute issue here for the 5 Pro. There's plenty of compute available; bandwidth is another issue, as it's only marginally more than the PS5's and more or less the same as the XSX's.

In that case, it would've been lovely to get some Infinity Cache in the Pro for exactly this reason. Especially for 700 bloody quid.
 
So Cerny's "custom hardware for machine learning" is just some tweak to the CUs? I'd consider that false advertising.
What else can tensor-like cores do in a gaming console outside of upscaling? If, after 2 ms, they're just sat there waiting on the next frame, then you're probably better off putting the time and transistor resources needed to build and connect a new block into just altering the CUs. It's not as sexy, but then it's a device to play games, not a machine learning blade system.
 
So Cerny's "custom hardware for machine learning" is just some tweak to the CUs? I'd consider that false advertising.
I mean, it may not appear to be a big customization, but I think it's a pretty significant tweak.
If the XSX had dual-issue support combined with sparsity, it too would be close to 300 TOPS and be able to run these ML upscaling models. These customizations alone set the 5 Pro apart from the XSX, to say nothing of the RT silicon differences.

I think it's a big get. If MS could release an ML model on XSX today, I'm sure they would have by now.
The question is whether compute is the bottleneck here for the XSX, and I think it is; its bandwidth is equal to the 5 Pro's, so the only reason there isn't a model is likely that it can't compute one in 2 ms or less.
 