Sony PlayStation 5 Pro

Maybe Rich should have first lit some incense and doused the PS5 Pro's cardboard box with holy water. Now that would be true respect befitting a sacred professional machine.

WRT dual issue - things change; maybe the OG version of the PS5 Pro was going to be dual issue, but they got rid of it to save die space?

It helps compute performance a bit in games, but not by enough that they would necessarily have seen it as worthwhile for their bottom line.

Yes please. Now if you could all redo that video dressed in gowns while Gregorian chants play, it would be appreciated.

I'll do my bit and let you send the old unit to me. Y'know, to make space. Or something.
 
If Sony claim 33.4 TFLOPs now, they will have a lot of marketing problems when they announce the PS6.

Doubt it's for that, and a potential PS6 should be much more than 33 TF anyway. Marketing-wise, the future is all about AI and ray tracing - it already is now.
If you've got a 33 TF product, you're not going to market just half of that; it doesn't make much sense.

Again, could it be that the GPU is based on the four-year-old RDNA 2 CUs? It would be roughly comparable in raw power to a vanilla RX 6800, which in turn is close to a 7700 XT at 35 TF. That wouldn't be too bad - mid-range performance until RDNA 4 and Nvidia's next GPUs. Or is it confirmed that it's RDNA 3+?
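For reference, the raw numbers behind that comparison work out roughly as follows, using the usual CUs x 64 lanes x 2 FLOPs x clock formula. The specific CU counts and boost clocks are my assumptions taken from AMD's public spec sheets, not something stated in this thread:

```python
# Rough FP32 throughput comparison from boost-clock specs.
# TFLOPS = CUs * 64 lanes * 2 FLOPs (FMA) * clock, with RDNA 3 counted x2 again for dual issue.

def tflops(cus: int, clock_ghz: float, dual_issue: bool = False) -> float:
    return cus * 64 * 2 * clock_ghz * (2 if dual_issue else 1) / 1000

print(f"RX 6800    (RDNA 2, 60 CU @ 2.105 GHz): {tflops(60, 2.105):.1f} TF")                   # ~16.2 TF
print(f"RX 7700 XT (RDNA 3, 54 CU @ 2.544 GHz): {tflops(54, 2.544, dual_issue=True):.1f} TF")  # ~35.2 TF
```

So the "35 TF" for the 7700 XT only lines up if dual issue is counted; on the non-dual-issue count the two cards land in the same ballpark.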
 
Marketing raw power fell out of favour after the PS360 era. One can count all sorts of FLOPs and OPs to generate bigger numbers. I wouldn't read too much into it - in contrast to not advertising features such as ML frame generation.
Nothing's confirmed. The only information I think we have is the leaked tech docs, Mark Cerny's engineering statement, and now what's included with the PS5 Pro. We also have this new software engineer statement. These sources aren't consistent. Cerny said, "Custom hardware for Machine Learning and an AI Library." That would naturally suggest general-purpose ML functionality in hardware with software to drive upscaling. Now we have Bezrati, Principal Engine Programmer at Insomniac, calling it "an actual custom silicon AI upscaler (and antialiasing)...that frees up a lot of the GPU", which can only really mean a functional block dedicated to the job. Then we have the leak that Globby pointed to, with numbers that match the theory of ML residing in the compute units and TF counts that match RDNA 3 theories, but also statements like 'full custom design'.

The 67 TF 16-bit figure in those leaks is hard to reconcile with Sony's new official 16.7 TF figure. Where does that 67 TF number come from? Not even dual issue can account for that. And Cerny stated 67% bigger and 28% faster RAM speed. Well, 67% on top of PS5's 10 TF is 16.7 TF, which suggests it's just a bigger PS5 GPU with no other changes. Yet we know later AMD RT is in there.
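For reference, here is how the various figures stack up arithmetically. This is a rough sketch assuming the commonly cited 60 active CUs (an assumption, not stated here) and the 2.17 GHz clock mentioned further down the thread:

```python
# How 16.7 / 33.4 / ~67 TF relate, assuming 60 active CUs at 2.17 GHz.
cus, clock_ghz = 60, 2.17

fp32        = cus * 64 * 2 * clock_ghz / 1000  # 64 lanes/CU, 2 FLOPs per FMA -> ~16.7 TF (official)
fp32_dual   = fp32 * 2                         # dual issue counted           -> ~33.3 TF
fp16_packed = fp32_dual * 2                    # 2x packed FP16 on top        -> ~66.7 TF (the leak's "67 TF")

ps5 = 36 * 64 * 2 * 2.23 / 1000                # PS5: 36 CUs @ 2.23 GHz       -> ~10.3 TF
print(fp32, fp32_dual, fp16_packed, ps5)
```

So the 67 TF only appears if both dual issue and packed FP16 are counted at once, which is exactly the kind of marketing maths being questioned here.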

Perhaps it's a heavily customised RDNA 2 GPU with RDNA 3/4 RT hardware and a discrete upscaler? But why would Sony want such a thing instead of full RDNA3/4, and what does it cost AMD to design and produce said chip versus using their existing building blocks?
 
From what we know of the Series consoles and PS5, as per earlier breakdowns of the products, both are customised away from RDNA 2 to support backwards compatibility. There was a series of tweets going around looking at very nuanced differences between RDNA 2 and GCN 1.1, etc. Even the current-generation consoles retain some connection to GCN 1.1 in certain aspects for these reasons - mainly around the ROPs, I believe; there was also something about wave submissions as well, but my memory isn't recalling it exactly.

I mean, there could be reasonable justifications for removing it if any of the following buckets are true:
a) It doesn't do as much for gaming and there are significant silicon costs (cost savings)
b) It causes backwards compatibility code to fail for some reason - perhaps dual-issue FP32 requires certain things, like the above, that are direct challenges for BC (compatibility challenges)
c) PSSR is able to run just fine on 150 TOPs, so save the silicon cost to keep the price down (cost savings)
d) They do have AI-based tiles somewhere on the chip, and they would rather use those AI tiles to do PSSR than run dual-issue compute (a cost increase paired with a cost decrease)

e) Marketing - hmmm. This one is an interesting idea. They could have written 16.7 TF-33.4 TF, or 16.7 TF/33.4 TF; there are a lot of ways they could have expressed the fact that dual issue is available for developers to take advantage of. But an interesting omission is clock speed, which technically would be 2.17 GHz versus the PS5's 2.23 GHz.
They were very specific about the things they wanted to promote and very ho-hum about the things they didn't. So I happen to believe Sony will only market things that they believe put them in a positive light, and nothing else. And there's no reason to look at dual issue negatively _if_ developers can actually optimise for it - yet it was never mentioned in any earlier marketing materials. During the actual promotion of the 5 Pro they only mentioned having more CUs and more bandwidth.

Of the above buckets, I feel like (e) is the least likely, though the probability is still there.
One pattern I have always noticed when predicting what console hardware will do is that, when in doubt, cost savings rule supreme. They even cut portions out of the Zen 2 processor for PS5, and that was unexpected. Xbox put in half the number of ROPs they should have and relied on double-pumped ROPs. Xbox has a weird memory configuration that makes it harder to program for. They have redundant CUs for chip yield. They have fixed clock speeds on XSX. Xbox introduced a second console at launch at a lower price point.

There are all sorts of cost-cutting measures in place, such that every time you want to get excited about what consoles bring to the field, you have to temper your enthusiasm for the hardware, because you know deep down that they are going to find the cheapest way available to do it.

And that's why consoles have some of the absolutely best bang for buck in terms of hardware to performance in gaming.
 
I think the most likely scenario is no dual issue. It still seems unlikely there are dedicated cores for AI math which leave the GPU free to do other work. The existing upgrades certainly don't support that IMO.
 
I think I may have found it, though I could be wrong on my interpretation here.

EDIT: I've updated my thoughts


RDNA 3 Architecture Overview:


Note the AI Matrix Accelerator is inside the Vector GPR.
Note the following blocks under the Vector GPR:
Float / Int / Matrix SIMD 32
Float / Matrix SIMD 32
SIMD8
DPFP - Double Precision Floating Point, AKA the FP64 unit

Let's break down the Vector GPR to get more details (from AMD):


Here we see:
64 ALU units up top
The 2 SIMD32 instruction units
The Transcendental / Int8 WMMA instruction units
And now we have this Int4 WMMA instruction unit - but missing is the 'AI Matrix Accelerator' from above.

So let's look at the next picture:


First off, the title of the slide is 'VGPR as a Matrix Accelerator'.
So what we're seeing here is the bottom two blocks now being used in conjunction with all 64 ALU units to do matrix acceleration calculations.
So now we know what AMD is labelling as an AI Matrix Accelerator unit.

They may do a little more here with RDNA 4, but I think this is it here.

So for this to work, they must have 64 ALU units here.
They would definitely have both SIMD32s, and therefore it would support dual-issue FP32 - they just aren't marketing it.

The modes would be:
  • 1x FP64 over the 2x SIMD32 units (when asked to do FP64 math or matrix math)
  • 2x FP32 over the 2x SIMD32 units (dual issue)
  • 1x FP32 over the 2x SIMD32 units (single issue), with potentially a 1x INT instruction over the other unit

edit: Using this article from Tom's Hardware:

A cursory look at the above might not look that different from RDNA 2, but then notice that the first block for the scheduler and Vector GPRs (general purpose registers) says "Float / INT / Matrix SIMD32" followed by a second block that says "Float / Matrix SIMD32." That second block is new for RDNA 3, and it basically means double the floating point throughput.

You can choose to look at things in one of two ways: Either each CU now has 128 Stream Processors (SPs, or GPU shaders), and you get 12,288 total shader ALUs (Arithmetic Logic Units), or you can view it as 64 "full" SPs that just happen to have double the FP32 throughput compared to the previous generation RDNA 2 CUs.

This is sort of funny because some places are saying that Navi 31 has 6,144 shaders, and others are saying 12,288 shaders, so I specifically asked AMD's Mike Mantor — the Chief GPU Architect and the main guy behind the RDNA 3 design — whether it was 6,144 or 12,288. He pulled out a calculator, punched in some numbers, and said, "Yeah, it should be 12,288." And yet, in some ways, it's not.

AMD's own specifications say 6,144 SPs and 96 CUs for the 7900 XTX, and 84 CUs with 5,376 SPs for the 7900 XT, so AMD is taking the approach of using the lower number.
However, raw FP32 compute (and matrix compute) has doubled. Personally, it makes more sense to me to call it 128 SPs per CU rather than 64, and the overall design looks similar to Nvidia's Ampere and Ada Lovelace architectures. Those now have 128 FP32 CUDA cores per Streaming Multiprocessor (SM), but also 64 INT32 units. But whatever the case, AMD isn't using the larger numbers.

So there is precedent here for using the lower number for SPs. I think if Sony is following this labelling format, they are using the lower number.
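Spelled out, the two counting conventions from that article are just a factor of two apart (quick arithmetic using the CU counts quoted above):

```python
# Navi 31 shader counting: AMD's spec-sheet numbers vs. the dual-issue view.
for name, cus in [("7900 XTX", 96), ("7900 XT", 84)]:
    official = cus * 64        # 64 "full" SPs per CU -> AMD's official figure
    doubled  = official * 2    # count each dual-issue FP32 path separately
    print(f"{name}: {official} SPs official, {doubled} if dual issue is counted")
# 7900 XTX: 6144 vs 12288; 7900 XT: 5376 vs 10752
```

If Sony is doing the same, 16.7 TF is simply the 'lower number' presentation of a GPU that may well still have the dual-issue hardware.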
 
I think I may have found it, though I could be wrong on my interpretation here.


RDNA 3 Architecture Overview:


Note the AI Matrix Accelerator is inside the Vector GPR.
Note the following blocks under the Vector GPR:
Float / Int / Matrix SIMD 32
Float / Matrix SIMD 32
SIMD8
DPFP - Double Precision Floating Point, AKA the FP64 unit

Let's break down the Vector GPR to get more details (from AMD):


Here we see:
The FP64 unit up top
The 2 SIMD32 units
The Transcendental / Int8 WMMA
And now we have this Int4 WMMA - but missing is the 'AI Matrix Accelerator' from above.

So let's look at the next picture:


First off, the title of the slide is 'VGPR as a Matrix Accelerator'.
So what we're seeing here is the bottom two blocks now being used in conjunction with that FP64 unit to do matrix acceleration calculations.
So now we know what AMD is labelling as an AI Matrix Accelerator unit.

So if they can leverage the FP64 unit to do this, instead of dual-issue FP32 doing it, the math works out the same:
1x FP64 breaks down into 8x INT8, times 2 for sparsity,
which brings us back to 16x TOPs per FLOP.
16.7 * 16 = 267 TOPs

They may do a little more here with RDNA 4, but I think this is it. Dual issue is not required for them to achieve the 300 TOPs, as the AI Matrix Accelerator uses the FP64 ALUs, and they can save money by removing the other FP32 unit. Though this isn't proof that there isn't dual-issue FP32 - just that dual-issue FP32 doesn't appear necessary for the WMMA instructions to do their work.

I dunno, I also feel like I could be reading this block completely wrong, and that what is being described as FP64 is just another mode for the SIMD32 to operate in.
If that's the case, I can't see how they would get away without the second SIMD32 unit. They would definitely have both, it would support dual-issue FP32, they just aren't marketing it.

So there is precedent here for using the lower number for SPs. I think if Sony is following this labelling format, they are using the lower number.
Your deduction is very reasonable and makes a lot of sense, particularly within the confines of a console. I hope we get an exact answer at some point in the future. @Bondrewd has said dedicated tensor-style cores aren't coming anytime soon in any consumer-facing tech. I don't know if he legitimately has insider info, though.
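Spelling out the TOPs arithmetic from the quoted reasoning, as I read it (the 300 TOPs comparison point is the figure mentioned earlier in the thread):

```python
# INT8 TOPs from the quoted reasoning: 8x the FP32 FLOP rate, doubled again for sparsity.
fp32_tf     = 16.7
int8_dense  = fp32_tf * 8      # ~134 TOPs dense
int8_sparse = int8_dense * 2   # ~267 TOPs with sparsity, a bit shy of the ~300 TOPs figure
print(int8_dense, int8_sparse)
```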
 
It sure can: 16.7 TF 32-bit -> 33.4 TF 32-bit dual issue -> 66.8 TF 16-bit dual issue.
From what I remember, with RDNA 3 dual issue isn't usable with rapid packed math (which is their mechanism for FP32 -> 2x FP16).

If you look at microbenchmarks of RDNA 2 vs. 3, for example, they have comparable FP16 throughput once you account for FPUs and clocks.

 
Very cool benchmarks! From this site we have:
Now, lets talk about that 123TFLOP FP16 number that AMD claims. While this is technically correct, there are significant limitations on this number. Looking at the RDNA3 ISA documentation, there is only one VOPD instruction that can dual issue packed FP16 instructions along with another that can work with packed BF16 numbers.
https://substack-post-media.s3.amazonaws.com/public/images/dd9983c1-5a24-4210-9f48-79d7db06bf9a_682x310.png

This means that the headline 123TF FP16 number will only be seen in very limited scenarios, mainly in AI and ML workloads although gaming has started to use FP16 more often.
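For context, the 123 TF headline is just the dual-issue FP32 figure doubled again for packed FP16; rough math for the 7900 XTX below (the 2.5 GHz boost clock is my assumption from AMD's specs):

```python
# Where the ~123 TF FP16 headline for the 7900 XTX comes from.
cus, boost_ghz = 96, 2.5
fp32_dual = cus * 64 * 2 * 2 * boost_ghz / 1000   # 64 lanes * 2 FLOPs * 2 (dual issue) -> ~61.4 TF
fp16_peak = fp32_dual * 2                         # packed FP16, only via the few VOPD paths -> ~122.9 TF
print(fp32_dual, fp16_peak)
```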
 
Why enable the quality mode settings in the performance mode if it's going to perform like that?
They should keep the settings of performance mode but boost the internal res. In their article about keeping the internal res they mention they didn't see a benefit in boosting the res, but there clearly is a visible difference between PSSR quality and performance mode - not sure what's up with the Remedy devs' eyes 😁
 