Playstation 5 [PS5] [Release November 12 2020]

I think the answer to his question depends highly on memory/latency of data to feed the CUs.

Looking at the questions with very rose-tinted PS5 glasses and knowing he was principal lead on PS5, I would venture to guess the following answers.

1. It can, but only when the resolution and frame rate are too low to be of use.
2. No, since he already asked MS on Twitter about intersections etc. when MS revealed the XSX specs.
3. I would say higher clocks (given he was the PS5 principal engineer), but the question is open-ended: is he thinking of very specific situations, or can one pick any situation? I would guess each approach has its own situation where it is the better one.

Anyway, I doubt he really means what I said above; it would seem a bit petty.

I also saw a quote on here somewhere, which I am unable to find again, saying that 40% VALU usage is very good utilisation. If that is the case, what are normal/average percentages?
 
Saying something is like something doesn't make it the same. Similes are used to help understanding. Honestly, it was described transparently in the talk...

The Tempest Engine (is) based on AMD's GPU technology. We modified a compute unit in such a way as to make it very close to the SPUs in PlayStation 3. Remember when I said that they were ideal for audio? So the Tempest Engine has no caches, just like an SPU; all data access is via DMA, just like an SPU. Our target was that it would have more power than a CPU thanks to the parallelism that a GPU can achieve, and then it would be more efficient than our GPU thanks to the SPU-like architecture, the goal being to make possible near 100% utilization.

It's a CU that operates in a fashion like an SPU in Cell, in not using caches but being constantly fed data so it can churn through it. That analogy is made so devs can understand how a CU can be made to crunch audio. There is nothing else SPU-like in the design. Why would there be? An SPU is just a serial vector processor with a DMA engine on a ring bus and a Power-based ISA. A CU is a vector processor, and Tempest is a CU with a DMA engine. It doesn't need the alternative ISA or bus or anything else that an SPU has.

I agree that it could be talked about like a Synergistic Processing Unit, if one sees it as a modified CU with DMA serial RAM access instead of relying on caches, and I agree that maybe some devs might think of it as such, but it's not in any way a Cell SPU in PS5. It's an RDNA CU with caches removed.
Cerny has an incredible team of hardware engineering ninjas. I wonder what they will come up with for the next PlayStations! The Kraken developers were wowed when, in the interview to talk about the decoder hardware design, the Sony and AMD engineers had already almost finished designing the decoder.
 
I also saw a quote on here somewhere, which I am unable to find again, saying that 40% VALU usage is very good utilisation. If that is the case, what are normal/average percentages?
I believe it was Cerny who said this. Given his statement, I suspect under 40% is the norm, and I have no reason to doubt it. I suspect that applies to this generation of engines, though, and is not necessarily indicative of future ones.

Looking at the questions with very rose-tinted PS5 glasses and knowing he was principal lead on PS5, I would venture to guess the following answers.

1. It can, but only when the resolution and frame rate are too low to be of use.
2. No, since he already asked MS on Twitter about intersections etc. when MS revealed the XSX specs.
3. I would say higher clocks (given he was the PS5 principal engineer), but the question is open-ended: is he thinking of very specific situations, or can one pick any situation? I would guess each approach has its own situation where it is the better one.
Yeah, sure. I was actually thinking back to the GCN architecture and the whole wavefronts and mixed wavefronts bit, and how more wavefronts give you fewer registers to work with, etc. If you don't have enough CUs, you'll stall waiting for memory to arrive, but if you have too many wavefronts that aren't processing quickly enough, you'll run out of available registers. I don't know how RDNA 2 handles this, so I'm a bit curious.

Might be easier to just quote sebbbi here from an earlier thread of his from when he used to post here: https://forum.beyond3d.com/posts/1990831/

"GCN instructions are 64 wide. Executed in 4 cycles using a 16 wide SIMD. Maximum IPC is 1/4 per lane. CU has four SIMDs. But these are independent (each execute different set of waves). If you want a CU to execute 64 instructions (= 128 flops) per clock, you need to have four waves running on the CU (one per SIMD). This is 10% of the SIMD occupancy (max 10 waves per SIMD to hide latency). Fortunately all common instructions have latency of 1, so single wave per SIMD is actually enough to fully utilize the SIMD... assuming of course that there's no memory operations (including groupshared memory). GCN doesn't need high occupancy to fill the pipelines, it needs high occupancy to hide memory latency."
So I guess when I wrote that, I may have been wrongly assuming that more CUs have something to do with leveraging high occupancy to hide latency better. In my mind, a higher clock speed with fewer CUs may fill up the CUs while you're waiting for memory to arrive, leading to a bottleneck. It's a poor thought that needs more thinking and information to validate, though. I hope someone can offer some insight into where I went wrong. I'm not even sure RT would use wavefronts. But Nvidia specified that the BVH tree uses up to 2 GB of VRAM, so there is definitely a lot of bandwidth that could be involved in traversing the structure for intersections.
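
For anyone who wants to check the arithmetic in the sebbbi quote, here is a rough Python sketch. The wave width, SIMD width, SIMD count and the 10-wave limit come straight from the quote. The 256-register per-lane VGPR budget and the function names are my own assumptions for illustration, and the register model ignores allocation granularity, SGPRs and LDS limits, so treat it as a back-of-envelope model rather than how the hardware actually schedules.

```python
# Back-of-envelope GCN numbers (illustrative sketch, not a hardware model).
WAVE_WIDTH = 64          # work-items per wavefront (from the quote)
SIMD_WIDTH = 16          # lanes per SIMD (from the quote)
SIMDS_PER_CU = 4         # SIMDs per compute unit (from the quote)
MAX_WAVES_PER_SIMD = 10  # occupancy cap per SIMD (from the quote)
VGPR_BUDGET = 256        # ASSUMPTION: vector registers available per lane


def cycles_per_wave_instruction():
    """Cycles one SIMD needs to issue one instruction for a whole wave."""
    return WAVE_WIDTH // SIMD_WIDTH


def peak_flops_per_clock_per_cu():
    """Lane-instructions per clock across the CU, counting an FMA as 2 flops."""
    lanes = SIMD_WIDTH * SIMDS_PER_CU
    return lanes * 2


def min_occupancy_for_full_alu():
    """One wave per SIMD keeps the ALUs busy if nothing waits on memory."""
    return 1 / MAX_WAVES_PER_SIMD


def max_waves_per_simd(vgprs_per_wave):
    """Waves that fit on one SIMD when each wave uses this many VGPRs.

    Returns 0 if a single wave would not fit in the assumed budget.
    """
    if vgprs_per_wave <= 0:
        raise ValueError("vgprs_per_wave must be positive")
    return min(MAX_WAVES_PER_SIMD, VGPR_BUDGET // vgprs_per_wave)


if __name__ == "__main__":
    print("cycles per wave instruction:", cycles_per_wave_instruction())
    print("peak flops/clock/CU:", peak_flops_per_clock_per_cu())
    for vgprs in (24, 64, 128):
        print(vgprs, "VGPRs per wave ->", max_waves_per_simd(vgprs), "waves per SIMD")
```

Under these assumptions a shader using 24 registers still reaches the full 10 waves, while one using 128 registers only fits 2, which is the "more wavefronts gives you fewer registers" trade-off: the waves you keep in flight to hide memory latency are paid for out of the same register file.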
 
If they have something of their own for raytracing, it is probably in the software stack. When Mark Cerny talked about raytracing, he said they use the AMD implementation in RDNA 2.
 
The former principal software engineer on PS5 is implying that a narrow and fast design could be an advantage for RT in some cases.


Absolutely, filling every wavefront with coherent shading work in raytracing pipelines that need it is going to be a nightmare. It's all too easy to imagine even NVIDIA's relatively narrow GPU waves only being half full on a few raytracing titles already. Being able to churn through those low-occupancy tasks faster will help. But even then, it's just one part of the pipeline where the PS5 will, maybe, match the XSX. Who cares about having a 20% faster clock speed if the competition is over 40% wider than you?
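
To put rough numbers on the clock versus width comparison, here is a small Python sketch using the publicly announced specs (PS5: 36 CUs at up to 2.23 GHz, XSX: 52 CUs at 1.825 GHz). The `Gpu` class and helper names are made up for illustration, and the peak figure assumes 64 FP32 lanes per CU with an FMA counted as 2 flops. It says nothing about how well either design keeps those lanes fed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    """Hypothetical helper holding only CU count and clock."""
    name: str
    cus: int
    clock_ghz: float

    def peak_tflops(self) -> float:
        # ASSUMPTION: 64 FP32 lanes per CU, FMA counted as 2 flops.
        return self.cus * 64 * 2 * self.clock_ghz / 1000.0


PS5 = Gpu("PS5", cus=36, clock_ghz=2.23)
XSX = Gpu("XSX", cus=52, clock_ghz=1.825)


def clock_advantage(narrow: Gpu, wide: Gpu) -> float:
    """How much faster the narrow design clocks, as a ratio."""
    return narrow.clock_ghz / wide.clock_ghz


def width_advantage(narrow: Gpu, wide: Gpu) -> float:
    """How many more CUs the wide design has, as a ratio."""
    return wide.cus / narrow.cus


if __name__ == "__main__":
    for gpu in (PS5, XSX):
        print(f"{gpu.name}: {gpu.peak_tflops():.2f} TFLOPS peak")
    print(f"clock advantage (PS5): {clock_advantage(PS5, XSX):.2f}x")
    print(f"width advantage (XSX): {width_advantage(PS5, XSX):.2f}x")
```

On paper that is roughly a 1.22x clock advantage against a 1.44x width advantage, so the narrow design only comes out ahead on work that scales with clock but cannot fill the extra CUs.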
 
I will also throw this AMD patent into the mix, which describes one possible general approach of how their GPU architecture can be extended to address common real-time "persistent" computing needs. Audio coincidentally is one of them, and coincidentally the patent describes a system with.... bespoke CUs! :mrgreen:
 
For the sake of hypothetical PS3 BC, is there anything a Cell SPE or PPE does at 3.2GHz that couldn't be emulated by a Zen 2 core and its much more powerful 256-bit FPU at 3.5GHz?

I could see perhaps some difficulty with the local store, depending on how it's used by a game. I'm not sure a traditional cache could perfectly emulate the performance of the LS on the SPUs with minimal effort.
 
I will also throw this AMD patent into the mix, which describes one possible general approach of how their GPU architecture can be extended to address common real-time "persistent" computing needs. Audio coincidentally is one of them, and coincidentally the patent describes a system with.... bespoke CUs! :mrgreen:

This could be part of the modification. Interesting.

And they talk about multiple FMA units, which is great and something needed for audio. Here is a great presentation from someone at SCEA about doing audio on AMD HSA, which may be part of the PS4 audio postmortem.

https://fr.slideshare.net/mobile/DevCentralAMD/mm-4085-laurentbetbeder
 
How does modifying a single CU impact yields? What I mean is, if 36 of 40 CUs need to meet a certain tolerance for the chip to be good, how does a CU modification affect the manufacturing process?
 
'SPU' here means 'Sound Processing Unit' and not 'Synergistic Processing Unit'. There's no Cell-like SPU involvement. It's an RDNA2 CU modified to make it a bit better at DSP workloads.
"[...] We're calling the hardware unit that we build, the Tempest Engine. It's based on AMD's GPU technology. We modified a compute unit in such a way as to make it very close to the SPUs in PlayStation 3. Remember when I said that they were ideal for audio? So, the Tempest Engine has no caches, just like an SPU. All data access is via DMA, just like an SPU [...]"
- Mark Cerny, The Road to PS5 @44:32

I think at least Mark Cerny was talking of PS3 SPUs.:?:
 
I think at least Mark Cerny was talking of PS3 SPUs.:?:
Yes, there. But when someone else says the PS5 has an SPU, they may mean Sound Processing Unit, because it hasn't got an SPU from PS3 in it. Some devs, however, do mean an SPU in a confusing way, because PS5 hasn't got an SPU in it. Saying "yay, SPU is back" is just going to confuse a lot of people.
 
How does modifying a single CU impact yields? What I mean is, if 36 of 40 CUs need to meet a certain tolerance for the chip to be good, how does a CU modification affect the manufacturing process?
It should be the same as any other functional block that has no redundancy. If one of the CUs has a defect, the chip is still usable, but if the Tempest Engine has a defect, the same as if one of the CPU cores has a defect, the chip is unusable.
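
A toy way to see the redundancy point, sketched in Python. The 36-of-40 figure is from the question above. Everything else is assumed for illustration: defects are treated as independent, every CU has the same chance of being good, and the probabilities in the example are invented rather than real foundry data.

```python
from math import comb


def p_at_least(k: int, n: int, p_good: float) -> float:
    """Probability that at least k of n independent units are defect-free."""
    return sum(
        comb(n, i) * p_good**i * (1.0 - p_good) ** (n - i)
        for i in range(k, n + 1)
    )


def chip_yield(p_cu_good: float, p_block_good: float,
               total_cus: int = 40, needed_cus: int = 36) -> float:
    """Toy yield: enough good CUs AND a good non-redundant block.

    p_block_good stands in for any block with no spare, such as the
    modified audio CU or a CPU core (hypothetical values only).
    """
    return p_at_least(needed_cus, total_cus, p_cu_good) * p_block_good


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the shape of the effect.
    p_cu, p_block = 0.97, 0.97
    print("need 36 of 40 CUs:", round(chip_yield(p_cu, 1.0), 4))
    print("need 40 of 40 CUs:", round(chip_yield(p_cu, 1.0, needed_cus=40), 4))
    print("36 of 40 plus one non-redundant block:", round(chip_yield(p_cu, p_block), 4))
```

The spare CUs soak up most defects in the shader array, while a block with no spare multiplies straight into the yield, which is why it behaves like the CPU cores rather than like one more CU.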
 
Yes, there. But when someone else says the PS5 has an SPU, they may mean Sound Processing Unit, because it hasn't got an SPU from PS3 in it. Some devs, however, do mean an SPU in a confusing way, because PS5 hasn't got an SPU in it. Saying "yay, SPU is back" is just going to confuse a lot of people.
Same acronym.
What makes it more confusing for people is that the PS5 SPU is referred to as being like the PS3's.
 