Wow, didn't even know this happened!
https://www.gamesindustry.biz/articles/2019-01-08-sony-acquires-audiokinetic
I think the answer to his question depends highly on memory/latency of data to feed the CUs.
Cerny has an incredible team of hardware engineering ninjas. Who knows what they will come up with for the next PlayStations! The Kraken developers were wowed when, in the interview about the decoder hardware design, it turned out that Sony and AMD engineers had already almost designed the decoder.
Saying something is like something doesn't make it the same. Similes are used to help understanding. Honestly, it was described transparently in the talk...
The Tempest Engine (is) based on AMD's GPU technology. We modified a compute unit in such a way as to make it very close to the SPUs in PlayStation 3. Remember when I said that they were ideal for audio? So, the Tempest Engine has no caches, just like an SPU. All data access is via DMA, just like an SPU. Our target was that it would have more power than a CPU thanks to the parallelism that a GPU can achieve, and then it would be more efficient than our GPU thanks to the SPU-like architecture, the goal being to make possible near 100% utilization.
It's a CU that operates in a fashion like an SPU in Cell, in not using caches but being constantly fed data so it can churn through it. That analogy is made so devs can understand how a CU can be made to crunch audio. There is nothing else SPU-like in the design. Why would there be? An SPU is just a serial vector processor with a DMA engine on a ring bus and a Power-based ISA. A CU is a vector processor, and Tempest is a CU with a DMA engine. It doesn't need the alternative ISA or bus or anything else that an SPU has.
I agree that it could be talked about like a Synergistic Processing Unit, if one sees it as a modified CU with DMA serial RAM access instead of relying on caches, and I agree that maybe some devs might think of it as such, but it's not in any way a Cell SPU in PS5. It's an RDNA CU with caches removed.
I believe it was Cerny who said this. I suspect < 40% is the norm given his statement, and I have no reason to doubt it. Though I suspect that applies to this generation of engines and is not necessarily indicative of future ones.
I also saw a quote on here somewhere, which I am unable to find again, saying that 40% usage of the VALU is very good utilisation. If that is the case, what are normal/average % numbers?
Yea sure, I mean, I was actually thinking back to the GCN architecture and the whole wavefronts and mixed wavefronts bit, and how more wavefronts give you fewer registers to work with, etc. If you don't have enough CUs, you'll stall waiting for memory to arrive. But if you have too many wavefronts that aren't processing quickly enough, you'll run out of available registers. I don't know how RDNA 2 handles this, so I'm a bit curious.
Looking at the questions with very rose-tinted PS5 glasses, and knowing he was principal lead on PS5, I would venture to guess the following answers.
1. It can, but only when the resolution and frame rate are too low to be of use.
2. No, since he already asked MS on Twitter about intersections etc. when MS revealed the XSX specs.
3. I would say higher clocks (ref him being PS5 principal engineer), but the question is open ended, is he thinking of very specific situations or can one pick any situation? I would guess each has its own situation where it is the better approach.
So I guess when I wrote that, I may have been confused in thinking that more CUs have anything to do with leveraging high occupancy to hide latency better. And in my mind, a higher clock speed with fewer CUs may fill up the CUs while you're waiting for memory to arrive, leading to a bottleneck. A poor thought that needs more thinking and information to validate, though. I hope someone can offer some insights on where I went wrong. I'm not even sure if RT would even use wavefronts. But Nvidia specified that the BVH tree uses up to 2 GB of VRAM, so there is definitely a lot of bandwidth that could be involved in traversing the structure for intersections.
"GCN instructions are 64 wide. Executed in 4 cycles using a 16 wide SIMD. Maximum IPC is 1/4 per lane. CU has four SIMDs. But these are independent (each execute different set of waves). If you want a CU to execute 64 instructions (= 128 flops) per clock, you need to have four waves running on the CU (one per SIMD). This is 10% of the SIMD occupancy (max 10 waves per SIMD to hide latency). Fortunately all common instructions have latency of 1, so single wave per SIMD is actually enough to fully utilize the SIMD... assuming of course that there's no memory operations (including groupshared memory). GCN doesn't need high occupancy to fill the pipelines, it needs high occupancy to hide memory latency."
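For anyone who wants to check the numbers in that GCN quote, here is a rough sketch of the arithmetic. The wave width, SIMD width, SIMD count and the 10-wave limit come straight from the quote. The register file size (`VGPRS_PER_LANE = 256`) is my assumption from public GCN documentation, not something stated in this thread, and the helper names are made up for illustration. It also ignores register allocation granularity and says nothing about how RDNA 2 behaves.

```python
# Figures from the quoted GCN description.
WAVE_WIDTH = 64          # work items per wavefront
SIMD_WIDTH = 16          # lanes per SIMD
SIMDS_PER_CU = 4
MAX_WAVES_PER_SIMD = 10  # waves a SIMD can keep resident to hide latency

# Assumption (public GCN docs, not from this thread): each SIMD lane has
# 256 vector registers, shared between all waves resident on that SIMD.
VGPRS_PER_LANE = 256


def cycles_per_instruction() -> int:
    """A 64-wide instruction is issued over a 16-wide SIMD."""
    return WAVE_WIDTH // SIMD_WIDTH


def peak_lane_ops_per_clock() -> int:
    """Lane operations per clock for one CU with every SIMD busy."""
    return SIMDS_PER_CU * SIMD_WIDTH


def peak_flops_per_clock() -> int:
    """Counting a fused multiply-add as two floating point operations."""
    return 2 * peak_lane_ops_per_clock()


def occupancy(waves_per_simd: int) -> float:
    """Fraction of the resident-wave slots in use on one SIMD."""
    return waves_per_simd / MAX_WAVES_PER_SIMD


def waves_allowed_by_registers(vgprs_per_wave: int) -> int:
    """Resident waves per SIMD once register pressure is accounted for."""
    return min(MAX_WAVES_PER_SIMD, VGPRS_PER_LANE // vgprs_per_wave)


if __name__ == "__main__":
    print("cycles per instruction:", cycles_per_instruction())
    print("peak flops per clock per CU:", peak_flops_per_clock())
    print("occupancy with one wave per SIMD:", occupancy(1))
    for vgprs in (24, 64, 128):
        print(vgprs, "VGPRs per wave ->", waves_allowed_by_registers(vgprs), "waves")
```

Under those assumptions, one wave per SIMD already reaches peak ALU throughput at 10% occupancy, and a shader using many registers cuts the number of waves left over to cover memory latency, which is the trade-off discussed above.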
Supposedly an Activision engineer confirmed VRS in PS5?
The former principal software Engineer on PS5 is implying a narrow and fast design could be an advantage in some cases for RT.
I will also throw this AMD patent into the mix, which describes one possible general approach of how their GPU architecture can be extended to address common real-time "persistent" computing needs. Audio coincidentally is one of them, and coincidentally the patent describes a system with.... bespoke CUs!
For the sake of hypothetical PS3 BC, is there anything a Cell SPE or PPE does at 3.2GHz that couldn't be emulated by a Zen 2 core and its much more powerful 256-bit FPU at 3.5GHz?
How does modifying a single CU impact yields? What I mean is, if 36 of 40 need to meet a certain tolerance for the chip to be good, how does a CU modification affect the manufacturing process?
Who cares about being 20% faster if the competition is over 40% wider than you?
"[...] We're calling the hardware unit that we build, the Tempest Engine. It's based on AMD's GPU technology. We modified a compute unit in such a way as to make it very close to the SPUs in PlayStation 3. Remember when I said that they were ideal for audio? So, the Tempest Engine has no caches, just like an SPU. All data access is via DMA, just like an SPU [...]"
'SPU' here means 'Sound Processing Unit' and not 'Synergistic Processing Unit'. There's no Cell-like SPU involvement. It's an RDNA2 CU modified to make it a bit better at DSP workloads.
Yes there. But when someone else says the PS5 has an SPU, they may mean Sound Processing Unit, because it hasn't got an SPU from PS3 in it. Some devs however do mean an SPU in a confusing way, because PS5 hasn't got an SPU in it. Saying, "yay, SPU is back," is just gonna confuse a lot of people.
I think at least Mark Cerny was talking of PS3 SPUs.
It should be the same as any other functional block in that there's no redundancy. If one of the CUs has a defect, the chip is still usable, but if the Tempest Engine has a defect, the same as if one of the CPU cores has a defect, the chip is unusable.
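To put rough numbers on the redundancy point, here is a toy yield model. It is only a sketch under loud assumptions: defects are treated as independent, every CU gets the same made-up probability of being good, and the modified unit gets its own made-up probability. The only figure from the thread is "36 of 40". The function names and the 0.98 values are hypothetical, not real foundry data.

```python
from math import comb


def prob_at_least(good_needed: int, total: int, p_good: float) -> float:
    """Binomial probability that at least `good_needed` of `total` units work."""
    return sum(
        comb(total, k) * p_good**k * (1 - p_good) ** (total - k)
        for k in range(good_needed, total + 1)
    )


def chip_yield(p_cu_good: float, p_block_good: float,
               total_cus: int = 40, needed_cus: int = 36) -> float:
    """Chip is usable if enough CUs work AND the non-redundant block works."""
    return prob_at_least(needed_cus, total_cus, p_cu_good) * p_block_good


if __name__ == "__main__":
    p = 0.98  # hypothetical chance that any single unit is defect-free
    print("all 40 CUs must work:", round(p**40, 3))
    print("36 of 40 CUs must work:", round(prob_at_least(36, 40, p), 3))
    print("36 of 40 plus one non-redundant block:", round(chip_yield(p, p), 3))
```

In this model the spare CUs recover most of the dies that would otherwise fail, while the non-redundant block scales the yield down by its own probability of being good, the same as a CPU core would.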
Same acronym.