Playstation 5 [PS5] [Release November 12 2020]

Even more curious, we have a performance estimate for it of 100-200 GFlops (whether going by CPU-equivalent or CU performance). Cell would have been tiny and efficient and more powerful, would allow PS3 BC, and would even be usable for other things. I wonder if it was considered? Did they see the value but find it too difficult/costly to adapt Cell for an AMD SoC? Or was it not even on the table, and if not, why not?

 
Even more curious, we have a performance estimate for it of 100-200 GFlops (whether going by CPU-equivalent or CU performance). Cell would have been tiny and efficient and more powerful, would allow PS3 BC, and would even be usable for other things. I wonder if it was considered? Did they see the value but find it too difficult/costly to adapt Cell for an AMD SoC? Or was it not even on the table, and if not, why not?

You'll need to address those questions to 3dilettante. :)

The RPCS3 folks are able to emulate PS3 on a PC. So I had made a hand-waving assumption that it may be doable now without a Cell processor.
https://rpcs3.net

If Sony want to serve their library of PS3 games over the cloud (or offer PS5 ports/backward compatibility), perhaps they need to solve this problem eventually. But it's a business question.

Besides audio, I am not sure a custom SPU is appropriate for, say, a new PS Eye or PSVR 2 (e.g., OS-level near-realtime on-device ML inference, camera recognition). The original Cell was clocked at 3.2GHz to 4GHz to deliver that level of performance. Heterogeneous computing is already here in AMD's processors. It seems like unnecessary work since the PS5 is a dedicated gaming device; Sony can do all this work on the new CPU/GPU full time.


EDIT: I also don't see Sony reusing their PS5 APU in other devices, so they will cut away all the unused or underused parts.
 
Even more curious, we have a performance estimate for it of 100-200 GFlops (whether going by CPU-equivalent or CU performance). Cell would have been tiny and efficient and more powerful, would allow PS3 BC, and would even be usable for other things. I wonder if it was considered? Did they see the value but find it too difficult/costly to adapt Cell for an AMD SoC? Or was it not even on the table, and if not, why not?
A simple answer could be that Cell is an SoC, and they only need the streaming capabilities.
So why include the PPC etc., when a modified CU does the job and integrates easily into the SoC?
 
Cerny literally said it's a modified AMD Compute Unit (with at least the caches stripped out); he didn't specify the GCN/RDNA version.

And it is so heavily modified that it works like an SPU. It is not working like a CU at all; this is what I said. An ND audio dev calls it an SPU, not a CU, for example.

I know this is a modified CU, but I am sure that when devs code for it, it will probably have much more in common with an SPU, because of the DMA and serial programming model, than with a CU with a cache and many wavefronts.
 
it works like an SPU. It is not working like a CU at all; this is what I said

Well, if it wasn't a (modified) CU, then why would Cerny say so himself? (He's not lying?) I guess AMD GPUs will have some form of next-generation TrueAudio where they 'modify' a CU.
 
Well, if it wasn't a (modified) CU, then why would Cerny say so himself? (He's not lying?) I guess AMD GPUs will have some form of next-generation TrueAudio where they 'modify' a CU.

But he did say it is a modified CU; he said it in the Road to PS5 video, and he gave more details about the modification to Digital Foundry. Read at least the part I quote, where he explains the difference between a CU inside a GPU and the Tempest engine; it shows they do not work the same at all. And I said an ND audio dev calls it an SPU, not a modified CU... He probably knows the difference between a CU and an SPU much better than any of us.

I just wanted to explain that it is heavily modified, and that the architecture is like an SPU, and so is the programming model.

Edit: And why would you want a modified CU inside the GPU? It means it will not be usable for graphics. The PS5 is a console; reserving some CUs is better suited to PC. Here it is a discrete part; it is like having a sound coprocessor inside the PS5.
 
Besides audio, I am not sure a custom SPU is appropriate for, say, a new PS Eye or PSVR 2 (e.g., OS-level near-realtime on-device ML inference, camera recognition). The original Cell was clocked at 3.2GHz to 4GHz to deliver that level of performance. Heterogeneous computing is already here in AMD's processors. It seems like unnecessary work since the PS5 is a dedicated gaming device; Sony can do all this work on the new CPU/GPU full time.
I think that's the main point, sadly. Cell would introduce a second ISA of a completely different architecture. It probably wouldn't be worth opening up to devs, as it'd be a bit of a VU0 (unused potential), with devs relying on compute. That would limit it to an audio DSP through Sony's PS5 SDK providing PS3 BC, but without the RSX, so there'd still be that half to emulate. I guess the idea of doing what PS2 did with PS1 is considered too eclectic for modern tastes.

If anything sounds the death knell for exciting proprietary hardware in a modern console, I think this is it. The reasons for using Cell are fairly robust beyond the added complexity, and it still finds itself rejected. All future hardware will be x64, AMD/nVidia GPU and ARM ancillary processors. It's only ever going to be mainstream silicon IPs now. The console IHVs will paper over the tedium by giving the functional units their own names, but underneath will be nothing interesting. Heck, I bet next gen MS and Sony share exactly the same console but agree to talk about different benefits. Sony will get permission to use AMD's standard built-in audio, Tempest II, and big up their audio solution. MS will get to talk about the RT units and name them Unreality Desynthesisers (or maybe X-ray Ultra X units). And console wars will be waged over which is better...
 
All future hardware will be x64, AMD/nVidia GPU and ARM ancillary processors.
It’s off-topic, but I disagree with this. I think we just haven’t hit a wall, and that’s why it’s homogenizing. There will be a hard plateau of some sort; x64 cannot live forever. There will be another change; it’s just how we expect to get more processing power. We will hit a limit on silicon. Quantum bits are not meant for standard computing. So that just leaves architectural changes.
 
Heck, I bet next gen MS and Sony share exactly the same console but they agree to talk about different benefits
Probably right.

But it's also possible that MS could go ARM, as long as it can emulate for BC. Its x86 emulation is pretty decent anyway (not there with x64 yet, though).
Remember, we're talking about 6 years or so from now.
Not saying I think it's likely, but it's not like they're not invested in ARM also.

When I said I would like some differences in arch, people carried on like it would be the end of cross-platform games, and yet you wouldn't mind seeing big architecture differences :runaway:
 
I think that's the main point, sadly. Cell would introduce a second ISA of a completely different architecture. It probably wouldn't be worth opening up to devs, as it'd be a bit of a VU0 (unused potential), with devs relying on compute. That would limit it to an audio DSP through Sony's PS5 SDK providing PS3 BC, but without the RSX, so there'd still be that half to emulate. I guess the idea of doing what PS2 did with PS1 is considered too eclectic for modern tastes.

If anything sounds the death knell for exciting proprietary hardware in a modern console, I think this is it. The reasons for using Cell are fairly robust beyond the added complexity, and it still finds itself rejected. All future hardware will be x64, AMD/nVidia GPU and ARM ancillary processors. It's only ever going to be mainstream silicon IPs now. The console IHVs will paper over the tedium by giving the functional units their own names, but underneath will be nothing interesting. Heck, I bet next gen MS and Sony share exactly the same console but agree to talk about different benefits. Sony will get permission to use AMD's standard built-in audio, Tempest II, and big up their audio solution. MS will get to talk about the RT units and name them Unreality Desynthesisers (or maybe X-ray Ultra X units). And console wars will be waged over which is better...

Sony and MS probably should tackle platform problems in a more holistic fashion anyway.

AMD continued Cell's early leap into heterogeneous computing, segregated security kernel, etc.
The platform owners are free to customize their AMD parts.

But bold ideas in the game industry may require coherent innovations across all layers to gain adoption: SDK, OS, system hardware, workflow, cloud, titles.
They need to invest their resources carefully.

For now, we'll have to see how Cerny's dynamic clocking and game-world streaming turn out in the PS5. The other pieces will be revealed in due time too. That controller's "Create" button can't just be a label change, can it? :)

What thread is this? Where am I? Is this thread for PS5 or revisiting PS3? :confused:

It's le "PS3 soul in PS5" sub-thread.
 
I wonder if it was considered? Did they see the value but find it too difficult/costly to adapt Cell for an AMD SoC? Or was it not even on the table, and if not, why not?
Cell's implementation in the PS3 was a top-to-bottom design: the PPE, the SPUs, the ring bus, the RAM arrangement. It's a lot of complexity to bring over if you want PS3 code to run unmodified.
 
Edit: And why would you want a modified CU inside the GPU? It means it will not be usable for graphics. The PS5 is a console; reserving some CUs is better suited to PC. Here it is a discrete part; it is like having a sound coprocessor inside the PS5.
I would expect it to behave like TrueAudio Next, except for having a bespoke Tempest CU dedicated to the cause, instead of having to carve out normal CUs.

The CU is indeed the core of the data crunching. But behind that spotlight, there is the often-overlooked control stack managing these CUs. AMD GPUs have a long-established compute front-end stack that uses a user-mode command queue model. It controls compute dispatches (work ending up in CUs), accelerated functions (e.g., SDMA), and also its own work scheduling (priorities and processes).

So it is reasonable for AMD and Sony to slot in a modified CU (outside the main pool), while reusing/sharing the rest of the compute control stack. It is fun to theorycraft the return of exotic hardware, but exoticness comes with a maintenance cost. Provided that AMD is the one doing the silicon design work, I can't see why AMD would offer to integrate exotic pieces when its own IP can satisfy the spec.
 
I would expect it to behave like TrueAudio Next, except for having a bespoke Tempest CU dedicated to the cause, instead of having to carve out normal CUs.

The CU is indeed the core of the data crunching. But behind that spotlight, there is the often-overlooked control stack managing these CUs. AMD GPUs have a long-established compute front-end stack that uses a user-mode command queue model. It controls compute dispatches (work ending up in CUs), accelerated functions (e.g., SDMA), and also its own work scheduling (priorities and processes).

So it is reasonable for AMD and Sony to slot in a modified CU (outside the main pool), while reusing/sharing the rest of the compute control stack. It is fun to theorycraft the return of exotic hardware, but exoticness comes with a maintenance cost. Provided that AMD is the one doing the silicon design work, I can't see why AMD would offer to integrate exotic pieces when its own IP can satisfy the spec.

In the end, they decided to customize an AMD part for their own needs; this is some hybrid CU/SPU. That is a bit exotic, but not too much, and for writing the audio libs, many people from the ICE Team or Sony ATG were very good at programming SPUs, which might have helped a lot. TrueAudio Next reserves some CUs inside a GPU, which is not the same at all. In the end, saying there is no exoticism in a CU without a cache and with an asynchronous DMA programming model is just not true, whatever people say. I don't know how Mark Cerny could be more precise than that.

The Tempest engine itself is, as Cerny explained in his presentation, a revamped AMD compute unit, which runs at the GPU's frequency and delivers 64 flops per cycle. Peak performance from the engine is therefore in the region of 100 gigaflops, in the ballpark of the entire eight-core Jaguar CPU cluster used in PlayStation 4. While based on GPU architecture, utilisation is very, very different.

"GPUs process hundreds or even thousands of wavefronts; the Tempest engine supports two," explains Mark Cerny. "One wavefront is for the 3D audio and other system functionality, and one is for the game. Bandwidth-wise, the Tempest engine can use over 20GB/s, but we have to be a little careful because we don't want the audio to take a notch out of the graphics processing. If the audio processing uses too much bandwidth, that can have a deleterious effect if the graphics processing happens to want to saturate the system bandwidth at the same time."

Essentially, the GPU is based on the principle of parallelism - the idea of running many tasks (or waves) simultaneously. The Tempest engine is much more serial-like in nature, meaning that there's no need for attached memory caches. "When using the Tempest engine, we DMA in the data, we process it, and we DMA it back out again; this is exactly what happens on the SPUs on PlayStation 3," Cerny adds. "It's a very different model from what the GPU does; the GPU has caches, which are wonderful in some ways but also can result in stalling when it is waiting for the cache line to get filled. GPUs also have stalls for other reasons, there are many stages in a GPU pipeline and each stage needs to supply the next. As a result, with the GPU if you're getting 40 per cent VALU utilisation, you're doing pretty damn well. By contrast, with the Tempest engine and its asynchronous DMA model, the target is to achieve 100 percent VALU utilisation in key pieces of code."
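Those figures can be sanity-checked with back-of-the-envelope arithmetic. This is a sketch under assumptions: the commonly cited Jaguar numbers (8 cores at 1.6 GHz, 8 single-precision flops per cycle) and the PS5 GPU's 2.23 GHz clock cap are not stated in the quote itself.

```python
# Back-of-the-envelope peak-throughput check (input figures assumed,
# see lead-in; they are not from the quoted article).

def peak_gflops(units, clock_hz, flops_per_cycle):
    """Peak single-precision throughput in GFLOPS."""
    return units * clock_hz * flops_per_cycle / 1e9

# PS4's eight-core Jaguar cluster: 8 cores x 1.6 GHz x 8 SP flops/cycle.
jaguar = peak_gflops(8, 1.6e9, 8)       # ~102 GFLOPS

# Tempest: one modified CU at the GPU clock, 64 flops per cycle.
# Assuming the 2.23 GHz cap; at lower sustained clocks the figure
# drops toward the article's "region of 100 gigaflops".
tempest = peak_gflops(1, 2.23e9, 64)    # ~143 GFLOPS peak

print(f"Jaguar cluster: {jaguar:.1f} GFLOPS")
print(f"Tempest (at 2.23 GHz): {tempest:.1f} GFLOPS")
```

So a single 64-flops-per-cycle unit at GPU clocks does land in the same ballpark as the whole PS4 CPU cluster, which is the comparison Cerny is making.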

An ND audio developer joking that the rendering people wanted to steal the Tempest Engine, because most of the time they did not want to give any SPUs to the audio people.




https://twitter.com/sflavalle
 
For years I always thought of the SPUs as a DSP-like architecture. Not sure why it's not called that.

DMA a block in, process it in a nice massive register array or local memory, DMA the result out, and... OMG, the next block is already in! No coffee breaks for DSPs...
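That overlapped fetch/process/write-back pattern can be sketched as a toy model. This is illustrative Python, not real SPU or DSP code; the plain list copies merely stand in for asynchronous DMA transfers, and the block size and processing kernel are made up.

```python
# Toy double-buffered "DMA" pipeline: while one local buffer is being
# processed, the next block is already fetched into the other buffer.
# Real SPUs/DSPs overlap the transfer in hardware; here a list copy
# stands in for an asynchronous DMA get, and extending the output list
# stands in for the DMA put.

BLOCK = 4  # samples per "DMA" block (illustrative size)

def process(block):
    # Stand-in kernel: double every sample.
    return [x * 2 for x in block]

def run_pipeline(stream):
    out = []
    buffers = [None, None]  # two local-store buffers (ping/pong)
    blocks = [stream[i:i + BLOCK] for i in range(0, len(stream), BLOCK)]

    # Prime the pipeline: "DMA in" the first block.
    if blocks:
        buffers[0] = list(blocks[0])

    for i in range(len(blocks)):
        cur, nxt = i % 2, (i + 1) % 2
        # Kick off the "DMA" for the next block before processing this one.
        if i + 1 < len(blocks):
            buffers[nxt] = list(blocks[i + 1])
        # Process the current buffer, then "DMA out" the result.
        out.extend(process(buffers[cur]))
    return out

print(run_pipeline([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```

The point of the ping/pong pair is exactly the "no coffee breaks" property above: the compute unit never idles waiting for data, which is how the SPU-style model can target very high VALU utilisation.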
 