PlayStation 5 [PS5] [Release November 12, 2020]

It should be the same as any other functional block in that there's no redundancy. If one of the CUs has a defect, the chip is still usable, but if the Tempest Engine has a defect, the chip is unusable, just as it is if one of the CPU cores has a defect.

The devs are talking from their point of view: what makes the SPU so special for them is not the ISA, the endianness, or the ring bus. It is the full scratchpad memory control and the asynchronous DMA. With those, they get the same programming model on the modified CU as on the PS3 SPU, and for SPU fans like the two guys talking, that is great.
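For anyone who never touched a PS3 SPU, here is a minimal sketch of the programming model being described: explicit local-store buffers plus asynchronous DMA, with classic double buffering so transfers overlap compute. It uses the Cell SDK's MFC intrinsics; the Tempest equivalent is not public, so this only shows the pattern the devs are praising, not PS5 code.

```c
/* PS3 SPU-style double buffering: DMA the next chunk in while
 * processing the current one, all from a software-managed local store. */
#include <spu_mfcio.h>
#include <stdint.h>

#define CHUNK 4096

static volatile uint8_t ls_buf[2][CHUNK] __attribute__((aligned(128)));

static void process(volatile uint8_t *buf, unsigned size)
{
    /* ... run the DSP/compute work on data already in local store ... */
    (void)buf; (void)size;
}

void stream_from_main_memory(uint64_t ea, unsigned num_chunks)
{
    unsigned cur = 0;

    /* Kick off the first transfer into buffer 0, tagged with its buffer index. */
    mfc_get(ls_buf[cur], ea, CHUNK, cur, 0, 0);

    for (unsigned i = 0; i < num_chunks; i++) {
        unsigned next = cur ^ 1;

        /* Start fetching the next chunk while we work on the current one. */
        if (i + 1 < num_chunks)
            mfc_get(ls_buf[next], ea + (uint64_t)(i + 1) * CHUNK, CHUNK, next, 0, 0);

        /* Wait only on the DMA tag of the current buffer, then compute. */
        mfc_write_tag_mask(1 << cur);
        mfc_read_tag_status_all();
        process(ls_buf[cur], CHUNK);

        cur = next;
    }
}
```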

Yes, there. But when someone says the PS5 has an SPU, they may mean a Sound Processing Unit, because it hasn't got a PS3 SPU in it. Some devs, however, do mean an SPU, in a confusing way, because the PS5 hasn't got an SPU in it. Saying "yay, SPU is back" is just going to confuse a lot of people.

This is the message I wanted to answer.
 
Very interesting, but it seems to go with what @Lurkmass has told us. In "Road to PS5", Mark Cerny talked about hardware acceleration for the ray-polygon intersection test, like in DXR, and he added the ray-AABB intersection test (is that in DXR?).
 
The devs are talking from their point of view: what makes the SPU so special for them is not the ISA, the endianness, or the ring bus. It is the full scratchpad memory control and the asynchronous DMA. With those, they get the same programming model on the modified CU as on the PS3 SPU, and for SPU fans like the two guys talking, that is great.
I don't understand this reply. What has dev enthusiasm for a CU with programmer-managed DMA got to do with the yield of chips when the Tempest unit suffers a defect during manufacture? :???:
 
I don't understand this reply. What has dev enthusiasm for a CU with programmer-managed DMA got to do with the yield of chips when the Tempest unit suffers a defect during manufacture? :???:

Sorry, I wanted to answer another message, the one where you asked why devs talk about the SPU, and ended up replying to the wrong one.
 
When do the RDNA2 GPUs launch? Did they get delayed due to corona? If they release this summer, we might see more about AMD's next-gen audio solutions.
 
AMD already has an audio SDK for the GPU. I imagine it's going to be very similar to that.
The PS5's offering a wavefront on Tempest for other things besides audio. It's a modest amount of extra compute that seems to be less accessible due to the unit's separation and architectural divergence from existing resources.
Other than being desperate for ~100 GF (or half of that?), would there be a use case versus keeping to the x86 or GPU compute?
If latency is improved over True Audio, which may depend on a more custom method of controlling the CU, could there be use cases where it might do more than absorb a few throwaway operations?

What I expected was for the Tempest CU to be controlled like TrueAudio Next, through AMD's compute front-end, and I acknowledged the fact that the PS5 does not carve out a slice from the main pool of CUs like TAN does.
The talk of having two wavefronts could pose a challenge to using the TrueAudio model, since the path from queues to command processors to wavefront launch wouldn't have a buffer of wavefront slots to pipeline the process of starting new tasks without utilization gaps. The fixed and limited resources compared to a minimal TrueAudio allocation make it look like a trivialized case of the resource management TrueAudio incorporates.
Maybe a persistent kernel model would work better with it, though that would mean Sony neglected to mention something that would be a pretty fundamental change to how GPU compute architectures work.
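For what a "persistent kernel" could look like in the abstract, here is a minimal CPU-side analogy: a worker started once that keeps polling a command queue rather than being re-dispatched for every task. Every name and the queue layout below are invented for illustration; this is not AMD's or Sony's interface.

```c
/* CPU-side analogy (not PS5 code) of the persistent-kernel-plus-message-queue
 * idea: the worker is launched once and then keeps pulling commands from a
 * queue, instead of going back through a dispatcher for each task. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

#define QSIZE 64                      /* power of two, overflow check omitted */

typedef struct { int effect_id; int num_samples; } AudioCmd;

static AudioCmd queue[QSIZE];
static atomic_uint head, tail;        /* single producer / single consumer    */
static atomic_bool stop_flag;

static void push(AudioCmd c)
{
    unsigned h = atomic_load(&head);
    queue[h % QSIZE] = c;
    atomic_store(&head, h + 1);       /* publish after the payload is written */
}

/* The "persistent wavefront": started once, loops until told to stop. */
static void *worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_flag)) {
        unsigned t = atomic_load(&tail);
        if (t != atomic_load(&head)) {             /* work available?          */
            AudioCmd c = queue[t % QSIZE];
            atomic_store(&tail, t + 1);
            printf("effect %d over %d samples\n", c.effect_id, c.num_samples);
        }                                          /* else keep polling        */
    }
    return NULL;
}

int main(void)
{
    pthread_t th;
    pthread_create(&th, NULL, worker, NULL);
    for (int i = 0; i < 4; i++)
        push((AudioCmd){ .effect_id = i, .num_samples = 256 });
    sleep(1);                                      /* crude: let it drain      */
    atomic_store(&stop_flag, true);
    pthread_join(th, NULL);
    return 0;
}
```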

The 64 FLOPs per cycle and 2 wavefronts could mean a number of things about how the CU is structured. That would mean one or two SIMDs in the CU, which would be a notable paring down if this were GCN based. RDNA would be fine with this, although it would be missing half of the dual-CU WGP.
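Rough arithmetic behind that, assuming the figure follows the usual convention of counting an FMA as two FLOPs (my assumption, not stated in the presentation):

```c
/* Back-of-envelope check on the "64 FLOPs per cycle" figure. */
#include <stdio.h>

int main(void)
{
    const double flops_per_cycle = 64.0;
    const double fma_lanes = flops_per_cycle / 2.0;   /* 32 MAC lanes */
    printf("FMA lanes: %.0f -> two 16-wide GCN SIMDs or one 32-wide RDNA SIMD\n",
           fma_lanes);

    /* At a hypothetical clock in the 1.6-2.0 GHz range this lands near the
     * ~100 GFLOPs (eight PS4 Jaguar cores) figure quoted later in the thread. */
    const double ghz[] = { 1.6, 1.8, 2.0 };
    for (int i = 0; i < 3; i++)
        printf("%.1f GHz -> %.0f GFLOPs\n", ghz[i], flops_per_cycle * ghz[i]);
    return 0;
}
```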
Potentially unrelated aside, there is an ISA variant labelled GFX1011 that for some reason doesn't have a bug related to LDS usage in WGP mode. Maybe an RDNA variant that dropped half the WGP would consider that "fixed".
https://github.com/llvm-mirror/llvm/commit/eaed96ae3e5c8a17350821ae39318c70200adaf0 (under "def FeatureISAVersion10_1_1 : FeatureSet").
Although one counterpoint is that this variant also doesn't support XNACK, which is present for APUs. Counter to that is the possibility that XNACK doesn't apply if you have a memory model like the SPE's?

How memory is handled could still be a puzzle. The SPE still had load/store instructions, but would Tempest revamp its memory instructions to match a local store, or would it try to be more consistent with GCN/RDNA? Could a compromise be more traditional GPU complex vector memory ops, with their range restricted to a local scratchpad?


I will also throw this AMD patent into the mix, which describes one possible general approach to how their GPU architecture could be extended to address common real-time "persistent" computing needs. Audio coincidentally is one of them, and coincidentally the patent describes a system with... bespoke CUs! :mrgreen:
There are elements where this is similar, or perhaps the overall idea of customizing a CU aligns with it.
The vector memory pipeline seems to be relatively unchanged, and even with the persistent threading model there seemed to be more threads available than the 2 wavefronts mentioned for Tempest.
Some of the microarchitectural changes like dual-issue and units capable of gathering across register lanes would be interesting customizations to a CU, although not mentioned so far.

The persistent wavefront model and message queues going directly to the CU would be notable changes. Maybe there's something like that, given the talk of there being only two wavefronts. It could be that the PR is being non-specific about what is being exposed via an API over a more standard arrangement, though.

This could be part of the modification. Interesting.

And they talk about multiple FMA units. Great, and something needed for audio. This was a great presentation from someone at SCEA about doing audio on AMD HSA, and maybe in part a PS4 audio postmortem.

https://fr.slideshare.net/mobile/DevCentralAMD/mm-4085-laurentbetbeder
Some of the wish list elements may be embodied in the claims, although the single wavefront for audio with 64 operations per clock sounds like it may not be as flexible in terms of data flow and sound pipeline engineering as hoped for in that presentation. Full throughput would require batching effects and sources, exposing the audio designer to low-level architectural details and potentially ruling out combinations of effects or sources if they cannot be made to fit.
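A rough illustration of what that batching looks like in practice, assuming a 64-wide unit and structure-of-arrays data; none of the names or numbers below reflect an actual Tempest API.

```c
/* To keep a 64-wide SIMD unit busy, sources have to be grouped so that
 * (ideally) 64 voices run the same effect in lockstep.  Plain scalar C,
 * laid out the way a wavefront would consume it; voice counts that don't
 * divide evenly leave lanes idle, which is where the scheduling burden
 * on the audio designer shows up. */
#include <stddef.h>

#define LANES 64   /* one wavefront's worth of voices */

typedef struct {
    float gain[LANES];
    float state[LANES];   /* one-pole low-pass state per voice */
} VoiceBatch;

/* Same effect applied to every lane of a batch: this is the shape of work
 * that maps well onto the hardware.  A different effect, or a 65th voice,
 * needs another batch/pass. */
void lowpass_batch(VoiceBatch *b, const float *in, float *out,
                   size_t frames, float coeff)
{
    for (size_t f = 0; f < frames; f++) {
        for (int lane = 0; lane < LANES; lane++) {
            float x = in[f * LANES + lane] * b->gain[lane];
            b->state[lane] += coeff * (x - b->state[lane]);
            out[f * LANES + lane] = b->state[lane];
        }
    }
}
```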
If that means some of the hoped-for capabilities are still not satisfied by Tempest, have there been changes made with the PS5 to avoid the latency accumulation problems that were one of the reasons why the audio pipeline couldn't freely combine the CPU and DSP? The method of accessing the DSP went through an API that injected variable amounts of latency, and giving a CU a local-store form of memory model wouldn't be sufficient to change that.
 
It should be the same as any other functional block in that there's no redundancy. If one of the CUs has a defect, the chip is still usable, but if the Tempest Engine has a defect, the chip is unusable, just as it is if one of the CPU cores has a defect.
That's my point, I'm wondering how that could impact yields...
 
That's my point, I'm wondering how that could impact yields...
Broadly speaking, it's the probability of the defect landing inside the Tempest part. So (Area of Tempest / Area of Chip) x defect rate, or thereabouts. Given the low defect rates we're seeing at TSMC 7nm, it's probably all of a 1% increase in rejected chips, if that.
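A worked version of that estimate using a simple Poisson yield model. Every input is an assumption: the die area is a rough public estimate, the Tempest block area is a guess, and the defect density is a ballpark figure for a mature 7nm process, so treat the output as an order-of-magnitude check only.

```c
/* Worked version of the (Tempest area / chip area) x defect-rate estimate. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double defect_density = 0.09;   /* defects per cm^2 (assumed)     */
    const double chip_area_cm2  = 3.0;    /* ~300 mm^2 die (assumed)        */
    const double tempest_cm2    = 0.02;   /* ~2 mm^2 for the block (guess)  */

    /* Poisson yield model: P(no defect in area A) = exp(-D0 * A). */
    double chip_yield    = exp(-defect_density * chip_area_cm2);
    double tempest_fails = 1.0 - exp(-defect_density * tempest_cm2);

    printf("baseline die yield        : %.1f%%\n", 100.0 * chip_yield);
    printf("chance of a Tempest defect: %.2f%%\n", 100.0 * tempest_fails);
    /* -> well under 1% extra rejects, in line with the estimate above. */
    return 0;
}
```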
 
That's my point, I'm wondering how that could impact yields...
Hard to say without more information, but it's smaller than a regular CU (no cache) and would account for a very small % of the entire SoC, so in and of itself it should account for a small % of the overall SoC failures.
 
Hard to say without more information, but it's smaller than a regular CU (no cache) and would account for a very small % of the entire SoC, so in and of itself it should account for a small % of the overall SoC failures.
The vector caches themselves are pretty small. There's more logic and wiring than storage arrays, perhaps due to the amount of parallelism. A larger area consumer may be the texture units.
There's some area associated with the scalar and instruction caches. Maybe we'd need to find out what else was removed.

Some questions would be whether the CU dropped the other caches, and how much of their related functionality was removed as well.
Even if the instruction cache were gone, at least some of the front end would persist with the question of what it would fetch from.
The small number of wavefronts could reduce the number of per-wave buffers and control blocks.
The memory pipelines themselves could be simplified given they won't need to handle as many misses and would serve fewer clients. If they share a common pool, contention may appear for Tempest versus a standard CU.

It is undetermined how much area would be added back by the DMA block and the local store the CU would need, although SRAM is defect-tolerant.
The numbers given for math throughput could point to reduced ALU area, but there could be more tweaks to how the units and register file are handled that could add area.
 
The PS5's offering a wavefront on Tempest for other things besides audio. It's a modest amount of extra compute that seems to be less accessible due to the unit's separation and architectural divergence from existing resources.
Other than being desperate for ~100 GF (or half of that?), would there be a use case versus keeping to the x86 or GPU compute?
If latency is improved over True Audio, which may depend on a more custom method of controlling the CU, could there be use cases where it might do more than absorb a few throwaway operations?

According to Cerny's presentation, Tempest has computation power equivalent to all 8 Jaguar cores on the PS4, so about 100 GFLOPs.
It borrows SPU’s features and is specialized for *3D audio*, which brings up the latency question. How bad is TrueAudio and TrueAudio Next's latency?

From chris1515’s link, we see their presentation’s conclusion.
From Shifty’s post, we know they acquired Wwise.

So Sony might have deployed a multi-pronged solution for PS5 game audio. The stack can run on CPU, GPU, and Tempest based on needs.

The goal is to deliver seamless 3D audio embedded, streamed and rendered as part of the game world on time. It might be a futile exercise to try to force fit everything (all the requirements) into Tempest alone.

The expectation or hope is probably that, with the entire rich toolset, developers would spend more time on content and higher-level behavior than on FLOPs and milliseconds of latency.
 

It borrows SPU’s features
It works like an SPU; that doesn't mean it "borrows SPU's features", since a compute unit with DMA surely isn't an "SPU feature".
All Cerny has confirmed so far is that it's based on an AMD CU which has had its caches stripped out and uses DMA instead (which makes it "work in a similar way to an SPU").

How bad is TrueAudio and TrueAudio Next's latency?
Shouldn't be an issue.
 
It works like an SPU; that doesn't mean it "borrows SPU's features", since a compute unit with DMA surely isn't an "SPU feature".
All Cerny has confirmed so far is that it's based on an AMD CU which has had its caches stripped out and uses DMA instead (which makes it "work in a similar way to an SPU").

That and explicit local store access are SPU features. Without these two basic traits, I wouldn't call it working like an SPU.
Low latency should be a key KPI too, since their presentation concluded that the GPU is more suitable for mid/long-latency work.
The CPU has low latency but isn't powerful enough.

Shouldn't be an issue.

How long?
 