AMD already has an audio SDK for the GPU. I imagine it's going to be very similar to that.
The PS5 is offering a wavefront on Tempest for things besides audio. It's a modest amount of extra compute that seems less accessible, given the unit's separation and architectural divergence from existing resources.
Other than being desperate for ~100 GF (or half of that?), would there be a use case versus keeping to x86 or GPU compute?
If latency is improved over TrueAudio, which may depend on a more custom method of controlling the CU, could there be use cases where it does more than absorb a few throwaway operations?
What I expected was for the Tempest CU to be controlled like TrueAudio Next, through AMD's compute front end, though I acknowledge that the PS5 does not carve out a slice from the main pool of CUs the way TAN does.
The talk of having two wavefronts could pose a challenge to using the TrueAudio model, since the path from queues to command processors to wavefront launch wouldn't have a buffer of wavefront slots to pipeline the launch of new tasks without utilization gaps. The fixed and limited resources, compared to even a minimal TrueAudio allocation, make it look like a trivialized case of the resource management TrueAudio incorporates.
Maybe a persistent kernel model would work better with it, even if Sony neglected to mention something that would be a pretty fundamental change to how GPU compute architectures work.
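To make the persistent-kernel idea concrete, here's a minimal sketch of what that model looks like when emulated in software today, written against HIP as the closest public AMD compute API. The AudioCmd layout, the spin-polling, and all the names are my own invention for illustration; nothing like this has been confirmed for Tempest, and dedicated hardware queues would presumably replace the busy-waiting.

```cpp
#include <hip/hip_runtime.h>
#include <cstdint>

// Hypothetical command slot shared between host and a resident wavefront.
// Invented for illustration; not a real Tempest or TrueAudio structure.
struct AudioCmd {
    uint32_t ready;       // host -> device: a block is queued
    uint32_t done;        // device -> host: the block is processed
    uint32_t numSamples;  // samples in this block
    float*   src;         // input buffer (device memory)
    float*   dst;         // output buffer (device memory)
};

// The workgroup is launched once and stays resident, spinning on the
// command slot instead of being relaunched per task -- the "persistent" part.
__global__ void persistentMixer(volatile AudioCmd* cmd, volatile uint32_t* quit)
{
    while (*quit == 0) {
        if (cmd->ready == 0) continue;           // poll for work
        uint32_t n   = cmd->numSamples;
        float*   src = cmd->src;
        float*   dst = cmd->dst;
        for (uint32_t i = threadIdx.x; i < n; i += blockDim.x)
            dst[i] = src[i] * 0.5f;              // stand-in DSP work
        __syncthreads();                         // all lanes finished the block
        if (threadIdx.x == 0) {
            cmd->ready = 0;
            __threadfence_system();              // publish results first...
            cmd->done = 1;                       // ...then signal completion
        }
        __syncthreads();
    }
}

int main()
{
    AudioCmd* cmd;  uint32_t* quit;
    hipHostMalloc((void**)&cmd,  sizeof(AudioCmd),  hipHostMallocDefault);
    hipHostMalloc((void**)&quit, sizeof(uint32_t),  hipHostMallocDefault);
    float *src, *dst;
    hipMalloc((void**)&src, 256 * sizeof(float));
    hipMalloc((void**)&dst, 256 * sizeof(float));

    volatile AudioCmd* vcmd = cmd;
    *quit = 0;
    vcmd->ready = 0;  vcmd->done = 0;
    vcmd->numSamples = 256;  vcmd->src = src;  vcmd->dst = dst;

    // One workgroup of 64 threads: a single wave64's worth of lanes.
    hipLaunchKernelGGL(persistentMixer, dim3(1), dim3(64), 0, 0, cmd, quit);

    vcmd->ready = 1;                             // queue one block
    while (vcmd->done == 0) {}                   // wait on the wavefront
    *(volatile uint32_t*)quit = 1;               // tell it to exit
    hipDeviceSynchronize();
    return 0;
}
```

The point of the sketch is the dispatch path: no queue, no command processor, no wavefront relaunch per task, which is where the TrueAudio model's latency would otherwise come from.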
The 64 FLOPs per cycle and 2 wavefronts could mean a number of things about how the CU is structured. It points to one or two SIMDs in the CU, which would be a notable paring down if this were GCN-based. RDNA would be fine with this, although it would be missing half of the dual-CU WGP.
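For what it's worth, the arithmetic works out if an FMA counts as two FLOPs (the usual convention), and taking the PS5 GPU's 2.23 GHz as a stand-in clock, since Tempest's actual clock hasn't been stated:

32 lanes × 2 FLOPs/FMA = 64 FLOPs/cycle (one RDNA SIMD32)
2 × 16 lanes × 2 FLOPs/FMA = 64 FLOPs/cycle (two GCN SIMD16s)
64 FLOPs/cycle × 2.23 GHz ≈ 143 GFLOPS

Either reading is half the 128 FLOPs/cycle of a standard GCN or RDNA CU, and a lower, Jaguar-like clock around 1.6 GHz would land at the ~100 GF figure mentioned above.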
Potentially unrelated aside, there is an ISA variant labelled GFX1011 that for some reason doesn't have a bug related to LDS usage in WGP mode. Maybe an RDNA variant that dropped half the WGP would consider that "fixed".
https://github.com/llvm-mirror/llvm/commit/eaed96ae3e5c8a17350821ae39318c70200adaf0 (under "def FeatureISAVersion10_1_1 : FeatureSet").
Although one counterpoint is that this variant also doesn't support XNACK, which is present for APUs. Counter to that is the possibility that XNACK wouldn't apply if you have a memory model like the SPE's.
How memory is handled could still be a puzzle. The SPE still had load/store instructions, but would Tempest revamp its memory instructions to match a local store, or would it try to be more consistent with GCN/RDNA? Could a compromise be the GPU's more traditional complex vector memory ops, with their range restricted to a local scratchpad?
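Software can approximate that compromise today by staging everything through LDS and only touching global memory in bulk at the block edges, SPE-DMA style. A hedged HIP sketch, with the block size and kernel name invented for illustration:

```cpp
#include <hip/hip_runtime.h>

// All DSP work addresses LDS (__shared__), with global memory touched only
// in bulk copies at the block edges, the way an SPE DMAs into its local
// store. BLOCK is an invented size.
constexpr int BLOCK = 1024;   // samples resident in the scratchpad at once

__global__ void scratchpadGain(const float* in, float* out, int n, float gain)
{
    __shared__ float ls[BLOCK];                       // stand-in local store

    for (int base = 0; base < n; base += BLOCK) {
        // "DMA in": bulk copy, global -> scratchpad
        for (int i = threadIdx.x; i < BLOCK && base + i < n; i += blockDim.x)
            ls[i] = in[base + i];
        __syncthreads();

        // The effect itself only ever addresses the scratchpad
        for (int i = threadIdx.x; i < BLOCK && base + i < n; i += blockDim.x)
            ls[i] *= gain;
        __syncthreads();

        // "DMA out": bulk copy, scratchpad -> global
        for (int i = threadIdx.x; i < BLOCK && base + i < n; i += blockDim.x)
            out[base + i] = ls[i];
        __syncthreads();
    }
}

int main()
{
    const int n = 48000;                              // 1 second at 48 kHz
    float *in, *out;
    hipMalloc((void**)&in,  n * sizeof(float));
    hipMalloc((void**)&out, n * sizeof(float));
    hipMemset(in, 0, n * sizeof(float));
    hipLaunchKernelGGL(scratchpadGain, dim3(1), dim3(256), 0, 0,
                       in, out, n, 0.5f);
    hipDeviceSynchronize();
    hipFree(in);  hipFree(out);
    return 0;
}
```

A Tempest-style ISA could make that discipline architectural rather than a coding convention, which is the question above.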
I will also throw this AMD patent into the mix, which describes one possible general approach to how their GPU architecture could be extended to address common real-time "persistent" computing needs. Audio happens to be one of them, and coincidentally the patent describes a system with... bespoke CUs!
There are elements where this is similar, or perhaps the overall idea of customizing a CU aligns with it.
The vector memory pipeline seems relatively unchanged, and even with the persistent threading model there seem to be more threads available than the 2 wavefronts mentioned for Tempest.
Some of the microarchitectural changes, like dual-issue and units capable of gathering across register lanes, would be interesting customizations to a CU, although they haven't been mentioned so far.
The persistent wavefront model and message queues going directly to the CU would be notable changes. Maybe there's something like that, given the talk of there being only two wavefronts. It could be that the PR is being non-specific about what is exposed via an API over a more standard arrangement, though.
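As a software stand-in for how a direct-to-CU message queue might behave, here's a small ring fed by the host and drained by a resident wavefront, extending the earlier persistent-kernel sketch. Again, the MsgRing layout and the spin loops are invented; a hardware queue into the CU would presumably make the polling unnecessary.

```cpp
#include <hip/hip_runtime.h>
#include <cstdint>

// Invented single-producer/single-consumer ring between host and a
// resident wavefront; not any real Tempest or patent structure.
struct MsgRing {
    uint32_t head;         // written by the host (producer)
    uint32_t tail;         // written by the device (consumer)
    uint32_t msg[16];      // payloads: here just an opcode per message
};

__global__ void ringConsumer(volatile MsgRing* q, volatile uint32_t* quit)
{
    __shared__ uint32_t op;
    __shared__ int      state;           // -1 quit, 0 empty, 1 have message
    while (true) {
        if (threadIdx.x == 0) {          // one lane inspects the ring
            if (*quit != 0)              state = -1;
            else if (q->tail == q->head) state = 0;
            else { op = q->msg[q->tail % 16]; state = 1; }
        }
        __syncthreads();
        int      s    = state;           // every lane takes the same branch
        uint32_t myOp = (s == 1) ? op : 0u;
        __syncthreads();
        if (s == -1) return;
        if (s == 0)  continue;
        // ...all 64 lanes would dispatch DSP work based on myOp here...
        (void)myOp;
        if (threadIdx.x == 0) {
            __threadfence_system();
            q->tail = q->tail + 1;       // consume: publish the new tail
        }
    }
}

int main()
{
    MsgRing*  q;
    uint32_t* quit;
    hipHostMalloc((void**)&q,    sizeof(MsgRing),  hipHostMallocDefault);
    hipHostMalloc((void**)&quit, sizeof(uint32_t), hipHostMallocDefault);
    q->head = 0;  q->tail = 0;  *quit = 0;

    hipLaunchKernelGGL(ringConsumer, dim3(1), dim3(64), 0, 0, q, quit);

    volatile MsgRing* vq = q;            // producer side
    vq->msg[vq->head % 16] = 42;         // write the payload first...
    vq->head = vq->head + 1;             // ...then publish it
    while (vq->tail != vq->head) {}      // wait until consumed
    *(volatile uint32_t*)quit = 1;       // shut the wavefront down
    hipDeviceSynchronize();
    return 0;
}
```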
This could be part of the modification. Interesting.
And they talk about multiple FMA units, which is great and something needed for audio. There was a great presentation from someone at SCEA about doing audio on AMD HSA, maybe as part of the PS4 audio postmortem:
https://fr.slideshare.net/mobile/DevCentralAMD/mm-4085-laurentbetbeder
Some of the wish list elements may be embodied in the claims, although the single wavefront for audio with 64 operations per clock sounds like it may not be as flexible in terms of data flow and sound pipeline engineering as hoped for in that presentation. Full throughput would require batching effects and sources, exposing the audio designer to low-level architectural details and potentially ruling out combinations of effects or sources if they cannot be made to fit.
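To illustrate what that batching constraint means in practice: to fill the SIMD you'd run one voice per lane, with every lane executing the same effect in lockstep, something like the hypothetical HIP kernel below (the one-pole filter and all the parameters are mine, purely for illustration). 64 identical filters map perfectly; a 65th voice, or one voice wanting a different effect, costs another full pass.

```cpp
#include <hip/hip_runtime.h>

// One voice per lane of a wave64, all lanes running the same effect.
constexpr int VOICES = 64;

__global__ void batchedOnePole(const float* in,   // [VOICES * frames], voice-major
                               float*       out,
                               float*       state,  // one filter state per voice
                               int frames, float a)
{
    int v = threadIdx.x;                 // this lane's voice
    float z = state[v];
    const float* src = in  + v * frames;
    float*       dst = out + v * frames;
    for (int n = 0; n < frames; ++n) {   // serial in time, parallel in voices
        z = z + a * (src[n] - z);        // y[n] = y[n-1] + a*(x[n] - y[n-1])
        dst[n] = z;
    }
    state[v] = z;
}

int main()
{
    const int frames = 256;              // one render quantum
    float *in, *out, *state;
    hipMalloc((void**)&in,    VOICES * frames * sizeof(float));
    hipMalloc((void**)&out,   VOICES * frames * sizeof(float));
    hipMalloc((void**)&state, VOICES * sizeof(float));
    hipMemset(in,    0, VOICES * frames * sizeof(float));
    hipMemset(state, 0, VOICES * sizeof(float));
    hipLaunchKernelGGL(batchedOnePole, dim3(1), dim3(VOICES), 0, 0,
                       in, out, state, frames, 0.1f);
    hipDeviceSynchronize();
    return 0;
}
```

It's exactly this lane-packing bookkeeping that would leak low-level architectural details into the audio designer's view of the pipeline.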
If that means some of the hoped-for capabilities are still not satisfied by Tempest, have there been changes made with the PS5 to avoid the latency-accumulation problems that were one of the reasons the audio pipeline couldn't freely combine the CPU and DSP? The method of accessing the DSP went through an API that injected variable amounts of latency, and giving a CU a local-store form of memory model wouldn't by itself be sufficient to change that.