PlayStation 5 [PS5] [Release: November 12, 2020]

It works like an SPU; that doesn't mean it "borrows SPU features", since a compute unit with DMA is surely not an "SPU feature" by itself.
There's a really silly semantic discussion here, in that 'SPU like/features' isn't defined anywhere at all. People are clearly interpreting that based on unsubstantiated assumptions.

I linked the technical document for the Cell SPU from Hofstee earlier, which lists everything down to the latency and execution time in cycles for specific instructions. 'SPU-like' could be any and all of those features.

In short, there's no point discussing whether Tempest is/isn't like a Cell SPU. People should talk about specific features of the SPU, how they help with audio, and whether Tempest has those features or not. Anything more generic than that is just noise and contributes nowt to this thread.
 
It is good enough for audio work, otherwise Steam Audio wouldn't bother supporting it for VR.

Does it implement HRTF and whatever Cerny wants to build into his audio dream? This is peculiar.

Why waste money and effort on "only" a custom CU (on top of buying Wwise)?
 
Does it implement HRTF
It does.
...and whatever Cerny wants to build into his audio dream?
We don't know quite what his dream is. We'll need to hear the audio difference.

Why waste money and effort on "only" a custom CU (on top of buying Wwise)?
Well, there's the rub. Some are wondering whether Sony spent money on a custom CU solution versus it being an RDNA2 feature. Perhaps AMD invested in the development with Sony's influence as a partner, so Sony got what they wanted for their console and AMD got an audio solution for their IP? We know XBSX has audio hardware - maybe it's Tempest by another name?
 
Does it implement HRTF and whatever Cerny wants to build into his audio dream? This is peculiar.

Why waste money and effort on "only" a custom CU (on top of buying Wwise)?

TrueAudio Next is an SDK that offloads the most expensive calculations for 3D audio, like convolution reverb with time-varying impulse responses. Steam Audio uses Ambisonics and supports HRTF. Whether the downmix from the Ambisonic channels to stereo with HRTF is done with TAN, I do not know.

If I had to guess, they can get higher utilization out of their customized CU, so the silicon cost is lower than reserving two CUs on the GPU.

Edit: From the Steam Audio info on TrueAudio Next:

Note that Steam® Audio does not use TrueAudio Next for applying HRTF-based 3D audio rendering. The computational cost of HRTF processing is significantly lower than that of convolution reverb: thousands of sources can be rendered with HRTF-based 3D audio using a single CPU core.

https://steamcommunity.com/games/596420/announcements/detail/1647624403070736393

https://gpuopen.com/beyond-spatial-...eflections-third-order-ambisonics-demo-video/

https://github.com/GPUOpen-LibrariesAndSDKs/TAN
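To put toy numbers behind that quote, here's a rough cost model (my own, not Steam Audio's actual implementation) of why HRTF is so much cheaper than convolution reverb: both are FIR convolutions, and for a direct-form FIR the per-sample cost scales linearly with the impulse-response length.

```python
# Toy cost model (my assumptions, not Steam Audio's actual implementation).
FS = 48_000                          # sample rate, Hz

def fir_macs_per_second(ir_seconds, sample_rate=FS):
    """Multiply-accumulates per second for one direct-form FIR channel."""
    taps = int(ir_seconds * sample_rate)
    return taps * sample_rate        # one MAC per tap, per output sample

hrtf   = fir_macs_per_second(0.005)  # HRTF IRs: a few milliseconds
reverb = fir_macs_per_second(1.0)    # reverb IRs: commonly 1-2 seconds

print(f"HRTF   : {hrtf / 1e6:6.1f} MMAC/s per source")    # ~11.5
print(f"Reverb : {reverb / 1e9:6.1f} GMAC/s per source")  # ~2.3
print(f"Ratio  : ~{reverb / hrtf:.0f}x")                  # ~200x
```

Real engines use partitioned FFT convolution rather than direct-form FIR, which changes the constants but not the gist: the reverb IR is a couple of orders of magnitude longer, so it dominates the cost.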
 
So from the Steam Audio writeup on TrueAudio Next, four reserved CUs can handle 80 convolution sources. I'm not sure which GPU they're using in this example, but it may be an RX 480: the next graphic, about the performance impact of GPU reservation on framerate, specifies an RX 480. If that's true, four CUs is about 648 GFLOPS.

Figure: Performance improvements with increasing numbers of reserved CUs, for 1s IRs, 1st order Ambisonics, 1024-sample frames, and a 48 kHz sampling rate. The plot shows the maximum number of sources that can be processed within a 6ms budget. Increasing the number of reserved CUs allows more sources to be processed within the same amount of time.
https://steamcommunity.com/games/596420/announcements/detail/1647624403070736393

The way I interpret this, a convolution source would be the sound source, but for 1st-order Ambisonics there are four convolution channels, so 80 x 4 = 320 convolutions. I'm sort of stumbling my way through learning this, so if someone else knows better, I will not take it personally if you correct me.
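Putting my reading of the figure into numbers (the per-CU GFLOPS value assumes the RX 480's published peak; everything else is my interpretation, not published specs):

```python
# My reading of the Steam Audio figure, not published specs.
RX480_CUS    = 36
RX480_TFLOPS = 5.83                                  # RX 480 peak FP32
CU_GFLOPS    = RX480_TFLOPS / RX480_CUS * 1000       # ~162 GFLOPS per CU

reserved_cus       = 4
sources            = 80     # per the figure, within the 6ms budget
ambisonic_channels = 4      # 1st-order Ambisonics: W, X, Y, Z

print(f"Reserved compute : {reserved_cus * CU_GFLOPS:.0f} GFLOPS")  # ~648
print(f"Convolutions     : {sources * ambisonic_channels}")         # 320
```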

Edit: I'm still a little unsure about how reverb is typically calculated in games. There are basic sound attenuation and reverb effects in Unreal Engine 4.24, but "convolution reverb" was not added until 4.25.

https://docs.unrealengine.com/en-US/Engine/Audio/DistanceModelAttenuation/index.html
https://docs.unrealengine.com/en-US/Engine/Audio/Overview/index.html
 
Depends. For example, do the customised CUs take up less silicon?

Yes.

Re-watched "Road to PS5".

* Cerny cited the massive HRTF calculations as what made them bite the bullet and build a custom CU ("Multiple FFTs needed for every sound source for every audio tick") [43:50]

* Said CU has GPU parallelism with an SPU-like architecture
+ GPU SIMD: "More power than CPU"
+ SPU-like architecture: "More efficient than GPU", "Near 100% utilization" [44:57]

* Main goal is 3D audio (it's over-budgeted for hundreds of objects); extra/remaining power for the GPU to do "convolution reverb, or other tasks that need computation heavy operation, or high bandwidth" [46:02]

Sounds like he just needs a tight unit for all the 3D audio jobs.
I'd imagine loading some sort of kernel to Tempest first, and then using the "popular" DMA double-buffering technique (like SPUs do) to keep utilization close to 100%, roughly like the sketch below.
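Something like this toy double-buffering loop, where the fetch of chunk i+1 overlaps the compute on chunk i (purely illustrative; fetch() and kernel() are stand-ins, nothing here is a real PS5 API):

```python
# Minimal sketch of SPU-style double buffering: while the kernel crunches
# chunk i, chunk i+1 is already being fetched.
from concurrent.futures import ThreadPoolExecutor

def fetch(i):
    """Stands in for an async DMA 'get' of one chunk of audio data."""
    return [x * 0.5 for x in range(i * 256, (i + 1) * 256)]

def kernel(chunk):
    """Stands in for the resident audio kernel (e.g. per-source HRTF math)."""
    return sum(chunk)

def process_stream(n_chunks):
    results = []
    with ThreadPoolExecutor(max_workers=1) as dma:
        in_flight = dma.submit(fetch, 0)               # prefetch chunk 0
        for i in range(n_chunks):
            chunk = in_flight.result()                 # wait for the fetch
            if i + 1 < n_chunks:
                in_flight = dma.submit(fetch, i + 1)   # overlap next fetch...
            results.append(kernel(chunk))              # ...with this compute
    return results

print(process_stream(4))
```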

The Wwise framework would provide the full end-to-end audio stack from authoring to rendering.


EDIT: There... Can your future little PSEye camera tricks be loaded to Tempest? :runaway:
 
@patsu I don't really understand the separation of convolution reverb from 3D audio. It is part of real 3D audio. It is the accurate simulation of how sound, especially indirect sound, interacts with the environment. What I'm getting from this is the audio processor is largely designed to handle a large volume of the relatively cheaper calculations like the HRTF transforms so they won't have to be done on the CPU, but we likely won't have as much physical simulation of environments as I was expecting. It seems like their focus is basically spatialization.
 
@patsu I don't really understand the separation of convolution reverb from 3D audio. It is part of real 3D audio. It is the accurate simulation of how sound, especially indirect sound, interacts with the environment. What I'm getting from this is the audio processor is largely designed to handle a large volume of the relatively cheaper calculations like the HRTF transforms so they won't have to be done on the CPU, but we likely won't have as much physical simulation of environments as I was expecting. It seems like their focus is basically spatialization.

My summary is just based on Cerny's presentation format. The unit is used to do math, especially 3D audio math, without tying up the CPU and GPU.
The presentation highlights HRTF, but also mentions possible applications for convolution reverb.

The earlier part of the presentation mentioned they are going to do 3D audio for all sound objects. If/when that happens, there will be continuous 3D audio data to process. Tempest is an efficient part to deal with that workload.

I suppose the thing to explore is whether this efficiency helps keep the CPU and GPU running at "full speed" in Cerny's clocking scheme.
 
My summary is just based on Cerny's presentation format. The unit is used to do math, especially 3D audio math, without tying up the CPU and GPU.
The presentation highlights HRTF, but also mentions possible applications for convolution reverb.

The earlier part of the presentation mentioned they are going to do 3D audio for all sound objects. If/when that happens, there will be continuous 3D audio data to process. Tempest is an efficient part to deal with that workload.

Yah, I re-watched that part of the presentation. You can infer a lot because he mentions the power left over being used for convolution reverb, and his focus is on positioning rather than simulation of the environment. If you go back to the Steam Audio description, they do HRTF on the CPU because it's the cheaper of the two calculations between HRTF and convolution. You don't have that luxury on a console. Four RX 480 CUs can handle about 80 convolution sources, and a PS5 CU is less than two RX 480 CUs in terms of computational power. You'd need well over a 100% efficiency gain to match the four RX 480 CUs, which might very well be possible, but then you're still at 80 convolution sources and you've already spent a portion of the processing power on the HRTF functions. So I think large-scale physical simulation of the materials and the spaciousness of the environment is probably not in the cards for this console gen.
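For reference, the arithmetic behind those claims (the clock and ALU counts are public specs; the comparison itself is my own back-of-the-envelope):

```python
# Clock/ALU figures are public specs; the comparison is back-of-the-envelope.
ALUS_PER_CU   = 64            # both GCN (RX 480) and RDNA
FLOPS_PER_ALU = 2             # fused multiply-add counts as 2 FLOPs

rx480_cu = ALUS_PER_CU * FLOPS_PER_ALU * 1.266e9   # RX 480 boost clock
ps5_cu   = ALUS_PER_CU * FLOPS_PER_ALU * 2.23e9    # PS5 peak GPU clock

print(f"RX 480 CU  : {rx480_cu / 1e9:6.1f} GFLOPS")      # ~162
print(f"PS5 CU     : {ps5_cu / 1e9:6.1f} GFLOPS")        # ~285
print(f"4x RX 480  : {4 * rx480_cu / 1e9:6.1f} GFLOPS")  # ~648
print(f"Efficiency gain needed: {4 * rx480_cu / ps5_cu - 1:.0%}")  # ~127%
```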
 
I don't know if they are talking about the same algorithms. Cerny mentioned the same compute power allows PSVR to render five thousand sound sources. They want to use the same power to simulate fewer (hundreds of) more complicated sources.

He also mentioned Ambisonics. I suspect we will get an audio article like the PS3 ones when the console launches.
 
According to Cerny's presentation, Tempest has computational power equivalent to all 8 Jaguar cores on PS4, so about 100 GFLOPS.
The Tempest Engine is described as having two wavefronts, one for 3D audio and the system, and another for the game. That second wavefront looks to be the "other" use case for the CU. Contending for throughput with 3D audio would give a fraction of the 100 GFLOPS for developer use, and in terms of throughput it's modest versus the Zen 2 cores and a rounding error next to the GPU.
So what would the other wavefront offer to developers, and what use cases could they find for that limited throughput that would justify the additional effort?

It borrows the SPU's features and is specialized for *3D audio*, which brings up the latency question. How bad is TrueAudio and TrueAudio Next's latency?
No clear specification was given. It should be better, and there was one graph in relation to Steam Audio that gave a ms budget, but it wasn't clear whether that was the round-trip time from command submission to kernel completion, or just the kernel execution time.
Sony's HSA audio presentation in 2013 indicated an ambition for a very responsive audio pipeline with sub-5ms or faster overall latency. Depending on the implementation of Tempest, there could be ways in which it improves upon TrueAudio in terms of latency, though some elements mentioned in 2013 may not be well-addressed.

So Sony might have deployed a multi-pronged solution for PS5 game audio. The stack can run on CPU, GPU, and Tempest based on needs.
There were challenges to a pipeline running on all three components, and the PS5 disclosures have not given enough detail to know whether they were resolved.
 
The Tempest Engine is described as having two wavefronts, one for 3D audio and the system, and another for the game. That second wavefront looks to be the "other" use case for the CU. Contending for throughput with 3D audio would give a fraction of the 100 GFLOPS for developer use, and in terms of throughput it's modest versus the Zen 2 cores and a rounding error next to the GPU.
So what would the other wavefront offer to developers, and what use cases could they find for that limited throughput that would justify the additional effort?

It doesn’t sound like Cerny is separating the use cases that way.

To keep utilization close to 100%, once loaded, the CU will need to keep running the same program on new sets of data.
Perhaps the 2 wavefronts correspond to the double-buffering technique commonly used on SPUs? (i.e., they would be working on different sections of data for the same job)

I have no clue where this program sits (a command queue or otherwise?), how long/big it is, and whether developers can load/switch to a different program quickly.
 
So HRTF transforms are basically FFTs, the same as convolution reverb, just with incredibly short impulse responses. With convolution reverb you could have 1s to 2s impulse responses, where HRTF would be milliseconds. That's why the HRTF is so much faster to calculate. I'm wondering what the impact of short impulse responses would be on TrueAudio Next vs Sony's Tempest.
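As a toy illustration of that point, here's the same FFT-multiply-IFFT operation with the two IR lengths swapped in (the sizes are illustrative, not Sony's or AMD's actual parameters):

```python
# Tiny numpy demo: HRTF and reverb are the same FFT-multiply-IFFT operation,
# just with very different IR lengths (sizes illustrative, not real datasets).
import numpy as np

FS = 48_000
frame = np.random.randn(256)              # one 256-sample audio tick

def fft_convolve(signal, ir):
    """Frequency-domain convolution: FFT -> multiply -> inverse FFT."""
    n = len(signal) + len(ir) - 1
    return np.fft.irfft(np.fft.rfft(signal, n) * np.fft.rfft(ir, n), n)

hrtf_ir   = np.random.randn(240)          # ~5ms IR at 48kHz
reverb_ir = np.random.randn(FS)           # 1s reverb tail

wet_hrtf   = fft_convolve(frame, hrtf_ir)    # FFTs of ~500 points
wet_reverb = fft_convolve(frame, reverb_ir)  # FFTs of ~48k points: far more work
print(len(wet_hrtf), len(wet_reverb))        # 495 48255
```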
 
So HRTF transforms are basically FFTs, the same as convolution reverb, just with incredibly short impulse responses. With convolution reverb you could have 1s to 2s impulse responses, where HRTF would be milliseconds. That's why the HRTF is so much faster to calculate. I'm wondering what the impact of short impulse responses would be on TrueAudio Next vs Sony's Tempest.

He mentioned recalculating every sound source per audio tick, presumably tied to the audio sampling rate?
 
It doesn’t sound like Cerny is separating the use cases that way.

To keep utilization close to 100%, once loaded, the CU will need to keep running the same program on new sets of data.
Perhaps the 2 wavefronts correspond to the double-buffering technique commonly used on SPUs? (i.e., they would be working on different sections of data for the same job)

I have no clue where this program sits (a command queue or otherwise?), how long/big it is, and whether developers can load/switch to a different program quickly.
From the most recent article, Cerny is quoted assigning different functions to the two wavefronts.
https://www.eurogamer.net/articles/digitalfoundry-2020-playstation-5-the-mark-cerny-tech-deep-dive

"GPUs process hundreds or even thousands of wavefronts; the Tempest engine supports two," explains Mark Cerny. "One wavefront is for the 3D audio and other system functionality, and one is for the game ...

This leads to my question: what functionality outside of 3D audio and the system can the game assign to the remaining compute throughput, how much compute is that, and what advantages does it have over the more abundant and standardized compute on the CPU and GPU?
 
From the most recent article, Cerny is quoted assigning different functions to the two wavefronts.
https://www.eurogamer.net/articles/digitalfoundry-2020-playstation-5-the-mark-cerny-tech-deep-dive

Thanks for the article. I hadn't seen it before; I was searching for their audio clock rate.

"In general, the scale of the task in dealing with game audio is already extraordinary - not least because audio is processed at 48000Hz with 256 samples, meaning there are 187.5 audio 'ticks' per second - meaning new audio needs to be delivered every 5.3ms."

... and

"'GPUs process hundreds or even thousands of wavefronts; the Tempest engine supports two,' explains Mark Cerny. 'One wavefront is for the 3D audio and other system functionality, and one is for the game. Bandwidth-wise, the Tempest engine can use over 20GB/s, but we have to be a little careful because we don't want the audio to take a notch out of the graphics processing. If the audio processing uses too much bandwidth, that can have a deleterious effect if the graphics processing happens to want to saturate the system bandwidth at the same time.'"

So they alternate the 2 waves to prevent consuming too much bandwidth (?)

Hmm... maybe not. Running a second function may potentially increase bandwidth. The 3D audio path renders in real time, so they can't get too far ahead of the "playhead".

They have spare cycles.

But it seems like they need low latency to respond to changing player position, and short audio ticks.
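For what it's worth, the tick math from the article checks out, plus a purely speculative look at what "over 20GB/s" would mean per tick:

```python
# Tick math from the article, plus a speculative bytes-per-tick figure.
SAMPLE_RATE  = 48_000       # Hz
TICK_SAMPLES = 256

ticks_per_second = SAMPLE_RATE / TICK_SAMPLES   # 187.5
tick_period_ms   = 1000 / ticks_per_second      # ~5.33ms

bytes_per_tick = 20e9 / ticks_per_second        # if Tempest sustains 20GB/s
print(f"{ticks_per_second} ticks/s, {tick_period_ms:.2f}ms per tick")
print(f"~{bytes_per_tick / 1e6:.0f}MB movable per tick at 20GB/s")
```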

This leads to my question: what functionality outside of 3D audio and the system can the game assign to the remaining compute throughput, how much compute is that, and what advantages does it have over the more abundant and standardized compute on the CPU and GPU?

Outside 3D audio, perhaps sensor input (e.g., PSEye, Guitar, ...), I/O value-add (e.g., conversion, decoding, encryption), AI?

Those first party studios will have to answer your question. :)
 
Last edited:
So they alternate the 2 waves to prevent consuming too much bandwidth (?)
My interpretation is that the wavefront dealing with 3D audio and system is the system-reserved functionality that makes up the baseline audio offering for the PS5 platform. The feature would need to be consistent and universally available, so a separate wavefront with baseline allocation of resources avoids games accidentally throttling the audio pipeline. That sort of consistency may also have implications for the clock speed of the unit and GPU.
The article says the Tempest CU works at GPU clock speeds, but the throughput numbers appear to be consistent with something noticeably slower than 2.23 GHz, and the consistency angle may constrain boost.

One of the videos linked by the article covers the bandwidth consumption of the unit as well, but gives the penalty as 20%, which is much larger than 20 GB/s. I would think 20 GB/s would be a more reasonable unit bandwidth, but CUs can physically draw much more in normal GPU operation.
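Quick arithmetic on that discrepancy (448GB/s is the PS5's published memory bandwidth; the rest is my own math, not a disclosed figure):

```python
# 448GB/s is PS5's published memory bandwidth; the rest is my own arithmetic.
TOTAL_BW_GBS = 448
video_figure = 0.20 * TOTAL_BW_GBS   # the 20% penalty from the video
cerny_figure = 20                    # "over 20GB/s" per the article

print(f"20% of system bandwidth : {video_figure:.1f} GB/s")  # 89.6
print(f"Cerny's stated figure   : {cerny_figure} GB/s")
print(f"Discrepancy             : ~{video_figure / cerny_figure:.1f}x")  # ~4.5x
```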
 