Yeah I noticed the mismatch in GFLOP count. The SIMD unit must be different but still GPU-based, and maybe there's some overhead when loading/switching the program and the wavefronts? Or do they need to wait for certain inputs from the GPU/CPU?
There would be overhead in switching wavefronts because time is needed to load the necessary context for a new kernel into the CU's general-purpose and system registers. For a standard CU, part of that transfer comes from the GPU's dispatch pipeline, while the rest comes from initialization code compiled into the shader itself. Part of the recommendation for running more than the minimum number of wavefronts is to help hide spin-up periods like that. The level of parallelism described for Tempest amounts to almost no concurrency from the developer's standpoint. Perhaps Sony expects the audio workload to consistently utilize what the developer cannot, or the developer's fraction is whatever scraps the system reserve would consider lost to switch overhead anyway.
The AMD patent referenced earlier describes a custom CU with persistent wavefronts, which would remove a good chunk of the launch overhead. That may be consistent with the idea that high utilization can be managed with such a limited number of wavefronts.
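As a rough illustration of why persistence helps (just a CPU-side analogy in Python, not the patent's actual mechanism): a worker launched once that pulls work from a queue pays its setup cost a single time, while a worker relaunched per task pays it on every dispatch.

```python
import queue
import threading
import time

SETUP_COST = 0.001  # hypothetical per-launch context-load cost, in seconds

def run_task(task):
    pass  # stand-in for the actual audio kernel work

def relaunch_per_task(tasks):
    """Launch-per-task model: the setup cost is paid on every dispatch."""
    for task in tasks:
        time.sleep(SETUP_COST)  # context load / spin-up
        run_task(task)

def persistent_worker(tasks):
    """Persistent model: one launch, then work is pulled from a queue."""
    q = queue.Queue()
    for task in tasks:
        q.put(task)
    q.put(None)  # sentinel to stop the worker

    def worker():
        time.sleep(SETUP_COST)  # setup paid once
        while (task := q.get()) is not None:
            run_task(task)

    t = threading.Thread(target=worker)
    t.start()
    t.join()

for fn in (relaunch_per_task, persistent_worker):
    start = time.perf_counter()
    fn(range(200))
    print(f"{fn.__name__}: {time.perf_counter() - start:.3f} s")
```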
As for the 20% penalty, it sounds too high. Wondering if Leadbetter misspoke.
It's possible. 20 GB/s seems like a reasonable number, but is a worst-case 4% consumption enough of a problem to require special mention?
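For reference, 20 GB/s against the commonly quoted 448 GB/s of total GDDR6 bandwidth (my figure, not something stated above) works out to roughly that 4%:

```python
tempest_bw_gbs = 20.0   # hypothetical worst-case Tempest streaming load
system_bw_gbs = 448.0   # often-quoted total GDDR6 bandwidth (assumption)
print(f"{tempest_bw_gbs / system_bw_gbs:.1%}")  # -> 4.5%
```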
Saving CPU and GPU resources?
Maybe some of it, if the developer is that short on resources. The 3D audio/system wavefront would consume a large fraction of Tempest's throughput. 8 Zen 2 cores at 3.5 GHz are just shy of 0.9 TF, and the GPU is just short of 10.3 TF.
If the developer could use all of Tempest, that's 11% of CPU peak and 1% of the GPU, but I don't think the system-reserved features would allow that. With a likely single-digit percentage of the CPU's capability and less than a percent of the GPU's, would developers be so tightly resource-constrained that they would look to Tempest? Programming for it is going to be a third architectural target and a source of implementation complexity, so does this modest amount of compute have specific advantages that make it more appealing than finding a fraction of a percent in GPU idle time?
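Back-of-the-envelope math behind those figures; the per-core FLOP width and GPU configuration are standard Zen 2/RDNA peak assumptions on my part, and the ~0.1 TF Tempest number is only what the 11%/1% fractions imply:

```python
# Peak FP32 throughput, assuming Zen 2's 2x 256-bit FMA pipes per core
# and the announced 36 CU, 2.23 GHz GPU configuration.
cpu_tf = 8 * 3.5e9 * 32 / 1e12        # cores * clock * FLOPs/cycle ~= 0.90 TF
gpu_tf = 36 * 64 * 2 * 2.23e9 / 1e12  # CUs * lanes * FMA * clock  ~= 10.28 TF

tempest_tf = 0.1                      # assumption implied by the 11% / 1% figures
print(f"CPU {cpu_tf:.2f} TF, GPU {gpu_tf:.2f} TF")
print(f"Tempest vs CPU: {tempest_tf / cpu_tf:.0%}, vs GPU: {tempest_tf / gpu_tf:.1%}")
```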
I think just having a dedicated unit means that studios can't re-prioritize the silicon away from audio, and that in itself seems ideal.
If Sony were really only concerned about developer priorities, they could have reserved standard CUs for themselves. TrueAudio Next makes provision for CU reservation, where a developer can set aside CUs before graphics shaders get access to them. It would seem a straightforward extension for Sony to take such a reservation first without going through the effort of modifying the architecture. That they did change the architecture indicates there are capabilities missing from, or possible downsides to, what's already there.
What if it's based on a CDNA CU, even? Modern and compute-focused (and I know nothing about it).
Details aren't confirmed at this time, but there's a good chance that CDNA includes an additional vector unit with matrix multiplication capabilities and a wide range of formats.
While it's possible Sony would only mention the normal vector units, since the matrix operations are more specialized and many of the formats have lower precision, that would be neglecting a large number of operations per clock.
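To put a rough number on how much could go unmentioned (the tile size here is purely illustrative, not a confirmed CDNA instruction shape): even a modest matrix-multiply instruction dwarfs a 64-lane vector FMA in operations per issue.

```python
vector_fma_ops = 64 * 2      # one wave-wide FMA: 64 lanes, mul + add
tile = 16                    # hypothetical 16x16x16 matrix tile per instruction
matrix_ops = 2 * tile ** 3   # 2*N^3 multiply-accumulate ops for an NxN tile
print(vector_fma_ops, matrix_ops, matrix_ops // vector_fma_ops)  # 128 8192 64
```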