Revenge of Cell, the DPU Rises *spawn*

Because, when you are not raytracing, the dedicated raytracing hardware sits idle while that die space could have accommodated more general-purpose compute resources that are always going to be useful. The practicality of dedicated-purpose functional blocks is dictated by several factors. Some off the top of my head: how frequently the kind of work they are most efficient at is performed, how much more efficiently they perform that work than a more general-purpose compute resource, how much die space and power they require, and how much complexity they add to the programming model.
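As a rough back-of-envelope sketch of that tradeoff, here's a toy model (the formula and every number in it are invented purely for illustration, not taken from any real GPU):

```python
# Toy model of the dedicated-block tradeoff above: a fixed-function unit only
# pays off if its efficiency advantage times how often it is busy outweighs
# the general-purpose compute it displaced. All numbers are invented.

def whole_chip_speedup(die_fraction, per_area_speedup, utilization):
    """Crude estimate of whole-chip throughput change from spending
    `die_fraction` of the area on a block that is `per_area_speedup`x more
    efficient at its task but only busy `utilization` of the time."""
    gain_while_busy = utilization * die_fraction * (per_area_speedup - 1.0)
    loss_while_idle = (1.0 - utilization) * die_fraction
    return 1.0 + gain_while_busy - loss_while_idle

# 5% of the die, 10x more efficient at its job, busy 20% of the frame:
print(whole_chip_speedup(0.05, 10.0, 0.20))   # ~1.05 -> marginal
# Same block busy 60% of the frame:
print(whole_chip_speedup(0.05, 10.0, 0.60))   # ~1.25 -> much easier to justify
```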

Did TrueAudio add complexity to audio programming & voice recognition? Most games should be using real time lighting & reflection/shadows & so on so you will be using it more than not using it.
 
TrueAudio isn't being used in the consoles?

I thought any game using the Wwise and/or FMOD APIs would be automatically using the Tensilica DSPs.
 
There is a base concept of having a block of Tensilica cores and customizable IP in an IO-coherent block near the GPU domain. The base tech and platform-specific uses for that block of hardware are a bit distinct from the "build it and they will come" PC silicon. Without a semicustom client's specific needs and/or IP, the PC formulation did not seem to be enticing enough. AMD still hasn't stopped selling GPUs without it, which didn't help.
Sony at least partly uses custom hardware for system services related to media, and then Microsoft has a more customized and walled-off audio path.

Wwise has or had a plug-in that could be added in order to start enabling TrueAudio. The few times TrueAudio was used, it was noted, but that was years ago.
I looked for disclosures about VR finding a use for TrueAudio, but if there was one I may have missed it.
 
You have a very perverse definition of terms.
 

My guess is that the 8 ACEs & the audio co-processor ideas both came from Sony.
 
I say "Could PlayStation 4 breathe new life into Software Based Rendering? " & people say no & explain how it's crazy for anyone to use software based rendering when there is fix function hardware
That's not what that thread said. And don't go quoting pieces here to try and restart that argument. Anyone who wants to review that thread can go visit it. It was long and had multiple points.

& now, when I make a prediction about next-gen consoles handling some rendering tasks with a custom chip, I'm told it's crazy to have custom logic when it can be done with software rendering. lol
It's not as black and white as that. Neither argument (software rendering or custom hardware) is as black and white as a binary choice. The main argument here is that adding custom hardware specifically for VR reprojection would not be wise given the GPU is already a perfect fit for that work. Where you've started expanding the idea, you are visiting the grey areas of decision making. Would an audio core be worth including? Is it more efficient? Yes. Is it worth replacing general-purpose silicon with? Well, how much better is XB1 audio than PS4's? What if its audio processor hadn't been included? Couldn't MS have forgone the custom hardware, just used a beefier GPU, got the same results as a PS4, and also enabled better graphics or compute or whatever devs want to use?

That in itself is a debate with lots of variables, only worth noting here for illustration. Custom silicon in itself does not equal better design or even better real-world performance. Each option needs to be considered carefully - personally I think the inclusion of ray tracing hardware makes a lot of sense if it's effective and versatile enough, as casting rays is used for AI and physics as well as graphics. A DPU, on the other hand, certainly for VR reprojection or other image manipulation functions (post effects), doesn't make sense to me unless it can be included at negligible cost, transparently to the devs, and the improvements are significant.
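To make the "rays are useful beyond graphics" point concrete: a line-of-sight check for AI is just a ray query against the scene. A toy sketch against a single sphere obstacle (no real engine or RT API is being modeled here, and the scene/coordinates are invented):

```python
import math

def ray_sphere_t(origin, direction, center, radius):
    """Smallest t >= 0 where origin + t*direction hits the sphere, or None.
    Toy stand-in for the kind of ray query dedicated RT hardware accelerates."""
    oc = [origin[i] - center[i] for i in range(3)]
    b = 2.0 * sum(oc[i] * direction[i] for i in range(3))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - 4.0 * c          # direction assumed normalized, so a == 1
    if disc < 0.0:
        return None
    for t in ((-b - math.sqrt(disc)) / 2.0, (-b + math.sqrt(disc)) / 2.0):
        if t >= 0.0:
            return t
    return None

def has_line_of_sight(npc, player, obstacle_center, obstacle_radius):
    """AI visibility: cast a ray from the NPC toward the player and check
    whether the obstacle is hit before the ray reaches the player."""
    d = [player[i] - npc[i] for i in range(3)]
    dist = math.sqrt(sum(x * x for x in d))
    direction = [x / dist for x in d]
    t = ray_sphere_t(npc, direction, obstacle_center, obstacle_radius)
    return t is None or t > dist

# Pillar of radius 1 sitting halfway between an NPC and the player:
print(has_line_of_sight((0, 0, 0), (10, 0, 0), (5, 0, 0), 1.0))   # False
print(has_line_of_sight((0, 0, 0), (10, 0, 0), (5, 5, 0), 1.0))   # True
```

The same intersection primitive serves graphics (shadow/reflection rays), physics (sweeps) and AI (visibility), which is what makes versatile ray hardware more attractive than a narrowly scoped block.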
 

Sony explained the point of using TrueAudio a few years ago (before they announced that it was TrueAudio). Now take this same thinking & put it towards rendering: you have stuff like reprojection that you've been using in lots of games & you know that you're going to be using it even more in the upcoming generation, so why not offload that to something like a DPU? Then you have other rendering algorithms that you have been running on the CPU or GPU that you think are worth having at the system level, like splatting, ray tracing & so on. So you basically design your DPU to be like a programmable renderer that can run algorithms like Enlighten (just an example).

[Attached image: slide 3 of the presentation]
 
TrueAudio needs to do stuff that isn't a good fit for GPU compute (low latency, codecs, etc). Image processing like reprojection is a good fit for compute. There needs to be a decent argument for taking a workload the GPU is good at and putting dedicated hardware in to achieve the same. Obviously, for workloads the GPU isn't good at (and can't be made good at), adding other silicon can make sense (CPU, realtime video HW, audio in some cases).
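To illustrate why reprojection sits so comfortably on the GPU: it is essentially an independent per-pixel gather, which is exactly the shape of work compute shaders are built for. A deliberately stripped-down, yaw-only sketch in NumPy (real HMD reprojection also handles pitch/roll, depth and lens distortion; the resolution and FOV here are made up):

```python
import numpy as np

def reproject_yaw(frame, yaw_delta_rad, fov_x_rad):
    """Very simplified 'timewarp'-style reprojection: re-sample a rendered
    frame as if the camera had rotated by yaw_delta about the vertical axis.
    Every output pixel is computed independently of the rest, which is why
    this maps so naturally onto GPU compute."""
    h, w = frame.shape[:2]
    f = (w / 2.0) / np.tan(fov_x_rad / 2.0)          # focal length in pixels
    xs = np.arange(w) - w / 2.0
    # Angle of each output column, shifted by the late head rotation, then
    # mapped back to a source column to gather from.
    ang = np.arctan(xs / f) + yaw_delta_rad
    src_x = np.clip(np.round(np.tan(ang) * f + w / 2.0).astype(int), 0, w - 1)
    return frame[:, src_x]

# 90-degree FOV frame, reprojected for ~1 degree of late head rotation:
frame = np.random.rand(1080, 1200, 3).astype(np.float32)
warped = reproject_yaw(frame, np.radians(1.0), np.radians(90.0))
print(warped.shape)   # (1080, 1200, 3)
```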
 
Most games should be using real time lighting & reflection/shadows

I wouldn't say so, because developers need the flexibility to design what's best for the game they want to make. Who's to say everyone needs to use real-time path tracing with DSPs to achieve their goals? Real-time rendering still has a long, long, long way to go, and given the current economic climate in the games industry (particularly in high-production-cost projects), having highly specialized fixed-function units that take up SoC space in a game console doesn't seem like an economically viable option in the near future.

It's the reason why modern game consoles (X1/PS4) use commodity hardware; the entire industry is far more risk averse than it was. Therefore, hardware designs that go against the grain have to be much better than what's currently available if it's going to be economically viable to write software for them.
 

TrueAudio is not the same thing as what is implemented in the consoles, in terms of who can access it and the mix of resources and IP. There's the base idea that there is a DSP block, and it seems like AMD's TrueAudio is a vanilla implementation that exists in the absence of a specific problem it means to solve.
The presentation the slide you attached came from had a follow-up slide about what (significant) drawbacks running audio on the ACP had, some of which are different from what TrueAudio has.

Slide 4 of the following.

Exotic hardware and dev environment
- Closed to games
- Closed to middleware
- Platform specific
Asynchronous interface
- Can't have sequential interleaving of DSP back and forth between CPU and ACP w/o latency buildup
- But ultimately, we want the DSP pipeline to be data driven (by artists who know nothing about this)
- Modularity
Slow clock rate @ 800MHz, very limited SIMD and no FP support
- Tough sell against Jaguar for many DSP algorithms
- Very tight local memory shared by multiple DSP cores
Already pretty busy with codec loads and system tasks

Unlike TrueAudio, the ACP is more closed off and dedicated to Sony's system services.
Even in this constrained case, we see it does inject complexity if it is being used for audio since work has to be transferred and another interface needs to be managed.
The PC environment frequently has a vast reservoir of CPU power, and even Jaguar is frequently enough to beat the ACP.
The ACP is hidden behind a secure API (per your slide), which in the presentation this slide deck comes from added a frame of audio latency, and the ACP dedicates much of its limited resources to codec and system services.
The payoff to all these restrictions is that it serves a purpose for Sony, so it gets used.
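As a toy illustration of the latency-buildup bullet and the extra frame the secure API adds: the block size and the one-block-per-handoff cost below are assumptions for the sketch, not figures from the presentation.

```python
# Toy model of the "latency buildup" drawback above: every time the audio
# pipeline bounces between CPU and a co-processor, the work has to wait for
# the next processing block on the other side. Numbers are invented.

AUDIO_BLOCK_MS = 256 / 48000 * 1000   # ~5.3 ms per 256-sample block @ 48 kHz

def added_latency_ms(num_handoffs, api_frames_of_delay=1):
    """Extra latency from ping-ponging a DSP chain across the CPU/ACP
    boundary: roughly one block of buffering per handoff, plus whatever
    fixed delay the secure API imposes."""
    return (num_handoffs + api_frames_of_delay) * AUDIO_BLOCK_MS

for handoffs in (1, 2, 4):
    print(f"{handoffs} handoff(s): ~{added_latency_ms(handoffs):.1f} ms added")
```

Keeping the whole chain on one processor avoids that buildup, which is part of why the CPU or GPU often remains the pragmatic choice.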

If a DPU is added to an SoC, but its data production is essentially being fed directly into the graphics pipeline, the GPU domain is very accommodating and can readily envelop a custom microprocessor or controller inside its borders.
 

There is also a power budget, so if you can offload things to a low-power chip, why not?




You can use it for lighting, reflections & so on; you don't have to do real-time path tracing. Also, it's not fixed function.

 

Given the large number of variations possible between fixed function and fully programmable, my questions are:

Specifically, when and how can the functions a DPU is capable of be changed or added to after the initial implementation? On the fly by a running program? As part of a firmware update?

How much functionality can be programmed into a DPU before it is so non-specialized that it is no longer notably more efficient at the tasks it is capable of than another type of processor?
 

PS4 & Xbox One both already use Tensilica DPUs/DSPs for offloading some functions from the CPU/GPU, like speech recognition, face recognition, game streaming/recording & so on. It doesn't have to beat the CPU & GPU at every task, but if you already know it's something that will be needed in your next-gen console, why not offload it to the DPU?

 

Just replace the DPU with an FPGA and you're all set.
 

A DPU is a product composed of a block of DSPs and interconnect logic, paired with the tools and services for customizing the block and its interface. The base concept that goes into the TrueAudio block and the similar blocks in the consoles can fall into that service or one like it, and perhaps it actually is one, depending on how AMD implemented it. There are certain constraints in terms of absolute power consumption for encode/decode, or latency with audio, that can leave the CPU or GPU less ideal.

When it's dealing with the graphics pipeline, there's a name for a region of simple processors tied together with custom memory, interconnects, and IP: a GPU. AMD has shown it will happily customize that portion, given its expertise and ownership of that region, in a way it would not for the DSP blocks. Why would this custom block of cores that hooks into the fixed and programmable graphics paths change things versus all the other custom cores already there?
 