Can AMD GPUs implement 'hardware' fixed function pipelines through firmware?

What you're effectively asking is whether GPUs can take on more functionality, which they can. ACEs aren't required for that to happen; they may improve the saturation of the hardware, but they're not some magic bullet.
In the case of audio, compute can only do that job with a low-latency interface to the shaders. The HWS enables that, opening up functionality that was otherwise practically impossible, I think.
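
To make that concrete, here is a minimal sketch of the general idea, written in CUDA-style C++ for readability (the HIP/AMD equivalent would be analogous). Everything in it, the audioFir kernel, the TAPS and BLOCK sizes, is invented for illustration and is not AMD's actual TrueAudio-on-compute implementation; the point is only that the latency-critical part is how quickly a small per-block dispatch on a high-priority queue reaches the shaders, which is what hardware scheduling of priority queues is meant to help with.

Code:
#include <cuda_runtime.h>

#define TAPS  64     // filter length (hypothetical)
#define BLOCK 256    // samples per audio block (hypothetical)

// Naive FIR filter over one block of audio samples.
__global__ void audioFir(const float* in, const float* coeff, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float acc = 0.0f;
    for (int t = 0; t < TAPS; ++t)
        acc += coeff[t] * in[i + TAPS - 1 - t];   // convolve against the tap history
    out[i] = acc;
}

int main()
{
    // Ask for the highest stream priority the device offers and put audio work on it,
    // so the small, frequent dispatches are not stuck behind long graphics/compute work.
    int least, greatest;
    cudaDeviceGetStreamPriorityRange(&least, &greatest);
    cudaStream_t audioQueue;
    cudaStreamCreateWithPriority(&audioQueue, cudaStreamNonBlocking, greatest);

    float *in, *coeff, *out;
    cudaMalloc(&in,    (BLOCK + TAPS) * sizeof(float));
    cudaMalloc(&coeff, TAPS * sizeof(float));
    cudaMalloc(&out,   BLOCK * sizeof(float));
    // (Filling the buffers with real audio data and coefficients is omitted here.)

    // One dispatch per audio block; in a real mixer this would run every couple of ms.
    audioFir<<<(BLOCK + 255) / 256, 256, 0, audioQueue>>>(in, coeff, out, BLOCK);
    cudaStreamSynchronize(audioQueue);
    return 0;
}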

Additionally, searching for whatever onQ meant by 'DPU' turned up this, which posits discrete DPU silicon in future GPUs. This is the very opposite: using compute on the existing GPU and just allowing low-latency, more reliable access to compute resources.
 
TrueAudio being moved from a DSP to compute gives the answer to my question.

This change wasn't made because the GPU was as efficient at these operations as the custom hardware, though. It was made because, once it was possible to use the more general-purpose GPU cores for the same purpose, the efficiency gained from custom silicon that was only useful for these tasks (and that just sat idle when it had no audio processing to do) wasn't worth the cost of licensing the IP and the die area it consumed.

Also, "creating fixed-function pipelines" is not an accurate way to describe this. It's this characterization, more than anything else, that *everyone* has a problem with in this case. And, more generally, it's your unwillingness to ever acknowledge and rectify your own errors of understanding or explanation that cause all of the grief you get from other posters and the moderators. When you get this same type of reaction in multiple threads across multiple forums, a "sane" reaction is to realize you're probably doing something wrong somewhere. You not coming to this realization after all of this time is what makes you look "crazy", not your ideas.
 
It's actually quite similar to an article I once read about how GPUs and sound chips would no longer be needed because CPUs would continually scale up in core count, such that the extra cores would take over the work of these dedicated hardware accelerators.

RIP Cell & Larrabee.
 
TrueAudio being moved from a DSP to compute gives the answer to my question.
The slide you posted has nothing to do with Sony. If Sony's next console has a DSP, does that disprove your theory? Over time it's likely that things like TrueAudio get consumed by more general processors if they aren't used much and the business model doesn't support the silicon cost. A console is more likely to keep a dedicated processor because it can ensure it gets used often. For the PS4 to switch to using compute for audio, developers must come up with a technique that isn't possible on the DSP or that works better using compute.

The same applies to graphics. Most developers are using the compute capabilities to augment the fixed-function graphics pipeline, not to replace it, Dreams being the exception we've heard about. In the case of Dreams it doesn't mean compute is better than the fixed-function pipe; it just enables them to achieve their artistic vision. Of course, this is the reason for compute pipes: to allow developers to do things fixed-function hardware wasn't designed to do.
 
I wouldn't call ACEs fixed function hardware. GPU front ends are fully programmable processors with programmable memory access. You just don't have access to program them yourself; the driver team does. Traditionally fixed function hardware tends to have fixed data inputs and data outputs. For example texture sampler, ROP (blend, depth test, HiZ), DXT block decompressor, delta color compressor, triangle backface culling, etc. These are highly performance critical parts of the chip, making it a big perf/watt win to use fixed function hardware to implement them. Also hard-wiring reduces latency compared to a programmable pipeline.
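
To give a feel for what that means, here is a rough sketch (CUDA-style device code; bilinearSample, the row-major float texture layout, and the clamping policy are all made up for this example) of a single bilinear fetch done manually in a compute kernel. A hardware texture sampler performs this address math, clamping, and blending in dedicated logic, so it doesn't cost ALU instructions per pixel.

Code:
// Manual bilinear filtering: the work a fixed-function sampler does in hardware.
__device__ float bilinearSample(const float* tex, int texW, int texH, float u, float v)
{
    // Normalized coords -> texel space, with clamp-to-edge addressing.
    float x = fminf(fmaxf(u * texW - 0.5f, 0.0f), (float)(texW - 1));
    float y = fminf(fmaxf(v * texH - 0.5f, 0.0f), (float)(texH - 1));
    int x0 = (int)x, y0 = (int)y;
    int x1 = min(x0 + 1, texW - 1);
    int y1 = min(y0 + 1, texH - 1);
    float fx = x - x0, fy = y - y0;

    // Four loads plus the weighted blend (the sampler's filtering unit).
    float t00 = tex[y0 * texW + x0], t10 = tex[y0 * texW + x1];
    float t01 = tex[y1 * texW + x0], t11 = tex[y1 * texW + x1];
    return (t00 * (1.0f - fx) + t10 * fx) * (1.0f - fy) +
           (t01 * (1.0f - fx) + t11 * fx) * fy;
}

__global__ void filterKernel(const float* tex, int texW, int texH,
                             float* out, int outW, int outH)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= outW || y >= outH) return;
    float u = (x + 0.5f) / outW;
    float v = (y + 0.5f) / outH;
    out[y * outW + x] = bilinearSample(tex, texW, texH, u, v);
}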
It might also be more of a continuum from fixed-function to general-purpose. They are programmable, but I am not sure they are architecturally permitted to access areas that aren't themselves dedicated to scheduling and queuing. They would lack a fair amount of the resources and data types a small micro-controller doesn't need, and it isn't clear how they implement memory accesses and virtual memory support (they play a role in managing virtual memory for the rest of the GPU, but do so in a way that might put them to the side of the process).
Possibly, as a low-level detail, some of the other fixed-function blocks have varying levels of control loops implemented. I am speculating at this point, but how complex this gets may be related to why GCN does not clock as high and burns so much more power than other architectures.

On the other hand, a GPU front end processor only needs to launch a couple of draws/dispatches in a microsecond. That's a huge number of cycles per launch. It is not worth optimizing its throughput to cycle precision, thus programmable hardware makes sense.
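As a back-of-envelope figure (assuming a front end clock somewhere around 1 GHz, which is an assumption rather than a published number, and roughly two launches per microsecond):

1e9 cycles/s ÷ 2e6 launches/s ≈ 500 cycles per draw/dispatch

so there are hundreds of cycles of budget per launch, versus the per-clock throughput the fixed function units above have to sustain.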
The complexity of what they are doing is another factor. AMD indicated it is able to backport a significant fraction of what it is doing for priority queues back several generations, which suggests these features have a major software development component that took a long time to get right.
Whether the PS4 would get the same treatment is at the moment unclear. There are physical factors not formally exposed that can affect which products can be updated, such as the maximum size of the microcode store. However, the PS4 has a full complement of 8 ACEs, which may allow for a half-retrofit where one half of the compute front ends takes a microcode patch streamlined for the new mode, while the other half keeps the base functionality. I am trying to track down the specific context where I saw that mentioned; it takes some of the lower-end GCN implementations out of the running, since with one engine they couldn't handle both the new features and the base functionality. More recent versions of the front ends have a larger store.
 
In the case of audio, compute can only do that job with a low-latency interface to the shaders. The HWS enables that, opening up functionality that was otherwise practically impossible, I think.

Additionally, searching for whatever onQ meant by 'DPU' turned up this, which posits discrete DPU silicon in future GPUs. This is the very opposite: using compute on the existing GPU and just allowing low-latency, more reliable access to compute resources.


Same thread where I explained that a GPU is also a DPU.

 
The slide you posted has nothing to do with Sony. If Sony's next console has a DSP, does that disprove your theory? Over time it's likely that things like TrueAudio get consumed by more general processors if they aren't used much and the business model doesn't support the silicon cost. A console is more likely to keep a dedicated processor because it can ensure it gets used often. For the PS4 to switch to using compute for audio, developers must come up with a technique that isn't possible on the DSP or that works better using compute.

The same applies to graphics. Most developers are using the compute capabilities to augment the fixed-function graphics pipeline, not to replace it, Dreams being the exception we've heard about. In the case of Dreams it doesn't mean compute is better than the fixed-function pipe; it just enables them to achieve their artistic vision. Of course, this is the reason for compute pipes: to allow developers to do things fixed-function hardware wasn't designed to do.

The thread was started with me asking about the PS4, & for it to happen on the PS4 it would be Sony's doing, even if it's AMD, Sony & the dev community that's coming up with the code that works well with the pipeline.
 
Stop refusing to use industry standard terms and making up your own and then changing what they mean.

The mental gymnastics here are actually kind of awe-inspiring. To be so committed to the idea that "I cannot be wrong. Ever." that you literally come up with new definitions for the words you used in prior statements in order to make those statements correct is on a whole other level.
 
As a followup to my earlier post, I found the mention of the microcode store size limitation, in the context of porting the full front-end functionality for HSA back to older GCN versions.
https://www.phoronix.com/forums/for...nn-rock-rocr-hcc-on-linux?p=849406#post849406

Support for both HWS and the Architected Queuing Language (for HSA) could not be hosted on the same microcode engine and still allow that engine to support the standard command types.
Kaveri was able to support AQL and HWS while still being able to support the command format used by graphics by dividing the newer functionality between its two microcode engines. The discrete GPUs do not have that workaround.
The similarity Orbis has with Kaveri in ACE and queue count might mean Sony could bring this in with a similar split, although if AQL can be skipped then perhaps a split isn't needed.
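
Purely as an illustration of that kind of split (hypothetical, not real driver or microcode behaviour; the queue count, the clean MEC0/MEC1 division, and all the names are invented), the idea is that with two compute microcode engines one can carry the new AQL/HWS handling while the other keeps the base PM4 handling, so both command formats remain serviceable at the same time:

Code:
#include <stdio.h>

enum queue_format { QUEUE_PM4, QUEUE_AQL };   // legacy packets vs the HSA Architected Queuing Language

struct compute_queue {
    int               id;
    enum queue_format fmt;
    int               mec;   // which compute microcode engine services this queue
};

int main(void)
{
    struct compute_queue queues[8];
    for (int i = 0; i < 8; ++i) {
        queues[i].id  = i;
        queues[i].fmt = (i < 4) ? QUEUE_AQL : QUEUE_PM4;  // first four pipes: new functionality
        queues[i].mec = (i < 4) ? 0 : 1;                  // MEC0 = new microcode, MEC1 = base microcode
        printf("queue %d -> MEC%d (%s)\n", queues[i].id, queues[i].mec,
               queues[i].fmt == QUEUE_AQL ? "AQL/HWS" : "PM4");
    }
    return 0;
}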

The Volcanic Islands architectures, Tonga and Fiji, do not require playing microcode Tetris to update the functionality, and add the ability to context-switch long-running compute wavefronts. Whether that can be brought back is unclear; AMD's patents usually involve some kind of extra hardware to help with this. Polaris would apparently draw from VI.
 
The thread was started with me asking about the PS4, & for it to happen on the PS4 it would be Sony's doing, even if it's AMD, Sony & the dev community that's coming up with the code that works well with the pipeline.
For it to happen on the PS4 would require changing A: the nomenclature of what constitutes a "fixed-function pipeline", as you don't actually make these up using programmable logic, and B: also reality, as a fixed-function pipeline is hardwired at the design and manufacturing stage and has ALWAYS BEEN THAT WAY. (Barring FPGAs, which are beyond the scope of this discussion.)

Are you aware that you're talking to several actual games developers in this thread, hm? Your bizarre, self-invented nomenclature and explanations would be like me, a layperson, telling an actual rocket engineer how a rocket engine works.

In other words: fucking ludicrous.
 
What the hell is a DPU supposed to be, when we're talking about computers?

dementia praecox unit?

joking...


Seriously, I cannot find any valid source.
 
DPU in this context is a combination of a set of processors and base hardware IP geared towards customization, and the service and toolsets for customizing, implementing, and building the software for them.
Making the term more generic in today's SoC-heavy world would make it redundant, since anyone who builds a GPU or their own CPU into a chip with any level of integration would qualify. The specific architecture and services the DPU offering provides are what distinguish it, and that seems like more of a commercial than an architectural distinction.
 
To be fair, GPU was also just a marketing term from Nvidia until the industry adopted it.
Yes, and AMD called their cards "VPU" (V=Visual). At least GPU is better than "SIMD array optimized for graphics tasks".
To be honest, I am really annoyed by the fashion of giving every single shade of everything its own proper name... All this is becoming worse than watching biologists try to impose order on that Darwinian orgy called the "kingdom of Protista" (or whatever the classification is named this week).
 
The Volcanic Islands architectures, Tonga and Fiji, do not require playing microcode Tetris to update the functionality, and add the ability to context-switch long-running compute wavefronts. Whether that can be brought back is unclear; AMD's patents usually involve some kind of extra hardware to help with this. Polaris would apparently draw from VI.

I don't believe we can practically bring context switching back to CI - as you say there is some specialized hardware involved. It's probably not impossible to come up with a set of compiler/toolchain hacks that would insert code into loops to check for a pre-emption request then run a combination of shader code and driver code to simulate what VI+ hardware does both coming off and going back onto the shader core, but it's really tough to see that as a good use of time.
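
For what it's worth, a minimal sketch of what that kind of toolchain hack might look like, written as CUDA-style code for readability (preempt_flag, SavedState, the check interval, and the kernel itself are all invented; nothing like this ships): the injected code periodically polls a preemption-request flag, spills the live state, and exits, so the driver can later re-launch the work from the saved point.

Code:
#include <cuda_runtime.h>

// Per-thread state that must survive "coming off" the shader core.
struct SavedState { int next_iteration; float accumulator; };

__global__ void longRunningKernel(volatile int* preempt_flag,  // set by the driver to request pre-emption
                                  SavedState*   saved_state,
                                  const float*  data,
                                  int           start_iter,
                                  int           total_iters)
{
    float acc = saved_state[threadIdx.x].accumulator;          // resume from any earlier save
    for (int i = start_iter; i < total_iters; ++i) {
        acc += data[i % 1024] * 0.5f;                          // the "real" work of the loop body

        // Injected check, e.g. every 1024 iterations to keep the overhead small:
        if ((i & 1023) == 0 && *preempt_flag) {
            saved_state[threadIdx.x].next_iteration = i;       // spill live state
            saved_state[threadIdx.x].accumulator    = acc;
            return;                                            // give the core back; driver re-launches later
        }
    }
    saved_state[threadIdx.x].next_iteration = total_iters;
    saved_state[threadIdx.x].accumulator    = acc;
}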

+1 for no more initialisms/acronyms... I have a tough enough time keeping up with what we have already
 