Can AMD GPUs implement 'hardware' fixed function pipelines through firmware?

Discussion in 'Architecture and Products' started by onQ, Oct 18, 2013.

  1. onQ

    onQ
    Veteran

    Joined:
    Mar 4, 2010
    Messages:
    1,540
    Likes Received:
    55
    Mod: This is a thread started in the console space in 2013. A recent new post led to a new conversation. I think it's worth posting here so the AMD experts can explain what their hardware can and can't do. I will copy over some of the arguments from PM.

    I know this sounds silly, but it seems like it's exactly what Sony is planning to do with the 8 ACEs.

    A few things I have read over the last year or so are leading me to believe this is what they are doing. I'll try to go back and find all the quotes later, but for now I have a question.

    If Sony were to configure the 64 command queues to make the pipelines emulate real fixed-function pipelines, could they work just as efficiently as real fixed-function hardware?

    Update June 2016:

    The PS4 has Hardware Schedulers (HWS), and they are the reconfigurable processors that have been talked about since before the PS4 was revealed. This is why reprojection for VR can be done with little effect on the GPU: Sony is able to reconfigure the HWS to run reprojection on the GPU as if it were a processor made for reprojection, by controlling how it runs on the GPU. Look back at the quotes from Cerny that I put in bold in this thread.

    [​IMG]

    For normal code it's fixed function, but the functions can be changed by low-level microcode. That's not something game devs will be able to do freely; it will be done by Sony to add functions like reprojection and raytraced audio.

    http://www.tomshardware.com/reviews/amd-radeon-rx-480-polaris-10,4616.html
     
    #1 onQ, Oct 18, 2013
    Last edited by a moderator: Jun 30, 2016
  2. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    40,598
    Likes Received:
    11,004
    Location:
    Under my bridge
    I am of the opinion that the HWS can only schedule work and cannot change the function of the hardware. All it can do is increase efficiency of utilisation for given tasks. OnQ is saying the above -
    Is there any AMD engineer who'd like to set one of us straight?
     
  3. onQ

    onQ
    Veteran

  4. onQ

    onQ
    Veteran

    In the quotes I sent you, Dave is saying the same thing I said.


    My Words

    "Could Sony create fixed function pipelines for the PS4 even after release?

    A few things I have read over the last year or so are leading me to believe this is what they are doing. I'll try to go back and find all the quotes later, but for now I have a question.
    If Sony were to configure the 64 command queues to make the pipelines emulate real fixed-function pipelines, could they work just as efficiently as real fixed-function hardware?"



    Dave's words
    "The HWS (Hardware Workgroup/Wavefront Schedulers) are essentially ACE pipelines that are configured without dispatch controllers."

    My words
    "By creating the fixed function pipelines at the driver level once you figure out just what fixed functions you want the pipelines to be used for"

    Dave's words
    "They are microcode-programmable processors that can implement a variety of scheduling policies. We used them to implement the Quick Response Queue and CU Reservation features in Polaris, and we were able to port those changes to third-generation GCN products with driver updates."
     
  5. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    Assuming that Sony has updated the PS4 to have HWS microcode loaded, which is a point I will be addressing later:
    He's telling you that the HWS are ACE pipelines that are configured without dispatch controllers. I think in this context that means they do not perform actual command read and kernel launch themselves. They instead perform higher-level shifting around of the queues and runlists for the rest of the front ends.
    I think he's trying to tell you that for the purposes of emulating a function (fixed or not), they don't do any of said "function".

    He's stating that they are processors that can help determine where a generic queue command will be read, when it will be read, and when it will launch.
    Also, by saying those features can be back-ported to GCN3, he's not addressing GCN2 of the consoles. Perhaps it can go back that far, although that would be up to Sony to decide and disclose.
     
    #5 3dilettante, Jun 30, 2016
    Last edited: Jun 30, 2016
  6. onQ

    onQ
    Veteran


    Cerny already said that it was done in hardware.


    "Once we have this vision of asynchronous compute in the middle of the console lifecycle, the question then becomes, 'How do we create hardware to support it?'"

    This concept grew out of the software Sony created, called SPURS, to help programmers juggle tasks on the CELL's SPUs -- but on the PS4, it's being accomplished in hardware.

    The team, to put it mildly, had to think ahead. "The time frame when we were designing these features was 2009, 2010. And the timeframe in which people will use these features fully is 2015? 2017?" said Cerny.

    "Our overall approach was to put in a very large number of controls about how to mix compute and graphics, and let the development community figure out which ones they want to use when they get around to the point where they're doing a lot of asynchronous compute."
     
  7. 3dilettante

    Legend Alpha

    The "it" you are using and the concepts being discussed by the two individuals may not be the same.
    HWS in particular is a more recently finalized and validated method that needed a new microcode version that could be loaded by the subset of GPUs capable of taking it via driver update. Sony writes its own software, so if or when is up to them.

    The methods Cerny discussed have various vaguely defined features that might align with what HWS became, although it's not exclusive of the non-HWS method that existed for years.
    That Cerny went on about 64 queues, which is something that fully-featured HWS can exceed, hints that something as complex as HWS was not in the hardware (and technically would be in the microcode rather than actual hardware).
    Similarly, Sony's audio engineer discussed the use of the GPU as an HSA audio device back then, and it was clear at the time that there were very serious objections to using it for anything remotely latency sensitive. Additional developer disclosures for launch titles indicated rough edges that HWS and the newest management methods we have now would have handled readily. The foundational elements could have been there from the beginning, but it seems like the actual system might not have fully come together for years and so would not exist in the PS4 until it is put there--if it can.
     
  8. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    14,849
    Likes Received:
    2,267
    How does the question "Can AMD GPUs implement fixed function pipelines?"
    differ from
    "Can AMD GPUs run code written for fixed function pipelines?"
     
  9. MDolenc

    Regular

    Joined:
    May 26, 2002
    Messages:
    690
    Likes Received:
    425
    Location:
    Slovenia
    It's a weird title... My first reaction was: "of course it can run DX7, what's this all about?"
     
  10. 3dilettante

    Legend Alpha

    An important point of clarification we would need would be coming to a shared definition of "fixed-function".

    edit: and probably "implement" and "pipeline"
    edit edit: and "efficiently"
     
  11. mrcorbo

    mrcorbo Foo Fighter
    Veteran

    Joined:
    Dec 8, 2004
    Messages:
    3,564
    Likes Received:
    1,981
    I think this bit clarifies what OnQ is speculating.

    "The PS4 has Hardware Schedulers (HWS), and they are the reconfigurable processors that have been talked about since before the PS4 was revealed. This is why reprojection for VR can be done with little effect on the GPU: Sony is able to reconfigure the HWS to run reprojection on the GPU as if it were a processor made for reprojection, by controlling how it runs on the GPU."
     
  12. 3dilettante

    Legend Alpha

    In that case, the HWS is not being reconfigured. Its job is to take a set of commands related to what programs need to be scheduled and how important they are, then tell another front end whether it needs to start looking at that set or part of it.
    That ACE will then evaluate various commands as it comes to them, many of which will involve dispatching wavefronts for a program that is pointed to by the command.

    That program will be composed of standard instructions that the HWS never sees or cares about, and that the ACE does not care about.

    It might happen to be a reprojection kernel.

    It will probably have a minor impact because it is not a massively intensive operation relative to everything else the GPU is doing.

    And versus some hypothetical dedicated hardware device for reprojection--it would in many ways not be as efficient.
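
    A minimal Python sketch of the division of labour described above, as a toy model only. ToyHWS, ToyACE, Queue, and the stand-in kernels are hypothetical names invented for illustration, not AMD or Sony interfaces. It tries to show that the scheduler only looks at queue metadata, the front end launches whatever kernel the command points to, and neither one inspects what the kernel computes.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Optional, Tuple


@dataclass
class Queue:
    """Hypothetical user queue: a priority plus opaque dispatch commands."""
    name: str
    priority: int
    commands: Deque[Callable[[int], int]] = field(default_factory=deque)


class ToyHWS:
    """Toy scheduler: decides which queue gets looked at next, nothing more."""

    def pick(self, queues: List[Queue]) -> Optional[Queue]:
        ready = [q for q in queues if q.commands]
        # Only metadata (priority, non-empty) is consulted here.
        return max(ready, key=lambda q: q.priority) if ready else None


class ToyACE:
    """Toy front end: reads one command and launches wavefronts for it."""

    def dispatch(self, queue: Queue, wavefronts: int) -> List[int]:
        kernel = queue.commands.popleft()
        # The kernel is called, never inspected; the "CUs" do the real work.
        return [kernel(w) for w in range(wavefronts)]


def run(queues: List[Queue], wavefronts: int = 4) -> List[Tuple[str, List[int]]]:
    hws, ace = ToyHWS(), ToyACE()
    log = []
    while (q := hws.pick(queues)) is not None:
        log.append((q.name, ace.dispatch(q, wavefronts)))
    return log


def reprojection(w: int) -> int:   # stand-in for a reprojection kernel
    return w * 2


def lighting(w: int) -> int:       # stand-in for ordinary game work
    return w + 100


queues = [
    Queue("game", 1, deque([lighting, lighting])),
    Queue("vr", 9, deque([reprojection])),
]
log = run(queues)
order = [name for name, _ in log]
```

    In this toy, the high-priority queue is serviced first, but the reprojection kernel itself is just another program run by the same general-purpose execution path as everything else.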
     
  13. BRiT

    BRiT (╯°□°)╯
    Moderator Legend Alpha Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    12,384
    Likes Received:
    8,602
    Location:
    Cleveland
    We also need to define what "can" means. Does it mean theoretically possible, or does it mean "should and will", on the premise that Sony (and NOT AMD) will do so?

    The original premise is Sony will do such and such. The pedantic reality, and the only one that matters, is that it would never be Sony doing so, it would be AMD.
     
  14. mrcorbo

    mrcorbo Foo Fighter
    Veteran

    My layman's interpretation would be that the practical effect of these capabilities is that a GPU that was being sent multiple workloads would be able to perform a specific task or class of tasks from the total available work nearly as well or as well as a GPU which was being tasked with *nothing* but that task or class of tasks. Developers have the choice to either give certain tasks the ability to preempt other lower-priority tasks, giving them the ability to potentially fully take over the available processing resources or they would dedicate a certain portion of the available processing resources to a single task or class of tasks.
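
    The two choices in that interpretation (preempting lower-priority work versus dedicating a fixed share of the GPU) can be compared with a back-of-the-envelope model. This is a rough sketch under stated assumptions: the function name, the policy labels, and every number below are made up for illustration, work is assumed to scale perfectly across CUs, and nothing here is measured from real hardware.

```python
def urgent_latency_ms(policy: str,
                      urgent_work: float,
                      total_cus: int,
                      reserved_cus: int = 0,
                      background_remaining_ms: float = 0.0,
                      preempt_cost_ms: float = 0.0) -> float:
    """Time until an urgent task finishes, in a deliberately simple model.

    urgent_work is in CU-milliseconds (assumed to parallelize perfectly).
    """
    if policy == "none":
        # No prioritization: wait for the background work, then use every CU.
        return background_remaining_ms + urgent_work / total_cus
    if policy == "preempt":
        # Pay a switching cost, then take over every CU.
        return preempt_cost_ms + urgent_work / total_cus
    if policy == "reserve":
        # Start immediately, but only on the CUs that were set aside.
        if reserved_cus <= 0:
            raise ValueError("reserve policy needs reserved_cus > 0")
        return urgent_work / reserved_cus
    raise ValueError(f"unknown policy: {policy}")


# Illustrative numbers only.
common = dict(urgent_work=36.0, total_cus=18)
latencies = {
    "none": urgent_latency_ms("none", background_remaining_ms=10.0, **common),
    "preempt": urgent_latency_ms("preempt", preempt_cost_ms=0.5, **common),
    "reserve": urgent_latency_ms("reserve", reserved_cus=4, **common),
}
```

    With these invented inputs, preemption gives the shortest wait, reservation gives a predictable start at the cost of fewer CUs for everything, and in neither case does the urgent task run faster than it would on an otherwise idle GPU.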

    Now how much of that did I get wrong?
     
  15. onQ

    onQ
    Veteran

    Last line!

    [​IMG]
     
  16. onQ

    onQ
    Veteran

    My title from 3 years ago was "Could Sony create fixed function pipelines for the PS4 even after release?"
     
  17. 3dilettante

    Legend Alpha

    The concurrent scheduling and dispatch prior to all of this was about allowing for handling of multiple workloads and increasing utilization of spare resources. The resources are abstracted away from any given task, and the GPU is generally self managing internally.
    HWS has functions relating to mapping those tasks to a front end, while allowing for quality of service between tasks and different virtual memory spaces.
    The older methods, with limited prioritization, no preemption, and no reserved resources, meant that some workload types simply could not tolerate the fact that the GPU would not get to them until it decided to, regardless of time-sensitivity. With the newer features, the GPU is no longer completely unacceptable for such workloads or as vulnerable to an OS timeout, but individual tasks are going to perceive some amount of degradation as a result.

    That's a slide specifically focused on the HWS unit, in the context of coordinating, scheduling, and prioritization.
    Physically, it is a set of custom or semi-custom processors that run a compact program or set of programs whose job is to coordinate, schedule, and prioritize tasks. They aren't going to load a reprojecting shader into their limited local memory and run it. They do not share the functional capabilities or ISA needed to do so. Anyone can write such a shader, and there are plenty of architectures that do so without HWS. A game can write a shader and ask the GPU to run it at a given priority with some possibly pre-allocated resources.
    If there is a modification to the functionality that HWS provides, it is probably new ways for them to coordinate, schedule, and prioritize tasks whose actual nature they do not know so that the ACEs can pass those tasks (that they also do not really know or care about the actual nature of) for dispatch to the CUs.
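
    One way to picture "new ways to coordinate, schedule, and prioritize" is a scheduler whose policy can be swapped while the tasks stay untouched. The sketch below is a loose analogy in Python, not a model of real HWS microcode: Task, fifo_policy, priority_policy, and schedule are hypothetical names, and the payloads are placeholders that just return a number.

```python
from typing import Callable, Dict, List, Tuple

# (name, priority, opaque payload). The scheduler never looks inside payload.
Task = Tuple[str, int, Callable[[], int]]
Policy = Callable[[List[Task]], List[Task]]


def fifo_policy(tasks: List[Task]) -> List[Task]:
    """Run tasks in arrival order."""
    return list(tasks)


def priority_policy(tasks: List[Task]) -> List[Task]:
    """Run higher-priority tasks first (sorted() is stable for ties)."""
    return sorted(tasks, key=lambda t: -t[1])


def schedule(tasks: List[Task], policy: Policy) -> Tuple[List[str], Dict[str, int]]:
    """Apply a policy to pick the order, then run each opaque payload."""
    order: List[str] = []
    results: Dict[str, int] = {}
    for name, _priority, payload in policy(tasks):
        order.append(name)
        results[name] = payload()
    return order, results


tasks: List[Task] = [
    ("shadows", 1, lambda: 11),
    ("reprojection", 9, lambda: 22),
    ("audio", 5, lambda: 33),
]

fifo_order, fifo_results = schedule(tasks, fifo_policy)
prio_order, prio_results = schedule(tasks, priority_policy)
```

    Swapping the policy changes when each task runs, while what each task computes stays the same, which is the distinction being drawn between scheduling and function.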
     
  18. mrcorbo

    mrcorbo Foo Fighter
    Veteran

    Thanks. Actually, the first phrase that came to my mind when trying to think of a way to describe this was "QoS mechanism". Would you mind going into the bolded a little more?
     
    #18 mrcorbo, Jun 30, 2016
    Last edited: Jun 30, 2016
  19. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    The clarification of terms comes from the premise that Sony is using the HWS to enable...
    Therefore, fixed function means a custom ASIC purposefully created for the task of reprojection (or audio raytracing).
     
  20. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    ACEs, HWS, etc are programmable logic devices. So they get initialized as a specific circuit, effectively fixed function. Their program is effectively concurrent in execution. Since GCN1.2 they all appear roughly identical in capabilities with some combination of devices: 2ACE=1HWS=something for HSA=etc.
     