Can AMD GPUs implement 'hardware' fixed function pipelines through firmware?

Discussion in 'Architecture and Products' started by onQ, Oct 18, 2013.

  1. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    Preemption has a non-zero cost in terms of time and the non-workload consumption of resources for bookkeeping and running the special subroutines for moving data and execution context out of the way, and then later ramping it back up. That's injecting a second startup and flush in the middle.
    For the graphics pipeline, the priority queue slides show where the graphics workload slopes down, leaving resources it could be using otherwise idle, but the compute portion cannot start until the last bit of graphics execution is out of the way.
    Context switching for compute isn't as global a switch, but individual wavefronts and kernels have to spend time moving data in and out instead of running their own code, which can tie up a CU for a while, even for unrelated wavefronts on the same CU.
    In either case, if it weren't for time pressure the GPU probably would have waited for a while and filled in slots as they eventually opened up. This assumes there aren't super long-lived wavefronts, or in another scenario malicious ones trying to DoS the system.

    Reserving CUs constrains the GPU from being able to use all its resources for the problem at hand, in favor of keeping them free for a workload that might not need them for a while.

    Prioritization does mean the GPU starts picking winners and losers when it comes to competing for a CU, so the losers will see their wavefront launch rate drop.
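
    The trade-off between reserving CUs and prioritizing queues can be sketched with a toy model. This is only an illustration under loud assumptions: a fixed number of wavefront slots frees up every tick, the low-priority queue always has work waiting, and the function name and parameters are hypothetical. It is not a description of how the GCN dispatcher actually works.

```python
def simulate(slots_per_tick, ticks, high_per_tick, reserved=0):
    """Toy wavefront-launch model (hypothetical, not real GCN behaviour).

    Each tick, `slots_per_tick` wavefront slots become free.
    High-priority wavefronts take free slots first. Low-priority work
    (assumed to have an endless backlog) gets what is left, but may never
    touch the `reserved` slots, even when they sit idle.
    """
    launched = {"high": 0, "low": 0}
    high_waiting = 0
    for _ in range(ticks):
        high_waiting += high_per_tick
        # High priority wins every contested slot.
        high_now = min(high_waiting, slots_per_tick)
        high_waiting -= high_now
        # Low priority loses both the slots high used and any reserved ones.
        low_now = slots_per_tick - max(high_now, reserved)
        launched["high"] += high_now
        launched["low"] += low_now
    return launched


if __name__ == "__main__":
    print("no contention:      ", simulate(4, 10, high_per_tick=0))
    print("prioritization:     ", simulate(4, 10, high_per_tick=3))
    print("reservation, idle:  ", simulate(4, 10, high_per_tick=0, reserved=2))
```

    In this model prioritization only costs the low-priority queue slots when high-priority work actually shows up, while reservation costs it slots on every tick, whether or not anything uses them.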
     
    mrcorbo likes this.
  2. mrcorbo

    mrcorbo Foo Fighter
    Veteran

    Joined:
    Dec 8, 2004
    Messages:
    3,564
    Likes Received:
    1,981
    I had a vague sense of this, but didn't really have a complete picture. Thanks again for the detailed explanation.
     
  3. mrcorbo

    mrcorbo Foo Fighter
    Veteran

    Joined:
    Dec 8, 2004
    Messages:
    3,564
    Likes Received:
    1,981
    Given the responses already given in this thread, it might be helpful for you to state your current understanding of what, specifically, these schedulers enable. Fair?
     
    BRiT likes this.
  4. pTmdfx

    Newcomer

    Joined:
    May 27, 2014
    Messages:
    249
    Likes Received:
    129
    The HSA Architected Queuing Language is a great illustration of the expected role of an ACE. It is true that they run microcode, and hence can be reprogrammed. But if you look closely at the hardware, it appears to be more about not hardwiring the command packet parser, so that it is still patchable for whatever reason (AQL support, and hey - CPU microcode :-]). It is supposedly still bound to the limited set of hardware controls you have, as compared to the graphics front-end. For HWS however, it is likely about the ability to update the scheduling algorithm.
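
    A minimal sketch of that idea: the packet parser is a replaceable table of handlers, but every handler can only drive a fixed set of hardware controls. All names here (the control names, packet opcodes and the `PacketProcessor` class) are made up for illustration and do not correspond to AMD's actual microcode or packet formats.

```python
# Hypothetical fixed set of controls the hardware exposes to the microcode.
HARDWARE_CONTROLS = frozenset({"dispatch", "signal"})


class PacketProcessor:
    """Toy command processor: patchable parser, fixed hardware controls."""

    def __init__(self, microcode):
        self.handlers = dict(microcode)  # opcode -> handler function
        self.issued = []                 # record of controls that were driven

    def load_microcode(self, patch):
        """Patch the parser: add or replace packet handlers."""
        self.handlers.update(patch)

    def control(self, name, arg):
        """Drive a hardware control; microcode cannot invent new ones."""
        if name not in HARDWARE_CONTROLS:
            raise ValueError(f"no such hardware control: {name!r}")
        self.issued.append((name, arg))

    def process(self, packet):
        opcode, payload = packet
        if opcode not in self.handlers:
            raise KeyError(f"unknown packet type: {opcode!r}")
        self.handlers[opcode](self, payload)


def legacy_dispatch(proc, payload):
    proc.control("dispatch", payload)


def aql_style_dispatch(proc, payload):
    # A new packet layout, but it ends up using the same two controls.
    proc.control("dispatch", payload["grid"])
    proc.control("signal", payload["completion"])


if __name__ == "__main__":
    proc = PacketProcessor({"LEGACY_DISPATCH": legacy_dispatch})
    proc.load_microcode({"AQL_STYLE_DISPATCH": aql_style_dispatch})
    proc.process(("AQL_STYLE_DISPATCH", {"grid": (64, 1, 1), "completion": 7}))
    print(proc.issued)
```

    The patch teaches the parser a new packet format, but it cannot add a new kind of hardware behaviour: a handler asking for a control outside the fixed set simply fails.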

    In the case of VR reprojection, async timewarp or ray-traced audio, these are all built upon the compute capabilities AFAIK. Since what they demand from the GPU is the same in every case (prioritisation and QoS), supporting such features by patching ACE/HWS should not be considered making ACE/HWS "emulate fixed function pipelines". They are still general purpose units that are designed to take anything (incl. the stuff you mentioned) thrown at them, just perhaps with some difference in QoS parameters.
     
    #24 pTmdfx, Jul 1, 2016
    Last edited: Jul 1, 2016
    Anarchist4000, milk and BRiT like this.
  5. Esrever

    Regular Newcomer

    Joined:
    Feb 6, 2013
    Messages:
    594
    Likes Received:
    298
    Why are people trying to say programmable hardware somehow makes it fixed function when they are the complete opposite of each other?
     
    Grall likes this.
  6. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    Yeah. Async timewarp shouldn't need any hardware patching. All you need is (properly working) high priority compute queues. Same for audio.
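
    Why a high priority queue is enough can be shown with simple latency arithmetic. The sketch below assumes a single queue that runs jobs to completion without preemption, with every job already submitted at time zero. The job names and durations are invented for the example.

```python
def completion_time(jobs, target, use_priority):
    """Return the time (ms) at which `target` finishes.

    jobs: list of (name, priority, duration_ms) in submission order.
    A lower priority number runs first. Jobs run one at a time, without
    preemption (an assumption made for this sketch).
    """
    if use_priority:
        # sorted() is stable, so equal priorities keep submission order.
        ordered = sorted(jobs, key=lambda job: job[1])
    else:
        ordered = list(jobs)  # plain FIFO
    elapsed = 0.0
    for name, _priority, duration in ordered:
        elapsed += duration
        if name == target:
            return elapsed
    raise ValueError(f"job {target!r} was never submitted")


if __name__ == "__main__":
    frame = [
        ("lighting", 1, 6.0),
        ("post_processing", 1, 5.0),
        ("timewarp", 0, 1.5),  # submitted last, but marked high priority
    ]
    print("FIFO:    ", completion_time(frame, "timewarp", use_priority=False))
    print("priority:", completion_time(frame, "timewarp", use_priority=True))
```

    With these made-up numbers the timewarp job finishes after 12.5 ms in FIFO order and after 1.5 ms when the queue honours priority. Nothing about the job itself changes, only when it gets to run.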
     
    Alessio1989 and BRiT like this.
  7. renderstate

    Newcomer

    Joined:
    Apr 24, 2016
    Messages:
    54
    Likes Received:
    51
    In general FF HW is easily more efficient than programmable HW at the specific task it was designed for. Programmable HW can still win if it is used to introduce smarter algorithms and/or to address under-provisioned FF HW. The perfect example of this issue is GCN's weak geometry pipeline, which developers are working around using compute + improved triangle culling schemes. In fact AMD fixed some of the GCN culling issues in Polaris.
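
    As a rough idea of what such a culling pass does, here is a minimal sketch of back-face and zero-area culling on screen-space triangles. It is written in Python for readability, whereas a real implementation would be a compute shader working on index buffers. The function names and the y-up, counter-clockwise-is-front convention are assumptions made for the example.

```python
def signed_area_2x(a, b, c):
    """Twice the signed area of triangle (a, b, c) in 2D.

    Positive for counter-clockwise winding (y axis pointing up),
    negative for clockwise, zero for degenerate triangles.
    """
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def cull_triangles(triangles, front_is_ccw=True):
    """Keep only front-facing triangles with non-zero area."""
    kept = []
    for tri in triangles:
        area = signed_area_2x(*tri)
        if not front_is_ccw:
            area = -area
        if area > 0:  # drops back-facing and zero-area triangles
            kept.append(tri)
    return kept


if __name__ == "__main__":
    front = ((0, 0), (1, 0), (0, 1))       # counter-clockwise
    back = ((0, 0), (0, 1), (1, 0))        # clockwise
    degenerate = ((0, 0), (1, 1), (2, 2))  # all points on one line
    print(cull_triangles([front, back, degenerate]))
```

    Every triangle rejected here is one the fixed-function geometry front-end never has to process, which is the point of doing the culling in compute.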
     
  8. onQ

    onQ
    Veteran

    Joined:
    Mar 4, 2010
    Messages:
    1,540
    Likes Received:
    55
    Which is exactly why they use hardware scheduling:

    https://community.amd.com/community/gaming/blog/2016/03/28/asynchronous-shaders-evolved
     
  9. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    40,601
    Likes Received:
    11,020
    Location:
    Under my bridge
    Which would count as an optimisation, but not creating a fixed-function pipeline, nor emulating a fixed-function pipeline, nor achieving the same performance/efficiency as a fixed-function processor.
     
    BRiT and milk like this.
  10. BRiT

    BRiT (╯°□°)╯
    Moderator Legend Alpha Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    12,390
    Likes Received:
    8,604
    Location:
    Cleveland
    @onQ please respond to the above, and do so without using out of context snippets/quotes.
     
  11. mrcorbo

    mrcorbo Foo Fighter
    Veteran

    Joined:
    Dec 8, 2004
    Messages:
    3,564
    Likes Received:
    1,981
    This thread has a discussion going in it about the subject. Do you want to participate in that discussion or not? If your statements are being presented out of context, provide context. If people are misinterpreting the ideas you're trying to convey based on how you've worded things in the past, word them differently now so that we can properly interpret them. Show that you understand how what you initially posted and what this actually is are different, based on the responses people have given you in this thread. Then we can move forward.
     
    BRiT likes this.
  12. pMax

    Regular Newcomer

    Joined:
    May 14, 2013
    Messages:
    327
    Likes Received:
    22
    Location:
    out of the games
    So I am nowadays just a lazy reader here but... WTF does this thread mean?
    Is it something like "can AMD reprogram their microcode engine(s) to let them do something weird and different via a software update"?
    Or something more like "can AMD reprogram their additional components with maybe FF stuff to behave differently - e.g. add H.265 to cards that support only H.264"?
    Or anything else???
     
  13. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    Because they are identical in operation with the only difference being number of transistors to implement the circuit. Not sure why you'd think they are the complete opposite of each other. For all intents and purposes they are hardwired circuits with concurrent execution occurring in a single clock cycle. They just have the ability to be modified at device initialization.

    Possible, but the hardware requirements to do something that complex likely wouldn't be worth it. The decoding likely stays the same, beyond the new formats. There may be changes to the bitrate or new constraints where it would not fit in the hardware. In most cases they are used for industrial system controls where each application is different. An engineer could then reprogram them to suit his needs without having to fabricate a chip for every situation. Realtime scheduling of GPUs is another use for them.
     
  14. Esrever

    Regular Newcomer

    Joined:
    Feb 6, 2013
    Messages:
    594
    Likes Received:
    298
    Not sure you know what fixed function means.
     
  15. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    Unable to be changed, performing one specific task. Just like these things are doing. Unless you think flashing a BIOS a billion times a second is programmable? For all intents and purposes these things become hardwired circuits once the device is initialized. They won't be bouncing between different capabilities at runtime.
     
  16. Esrever

    Regular Newcomer

    Joined:
    Feb 6, 2013
    Messages:
    594
    Likes Received:
    298
    What makes you think these things require a BIOS flash? They are just multipurpose hardware abstracted by a driver layer.
     
  17. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    They won't require a BIOS flash, but that's a rough equivalent. The microcode could be substituted at initialization, the same way it is for CPUs and many other devices. In most cases I've seen, that occurs when booting the operating system. Point being it likely won't be occurring during operation or frequently. Any change would likely require resetting the device.
     
  18. Esrever

    Regular Newcomer

    Joined:
    Feb 6, 2013
    Messages:
    594
    Likes Received:
    298
    Why would you need to change the microcode to do any of the functions you'd want to do with this? The hardware isn't fixed function, so just use it as such. Or do you want to partition half the GPU to do pixel shading and half to do vertex shading for no reason?
     
  19. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    40,601
    Likes Received:
    11,020
    Location:
    Under my bridge
    Anarchist4000 is right. Fixed function can still be programmed, like an FPGA. Once established, it functions as a static processor.

    The issue with the AMD concept is that the processing of the workload, happening in the shaders, is programmable. It's the HWS that's 'fixed function', although programmable via update. The initial posit was/is unclear about what exactly is meant by 'fixed function', with the implication definitely being that the workload processing (graphics tasks) is operating as if on fixed-function specialist processors, and presently onQ has declined to clarify his position.
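
    The FPGA comparison above can be made concrete with a toy lookup table, the basic building block of an FPGA. It is configured once, behaves as a fixed circuit afterwards, and only takes a new function after a reset. The class and method names are hypothetical, and this illustrates the argument only. It says nothing about how the HWS is actually built.

```python
class LookupTable:
    """Toy n-input LUT: configurable at init, static while running."""

    def __init__(self, inputs):
        self.inputs = inputs
        self._table = None

    def configure(self, truth_table):
        """Load a function. Refused while a configuration is active."""
        if self._table is not None:
            raise RuntimeError("already configured; reset the device first")
        if len(truth_table) != 2 ** self.inputs:
            raise ValueError("truth table size must be 2 ** inputs")
        self._table = tuple(truth_table)

    def reset(self):
        self._table = None

    def evaluate(self, *bits):
        """Pure table lookup: no instructions are fetched or decoded."""
        if self._table is None:
            raise RuntimeError("not configured")
        if len(bits) != self.inputs:
            raise ValueError("wrong number of input bits")
        index = 0
        for bit in bits:
            index = (index << 1) | (bit & 1)
        return self._table[index]


if __name__ == "__main__":
    lut = LookupTable(inputs=2)
    lut.configure((0, 0, 0, 1))                 # behaves as AND
    print(lut.evaluate(1, 1), lut.evaluate(1, 0))
    lut.reset()
    lut.configure((0, 1, 1, 0))                 # now behaves as XOR
    print(lut.evaluate(1, 1), lut.evaluate(1, 0))
```

    Between `configure` and `reset` the table performs exactly one function, which is the sense in which a reprogrammable unit can still be called fixed function.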
     
    milk likes this.
  20. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    40,601
    Likes Received:
    11,020
    Location:
    Under my bridge
    Again, Anarchist4000 is talking about the HWS being 'flashable' yet fixed function. You're talking about the GPU shaders being programmable.
     