Can AMD GPUs implement 'hardware' fixed function pipelines through firmware?

Discussion in 'Architecture and Products' started by onQ, Oct 18, 2013.

Tags:
  1. Esrever

    Regular Newcomer

    Joined:
    Feb 6, 2013
    Messages:
    594
    Likes Received:
    298
Why would that make it fixed function when the HWS is still doing work on programmable shaders, just with a different algorithm? In the scope of fixed function in hardware terms, this doesn't make much sense. If this counts as fixed function to you, then a CPU is fixed function too, because it allows changing of microcode as well.
     
  2. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    40,502
    Likes Received:
    10,875
    Location:
    Under my bridge
The HWS is a fixed-function scheduler whose program cannot be changed on the fly. The GPU, the work it does, and the work scheduled are programmable. The entire processing pipeline, such as performing a framebuffer reprojection, is not fixed function AFAIK because it's just a shader program. That's the area really in dispute here, and I don't think anyone other than onQ is suggesting that the operation of the GPU on scheduled work constitutes fixed function.
     
  3. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    856
    Likes Received:
    260
Indeed. A CPU's instructions are its fixed functions. And they can be changed (to some degree).
     
    Grall likes this.
  4. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,118
    Likes Received:
    2,860
    Location:
    Well within 3d
One difference is that the CPU would generally be expected to have free rein in terms of what it can access in memory for operands and instructions.
    The CUs executing the actual shaders have a general address space that they can use in a Turing-complete manner to perform arbitrary functions.

I'm not sure how much leeway is given to the microcode engines in the front end, or to other blocks generally considered fixed-function from the standpoint of system use. They can access their microcode store and the specific regions of memory allocated by the system for the queues they monitor, but I'm not sure how much more they can access, even for reading.
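The restriction being described can be sketched as an address-range check. This is a conceptual model only, not actual AMD firmware behavior; all ranges and names are invented for illustration:

```python
# Conceptual sketch: a front-end microcode engine that may only touch
# its microcode store and the queue regions the system allocated to it,
# versus a CU with a general address space. Ranges are made up.

ALLOWED_RANGES = [
    (0x0000_0000, 0x0000_4000),  # microcode store (illustrative)
    (0x1000_0000, 0x1001_0000),  # queue region assigned by the driver
]

def engine_can_access(addr: int) -> bool:
    """True only if addr falls inside an allowed window."""
    return any(lo <= addr < hi for lo, hi in ALLOWED_RANGES)

def cu_can_access(addr: int) -> bool:
    """A CU running a shader sees a general (here, 48-bit) address space."""
    return 0 <= addr < 2**48
```

The point of the contrast: the CU check accepts almost anything, while the engine check whitelists a handful of windows.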
     
  5. Esrever

    Regular Newcomer

    Joined:
    Feb 6, 2013
    Messages:
    594
    Likes Received:
    298
But changing them to build more complicated functions, from a programmer's perspective, doesn't make any sense. If this is the definition, then why even bother specifically asking about AMD GPUs, when pretty much all modern silicon can do this? That is why drivers and BIOSes exist in the first place.
     
  6. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
They aren't executing code in the way you think they are. In operation you have to think of them as a hardwired circuit doing one thing and one thing only. There are no idle transistors, memory banks, etc. like you would encounter in a CPU. They will perform in a single cycle what a CPU likely takes many cycles to complete. The concurrency is really the big point here. Theoretically, if large enough, you'd do the equivalent of executing an entire shader in a single clock cycle. Then you'd only be able to execute that one shader.

All these things really do is a bunch of comparisons, likely not even math, outputting an outcome for another circuit or processor to make a decision. They won't even have math units or anything you would typically associate with a processor, although yes, they could emulate that functionality. Arrays of logic gates are a better way to think of these. You likely don't want them touching anything over 8 bits in size.
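A software analogy of such a comparator array might look like this. It is purely illustrative; in hardware the comparisons below would be a tree of comparators settling concurrently within a cycle, not a sequential loop:

```python
# Illustrative model of a small combinational block: a handful of
# comparisons whose single output feeds a downstream decision.

def priority_select(ages: list) -> int:
    """Pick the index of the oldest pending request.
    In silicon this is one comparator per pair, evaluated in parallel;
    here it is written as a reduction purely for readability."""
    best = 0
    for i in range(1, len(ages)):
        if ages[i] > ages[best]:  # one "comparator"
            best = i
    return best
```

Note there is no arithmetic at all, only comparisons producing an index for the next stage to act on.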
     
  7. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    856
    Likes Received:
    260
The instructions represent an abstract architecture (ISA is Instruction Set Architecture). This architecture is invented before the hardware architecture, and the hardware can implement it in any way the designers want. Some instructions have direct representations, but most don't. So you have instructions which are super fast, one silicon operation, and then some which are like 100 silicon operations long; one instruction is really an entire "program". This is all to use the hardware resources in the most efficient way, and to be able to change the underlying silicon layer independently of the abstract ISA layer.
So at the beginning of a silicon architecture you may have only 10 native mappings, and by the 5th generation you have 120 native ones, for example, while another 5 of the previously native ones are now longer because nobody used them.
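The ISA-versus-silicon split described above can be sketched as a lookup from architectural instructions to internal operation sequences. Instruction and micro-op names here are entirely made up; the point is only that the mapping can change between generations while the ISA stays fixed:

```python
# Sketch: each architectural instruction expands to one or more internal
# micro-ops, and the expansion can differ per hardware generation.

GEN1 = {
    "ADD": ["uop_add"],                       # native: one silicon op
    "DIV": ["uop_recip_est", "uop_mul",       # emulated: a small internal
            "uop_refine", "uop_mul"],         # "program" of micro-ops
}

GEN5 = {
    "ADD": ["uop_add"],
    "DIV": ["uop_div"],                       # later silicon made it native
}

def micro_op_count(isa_map: dict, instr: str) -> int:
    """How many internal operations one ISA instruction costs."""
    return len(isa_map[instr])
```

Software compiled against the ISA is untouched by this change; only the cost of `DIV` moved.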
     
  8. pTmdfx

    Newcomer

    Joined:
    May 27, 2014
    Messages:
    249
    Likes Received:
    127
It is programmable, but the design contract binds it to a specific purpose, for encapsulation and security. Taking the ACE as an example: a dispatch command from user land does not need to acknowledge the fact that this unit is as programmable as your Raspberry Pi, but merely says "DISPATCH pointer_to_my_shiny_kernel 1920*1080*1". The ACEs parse it and do their work in the way defined by the microcode, without leaking any details.

I am not exactly sure about your wording, but HWS/ACE/GCP do not "do work on programmable shaders" - they know merely a pointer to the shader object code, and some configuration parameters. Particularly, the HWS is only responsible for scheduling units of work to the compute units, which are responsible for running the shaders or compute kernels. That's why it is a fixed function: to the external world, it is a self-running black box, despite it in reality perhaps being a RISC microprocessor.

Its programmability is not for general-purpose processing, but for maintenance purposes, just like your plain old BIOS or the secure boot chains in the on-chip security enclaves. They may use a full-blown microprocessor, but by design contract they only expose an interface to the outside.
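The "design contract" idea reads naturally as an interface sketch: the public surface is just a pointer and a grid size, while everything behind it stays hidden. The class and method names below are invented for illustration, not any real AMD API:

```python
# Sketch: an ACE-like unit whose entire public surface is one dispatch
# call, regardless of how programmable the machinery behind it is.

class Ace:
    def __init__(self):
        self._queue = []  # internal state, invisible to callers

    def dispatch(self, kernel_ptr: int, grid: tuple) -> None:
        """The whole contract: 'DISPATCH pointer 1920*1080*1'.
        How the work is parsed and scheduled is defined by microcode
        the caller never sees."""
        self._queue.append((kernel_ptr, grid))

ace = Ace()
ace.dispatch(0xDEADBEEF, (1920, 1080, 1))
```

From the caller's side this is indistinguishable from a hardwired block, which is exactly the encapsulation argument being made.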
     
    #48 pTmdfx, Jul 1, 2016
    Last edited: Jul 1, 2016
  9. pTmdfx

    Newcomer

    Joined:
    May 27, 2014
    Messages:
    249
    Likes Received:
    127
Oh, gonna make a correction.

Particularly, the HWS is conceptually only responsible for scheduling software queues to the ACEs' hardware queue slots. The ACEs are then responsible for pushing work to the CUs (in AQL), which are in the end the guys running the shaders/kernels with a general purpose ISA.
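The corrected model (many software queues, few hardware queue slots, with the HWS multiplexing the former onto the latter) can be sketched like this. The slot count and round-robin policy are illustrative assumptions, not documented AMD behavior:

```python
# Sketch: an HWS-like mapper assigning many software queues onto a
# fixed number of hardware queue slots. Policy and counts are made up.

HW_SLOTS = 8

def map_queues(software_queues: list) -> dict:
    """Round-robin software queues onto hardware slots; each slot may
    end up time-sharing several software queues."""
    mapping: dict = {}
    for i, q in enumerate(software_queues):
        mapping.setdefault(i % HW_SLOTS, []).append(q)
    return mapping
```

With 20 software queues and 8 slots, some slots carry three queues and must be time-sliced, which is the scheduling work the HWS exists to do.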
     
  10. onQ

    onQ
    Veteran

    Joined:
    Mar 4, 2010
    Messages:
    1,540
    Likes Received:
    55
Does this make it clear, or do we still play the 'pretend onQ is crazy' game? TrueAudio has been moved from a DSP to a compute pipeline.

[image: slide]
     
  11. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,432
    Likes Received:
    261
    Implementing audio using compute is the opposite of fixed function. It's a software library. You don't do a good job explaining what your point is.
     
    ieldra and BRiT like this.
  12. onQ

    onQ
    Veteran

    Joined:
    Mar 4, 2010
    Messages:
    1,540
    Likes Received:
    55
You could always have done audio physics in compute, but because of the customized scheduling/fixed-function pipeline for asynchronous compute it can be done with fewer draw calls & at low latency.


To be clear, you're able to emulate a fixed-function co-processor/accelerator using the HWS & compute units.


    See my 1st post in the original thread from 3 years ago:


    TrueAudio being moved from a DSP to compute gives the answer to my question.
     
    #52 onQ, Jul 6, 2016
    Last edited: Jul 6, 2016
  13. Otto Dafe

    Regular

    Joined:
    Aug 11, 2005
    Messages:
    400
    Likes Received:
    59
    Well, with that wrapped up, I propose to lock the thread.
     
    Grall likes this.
  14. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    Programmable hardware is never as efficient (perf/watt and/or silicon area) as fixed function hardware. But programmable hardware makes it easier to get full utilization of those transistors as it is not limited to executing single kind of work.

I wouldn't call ACEs fixed function hardware. GPU front ends are fully programmable processors with programmable memory access; you just don't have access to program them yourself, the driver team does. Traditionally, fixed function hardware tends to have fixed data inputs and outputs: for example the texture sampler, ROP (blend, depth test, HiZ), DXT block decompressor, delta color compressor, triangle backface culling, etc. These are highly performance-critical parts of the chip, making it a big perf/watt win to implement them with fixed function hardware. Hard-wiring also reduces latency compared to a programmable pipeline.

On the other hand, a GPU front end processor only needs to launch a couple of draws/dispatches per microsecond. That's a huge number of cycles. It is not worth optimizing its throughput at cycle precision, thus programmable hardware makes sense.
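The cycle-budget argument can be made concrete with back-of-the-envelope arithmetic. The 1 GHz clock figure is an assumption for illustration, not a quoted spec:

```python
# Rough arithmetic behind "that's a huge number of cycles":
# at ~1 GHz, one microsecond is ~1000 cycles, so a front end issuing a
# couple of dispatches per microsecond has hundreds of cycles per dispatch.

clock_hz = 1.0e9             # assumed 1 GHz front-end clock
dispatches_per_us = 2        # "a couple of draws/dispatches"

cycles_per_us = clock_hz * 1e-6          # 1000 cycles per microsecond
cycles_per_dispatch = cycles_per_us / dispatches_per_us
print(cycles_per_dispatch)               # 500.0
```

Hundreds of spare cycles per launch is why shaving single cycles off the front end buys nothing, while shaving cycles off a texture sampler or ROP, exercised millions of times per frame, buys a lot.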
     
    ieldra, Heinrich04, Grall and 5 others like this.
  15. onQ

    onQ
    Veteran

    Joined:
    Mar 4, 2010
    Messages:
    1,540
    Likes Received:
    55
    I also covered that in the original thread.

     
  16. onQ

    onQ
    Veteran

    Joined:
    Mar 4, 2010
    Messages:
    1,540
    Likes Received:
    55
Not really, because now the HWS are starting to be used & ACEs can be upgraded/downgraded (however you look at it) to HWS, & Neo/Scorpio is coming.

Why do I point out Neo/Scorpio? Because they will be closed platforms with HWS & extra compute units, so those can be used to create virtual co-processors/accelerators.



With the new information, do people now see what I meant about the PS4 APU being MIMD, or about a GPU being a DPU?
     
    #56 onQ, Jul 6, 2016
    Last edited: Jul 6, 2016
  17. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    40,502
    Likes Received:
    10,875
    Location:
    Under my bridge
I think I do, and once again your language and communication are terrible. If you hadn't used the words 'fixed function' (because they don't apply) and had instead talked about scheduling efficiency and latency, then you'd have been talking about what's going on here. Although I'm still unconvinced this is exactly what you meant to begin with, because it doesn't match your words. There's still no such thing as 'virtual coprocessors/accelerators' except in how you are personally conceptualising what's going on. For the rest of us, there's a workload, a set of programmable processors that can process it, and a means to make compute low-latency.
     
    BRiT likes this.
  18. milk

    Veteran Regular

    Joined:
    Jun 6, 2012
    Messages:
    2,897
    Likes Received:
    2,435
If this is what you meant by your original question, the answer has always been 'of course'. It should be so obvious it shouldn't be worth asking.
     
    BRiT likes this.
  19. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,171
    Location:
    La-la land
Not sure what you are trying to illustrate, or what you think that slide illustrates, in relation to your original point.

So they cut out the DSP and made TrueAudio a shader program - so what? It has nothing to do with fixed-function-ness; quite the opposite, really.
     
    BRiT likes this.
  20. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    7,715
    Likes Received:
    6,006
I think I finally understand where onQ is going with all of this. It's actually quite similar to an article I once read about how GPUs and sound chips would no longer be needed, because CPUs would continually scale to higher core counts such that the extra cores would take over the work of these dedicated hardware accelerators.

    But the reality has been the opposite and we have been moving towards more dedicated hardware accelerators and not less for our heavy lifting.

It's like saying GPUs are excellent at Bitcoin mining, but compared to an ASIC they are nowhere in the same realm of price/performance.

What you're effectively asking is whether GPUs can take on more functionality, which they could. ACEs aren't required for that to happen; they may happen to improve the saturation of the hardware, but they're not some magic bullet. Your argument would not be much different from implying that if CPUs picked up ACEs we would all of a sudden have GPU-like performance running on our CPUs, which is false.

Being able to perform a function does not imply replacement or substitution. It implies flexibility at most, at the cost of heavy inefficiency.



    Sent from my iPhone using Tapatalk
     
    milk likes this.