AMD: Navi Speculation, Rumours and Discussion [2019]

Discussion in 'Architecture and Products' started by Kaotik, Jan 2, 2019.

  1. tunafish

    Regular

    Joined:
    Aug 19, 2011
    Messages:
    546
    Likes Received:
    182
    If the patents we have seen are any indication, this will definitely not happen, as it appears AMD is going with an approach where the texture units are modified to also sort of act as the RT units.
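    For what it's worth, the division of labor the patent describes can be sketched in software: the modified texture unit would provide a fixed-function ray/box intersection test, while the shader keeps driving the loop. The slab test below is the standard textbook formulation, purely illustrative, not AMD's actual hardware.

    ```python
    # Illustrative only: a standard ray/AABB slab test, the kind of
    # fixed-function intersection work the patent assigns to the modified
    # texture units. A shader would issue one of these per BVH node while
    # retaining control of the traversal itself.

    def intersect_ray_aabb(origin, direction, box_min, box_max):
        """Return True if the ray hits the axis-aligned box (slab test)."""
        t_near, t_far = 0.0, float("inf")
        for axis in range(3):
            if direction[axis] == 0.0:
                # Ray parallel to this slab: miss unless origin lies inside it.
                if not (box_min[axis] <= origin[axis] <= box_max[axis]):
                    return False
                continue
            inv = 1.0 / direction[axis]
            t0 = (box_min[axis] - origin[axis]) * inv
            t1 = (box_max[axis] - origin[axis]) * inv
            if t0 > t1:
                t0, t1 = t1, t0
            t_near = max(t_near, t0)
            t_far = min(t_far, t1)
        return t_near <= t_far
    ```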
     
    w0lfram likes this.
  2. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    351
    Likes Received:
    96
    IMO a good idea, and probably true, as at least one major vendor (Unity) is already publicly experimenting with custom BVHs (bounding tetrahedrons), so programmability seems likely.

    Besides, raytracing is often throughput bound; the amount of bandwidth you'd need is still crazy.

    For GPU chiplets I can see maybe a severely limited near future where some low bandwidth, low power common areas of GPUs are done in a chiplet. Do all the external output blocks (leading out to HDMI, DisplayPort) and their associated logic need to be built into the monolithic die? Maybe not, but I'm not sure how much cost saving, if any, that would even net you; you'd still need to transfer the final output over some interconnect after all. I want Navi to be a step towards chiplets, but I seriously doubt it is.
     
    #1362 Frenetic Pony, Aug 15, 2019
    Last edited: Aug 17, 2019
    w0lfram likes this.
  3. JoeJ

    Regular Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    569
    Likes Received:
    657
    As far as I understand the patent, the BVH format is hardwired, and only the traversal can eventually be made programmable to some degree?
    Not sure if this leak has been discussed here: https://pastebin.com/y8qXme7b
    If true, the patent would not apply to the Xbox, and it would use one RT core per CU, maybe similar to NV.
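    To illustrate the split being asked about: in a "hardwired format, programmable traversal" design, the node layout is fixed in hardware, but the shader decides which children to visit and in what order. The node layout and the ordering policies below are hypothetical, just to show where the programmability would live.

    ```python
    # Hypothetical fixed-layout BVH node: the hardware would only understand
    # this exact encoding, but the visit order (the "policy") stays in
    # shader code. Names here are invented for illustration.
    from collections import namedtuple

    # bounds: opaque box data; children: child node ids; prims: leaf payload
    Node = namedtuple("Node", "bounds children prims")

    def traverse(nodes, root, hit_test, order_policy):
        """Stack-based traversal; `order_policy` is the programmable part,
        `hit_test` stands in for the fixed-function intersection unit."""
        hits, stack = [], [root]
        while stack:
            node = nodes[stack.pop()]
            if not hit_test(node.bounds):
                continue                         # culled by the HW test
            if node.prims:                       # leaf: record primitives
                hits.extend(node.prims)
            else:                                # inner: shader picks the order
                stack.extend(order_policy(node.children))
        return hits
    ```

    Swapping `order_policy=list` for `order_policy=reversed` visits subtrees in the opposite order without touching the hardware-side `hit_test`, which is roughly what "programmable traversal over a fixed format" would buy you.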
     
  4. w0lfram

    Newcomer

    Joined:
    Aug 7, 2017
    Messages:
    159
    Likes Received:
    33

    Goes back to what AMD means by RDNA being scalable. (How wide?)

    I also suspect the reason Navi10's BIOS is locked down is that they don't want you to know how the 5700 operates @ 800MHz, or downclocked in a laptop. We know how Vega scaled up & down, but not RDNA.
     
  5. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,137
    Likes Received:
    2,939
    Location:
    Well within 3d
    AMD doesn't comment much about the command processor or the related controllers that make up many parts of the control logic of the GPU.
    Some LinkedIn posts and the PS4 hack discuss how the command processor and ACEs are custom "F32" microprocessors, designed with a straightforward set of operations for loading command packets, referencing them against a loaded microcode store, and then performing the defined actions or setting hardware state, interrupts, or internal signals.
    This presentation discusses the multiple processors that make up the "command processor" for the PS4: https://fail0verflow.com/media/33c3-slides/#/74.
    Radeon driver notes discuss how the command processor contains multiple internal processors, and other mentions indicate the ACEs also have a history of using F32 cores.
    GFX10 mentions yet another microcontroller in the command processor, an MES, which appears to control the scheduling of the command processor's main elements like the ME and PFP.
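    As a rough picture of what one of these front-end microengines spends its time doing, here is a toy PM4 type-3 packet walker. The header layout follows the PKT3 macros visible in the open-source Radeon drivers (packet type in bits 31:30, payload dword count minus one in 29:16, IT opcode in 15:8); everything else about the sketch is invented for illustration.

    ```python
    # Toy model of a command-processor front end consuming a PM4 stream.
    # Assumes type-3 packets only, each with at least one payload dword.

    def encode_pkt3(opcode, payload):
        """Build a PM4 type-3 packet: header dword followed by payload dwords."""
        header = (3 << 30) | ((len(payload) - 1) << 16) | (opcode << 8)
        return [header] + payload

    def decode_pm4(stream):
        """Walk a dword stream, returning (opcode, payload) per type-3 packet;
        a real microengine would dispatch each opcode via its microcode store."""
        packets, i = [], 0
        while i < len(stream):
            header = stream[i]
            if header >> 30 != 3:
                raise ValueError("only type-3 packets handled in this sketch")
            count = (header >> 16) & 0x3FFF        # payload dwords - 1
            opcode = (header >> 8) & 0xFF
            packets.append((opcode, stream[i + 1 : i + 2 + count]))
            i += 2 + count                         # advance past header + payload
        return packets
    ```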

    I've tried searching for older slides to confirm how far back this architecture goes, with limited success. There are vague allusions to the custom command processor as far back as the VLIW days with Cypress.
    There might have been a reference to a custom RISC-like core for graphics chips even prior to that, but I cannot find it now.

    There's nothing theoretically preventing multiple controllers from cooperating, if the architecture makes room for it. AMD hasn't discussed such a use case. Descriptions of the behavior of these cores involve multiple front ends, some arranged in a hierarchy, all talking to their local hardware or each other through internal paths and hidden state that don't make it outside their little domains or off the same device. They all manage segments of a big black box of graphics state, with many parts that don't migrate outside the chip.
    If AMD wanted to create an SMP-capable form of an architecture that appears to predate GCN, some/all of Terascale, and even somewhat coherent GPU memory, I suppose it could invest in doing so.
    The last time something like this was sort of asked, AMD opted for explicit multi-GPU, which is closer to saying "treat each front end as an isolated stupid slave device and manage with the API".
    Since the latest leadership took over, AMD's stance is that multi-GPU won't happen unless they can make the GPUs appear as a single unit--but with no mention of how they intend to do so or if they are seriously evaluating it at present.

    What choices they'd make for the architecture, and what sort of problem space is hidden in the GFX space that AMD doesn't talk about is unknown at this point.


    Many of these techniques want that prior frame data. Making it invisible would prompt them to error out or pull in garbage data. If the hardware sits on a barrier or lock until the data is ready, we're back to the problem of heavy synchronization and spin-up latency that currently exists. Also, at some point these areas need to be made writable in order to fill them.
    This also leaves unexplained how attributes like "read-only" are defined for these regions, and how ownership is handled. A popular way is to use page table attributes, but this is not trivial. Updates that make something writable or read-only with TLBs in play are already a serious pain for OS writers and CPUs, TLBs being a system-critical resource that even the usually coherent x86 CPU domain does not treat as coherent.
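    To make that pain point concrete, here is a toy model (all names invented) of per-core TLBs caching page permissions: because TLBs are not kept coherent, flipping a page to read-only only takes effect once every core's stale entry is explicitly invalidated, the "TLB shootdown" the OS has to orchestrate.

    ```python
    # Toy model of non-coherent TLBs. Each core caches translation + write
    # permission; changing a page attribute in the page table alone does
    # nothing until the stale TLB entries are flushed on every core.

    class Core:
        def __init__(self):
            self.tlb = {}                       # page -> cached (frame, writable)

        def access(self, page_table, page, write=False):
            if page not in self.tlb:            # TLB miss: walk the page table
                self.tlb[page] = page_table[page]
            frame, writable = self.tlb[page]    # hit: may use stale permissions!
            if write and not writable:
                raise PermissionError("write to read-only page")
            return frame

    def make_read_only(page_table, page, cores, shootdown=True):
        frame, _ = page_table[page]
        page_table[page] = (frame, False)       # update the authoritative entry
        if shootdown:                           # IPI every core to flush the page
            for core in cores:
                core.tlb.pop(page, None)
    ```

    Skipping the shootdown leaves a core that already cached the translation free to keep writing, which is exactly why these attribute updates are expensive and hard to get right.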
     
    w0lfram likes this.
  6. bridgman

    Newcomer Subscriber

    Joined:
    Dec 1, 2007
    Messages:
    60
    Likes Received:
    108
    Location:
    Toronto-ish
    Rage 128 was the first GPU to include the Command Processor, which supported the move from what we called Programming Model 3 to Programming Model 4 (the PM4 packets you see mentioned in driver code get their name from the new HW programming model). I don't remember if the PFP (pre-fetch parser) was there from the start; I believe it was added later.

    If you ignore the PFP and Constant Engine (CE), all the parts had a single micro-engine up to GFX6/SI; in GFX6 the same ME handled 1 GFX/compute ring and 2 compute-only rings. Starting with GFX7 we added one or two more engines dedicated to handling compute rings (hence MEC, for Micro-Engine Compute). Each MEC supported 4 compute pipelines, and each compute pipeline multiplexed between up to 8 queues. I believe one of those compute pipelines is what we call an ACE in marketing materials.

    On GFX7 and up one of the compute pipes can be used to run a HW scheduler rather than a compute pipeline, which allows the use of a much larger number of compute queues by switching process/queue sets onto the remaining compute queues. You can see some of the code for this in amdkfd/kfd_device_queue_manager.c.
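    The topology described above multiplies out as follows (a trivial sketch; the constant names are mine, not the driver's):

    ```python
    # GFX7+ compute front end per the description above: up to 2 MECs,
    # 4 pipes per MEC, 8 hardware queues per pipe.
    MECS, PIPES_PER_MEC, QUEUES_PER_PIPE = 2, 4, 8

    hw_queues = [(mec, pipe, queue)
                 for mec in range(MECS)
                 for pipe in range(PIPES_PER_MEC)
                 for queue in range(QUEUES_PER_PIPE)]
    # 2 * 4 * 8 = 64 hardware compute queue slots; the HW scheduler described
    # above switches process/queue sets onto these to expose many more
    # software queues.
    ```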

    Not sure how much we are saying about Navi yet so I'll stick with history for now.
     
    Pete, TheAlSpark, PizzaKoma and 15 others like this.
  7. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,137
    Likes Received:
    2,939
    Location:
    Well within 3d
    Thank you for the historical context. Perhaps this portion of the GPU is too unglamorous to warrant more than the simple boxes in the front-end diagrams, or there's not much reason to dive into the details of an architecture whose job is to set the stage for more visible parts of the GPU.
    It does seem like these processors and their bespoke architectures have a persistent place across publicly heralded architecture shifts, and a presence/influence in some platforms like HSA.

    The difference between the external marketing and what happens underneath is interesting.
    It sounds like an ME or MEC can serve as a bound on the number of microcode payloads that can be loaded or simultaneously active, but then the individual pipes sound like they're the independent elements despite being lumped into an engine.

    It seems like this kind of front end organization provides a decent amount of flexibility, at least in scenarios where the throughput needs or compute intensity don't make something like programmable gate arrays or more conventional cores necessary.
     
  8. bridgman

    Newcomer Subscriber

    Joined:
    Dec 1, 2007
    Messages:
    60
    Likes Received:
    108
    Location:
    Toronto-ish
    Correct... each pipe has its own fixed-function hardware. The microcode mostly does PM4/AQL packet decoding.
     
  9. yuri

    Newcomer

    Joined:
    Jun 2, 2010
    Messages:
    184
    Likes Received:
    152
    So now we know what the "scalability" stands for. Also smartphone usage (by Samsung?) is confirmed. It's funny how fast they dismissed Vega as a "brute-force" hog :)
     
    DavidGraham likes this.
  10. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,855
    Likes Received:
    2,685
    So the main differentiators from GCN5/Vega are:

    -the increased cache bandwidth to ALUs (minimizing their idle state and working them harder)
    -the massive increase in triangle culling rate.

    RDNA still rasterizes only 4 triangles per clock though. And there is no mention of Primitive Shaders or the DSRB anywhere.
     
  11. Picao84

    Veteran Regular

    Joined:
    Feb 15, 2010
    Messages:
    1,572
    Likes Received:
    720
    There are also the smaller 32-thread wavefronts, increasing parallelism?
     
  12. anexanhume

    Veteran Regular

    Joined:
    Dec 5, 2011
    Messages:
    1,574
    Likes Received:
    765
    Some Navi variants will have enhanced INT capabilities. (This is from the white paper)

     
    #1373 anexanhume, Aug 21, 2019
    Last edited: Aug 21, 2019
    Lightman and BRiT like this.
  13. yuri

    Newcomer

    Joined:
    Jun 2, 2010
    Messages:
    184
    Likes Received:
    152
    Aren't these the ops revealed in Navi LLVM code?
     
  14. Bondrewd

    Regular Newcomer

    Joined:
    Sep 16, 2017
    Messages:
    524
    Likes Received:
    241
    Yes.
     
  15. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    351
    Likes Received:
    96
    Along with the much larger cache structure, the fact that primitive shaders seem to be there and work and have their own block, etc. etc.

    There are a good number of changes that, per transistor (the more relevant metric), make Navi solidly more efficient than Vega specifically for gaming. Not dramatically so, but at least a definite improvement. I do wonder what AMD's costs per GPU are...
     
  16. Digidi

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    227
    Likes Received:
    99
  17. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    1,206
    Likes Received:
    600
    Location:
    France
    Well Vega had squat in the end. Seems it was way too ambitious to exploit in the real world.

    If Navi can do less on paper but really do it in real world, it's progress imo.
     
  18. Shaklee3

    Newcomer

    Joined:
    Apr 9, 2016
    Messages:
    18
    Likes Received:
    10
    Speaking of Vega, did they ever release the Instinct version of Vega? They announced the MI50/MI60 in November of last year, published a questionable benchmark a few months ago, but it's nowhere to be found. You can't buy one if you wanted to, as far as I can tell. It's fairly misleading given that their website has had info on it for a while.
     
  19. Bondrewd

    Regular Newcomer

    Joined:
    Sep 16, 2017
    Messages:
    524
    Likes Received:
    241
    They shipped it to hyperscalers, and you don't look like one, so you're not supposed to see it.
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.