AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Discussion in 'Architecture and Products' started by ToTTenTranz, Sep 20, 2016.

  1. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,363
    Likes Received:
    3,941
    Location:
    Well within 3d
    The disparity in transistors per TFLOP doesn't seem too crippling, particularly since choices made for consumer Navi's target market could have been adjusted to favor compute if a die were similarly dedicated. That aside, there could be other issues, like the risk of involving Navi in products aimed at datacenter and HPC clients.
    Navi at this time has some bugs that make it less compelling for compute in particular (memory addressing mode bugs, LDS bugs, etc.), which, along with AMD's limited software support, may have contributed to why Navi's compute performance and implementation have been poor or subject to some significant errors.

    Driver changes for Arcturus may point to some significant changes, as in some form of vector unit that is architecturally distinct from the existing SIMDs, potentially targeting large vectors/matrices with multiple levels of precision/accumulation.
    There were also possible placeholders or existing graphics elements mentioned even if the graphics command processor was specifically missing. It could be that some amount of geometry and pixel capability remains, or it was less disruptive to the architecture or driver base to leave them as-is than to totally remove them.
     
    Malo likes this.
  2. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,336
    Likes Received:
    297
    Yes, but the discussion wasn't about some abstract comparison; it was about two particular architectures, two particular GPUs. Real-world compute performance per transistor of Navi 10 is often worse than that of Vega 20, too; the real-world results are even worse for Navi than the theoretical comparison I was talking about. So I'm not sure what the point of your reply is. Both Navi's theoretical compute performance per transistor and its real-world compute performance per transistor are worse than Vega's. That's the reason AMD decided to split development of the gaming and compute architectures. If they planned to use Navi/RDNA for compute in the future (and the Vega-derived Arcturus were just a short-term solution), they wouldn't lay out an entire roadmap for a separate architecture, because in that case there would be no separate architecture. It would still be RDNA/Navi, just like the gaming one. But it isn't.
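    As a rough sanity check of the per-transistor comparison, here is a back-of-envelope calculation. The peak-TFLOPS and transistor-count inputs are my assumptions taken from public spec listings for the fastest SKUs, not figures from this thread:

```python
# Back-of-envelope GFLOPS per billion transistors. All inputs are assumed:
#   Navi 10 (RX 5700 XT 50th Anniversary): ~10.14 peak FP32 TFLOPS, ~10.3B transistors
#   Vega 20 (fastest SKU):                 ~14.30 peak FP32 TFLOPS, ~13.2B transistors
def gflops_per_btransistor(peak_tflops, transistors_b):
    return peak_tflops * 1000.0 / transistors_b

navi10 = gflops_per_btransistor(10.14, 10.3)   # ~984 GFLOPS per B transistors
vega20 = gflops_per_btransistor(14.30, 13.2)   # ~1083 GFLOPS per B transistors
```

    On these assumed figures Vega 20 does come out ahead per transistor, consistent with the claim above.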
     
    xpea likes this.
  3. troyan

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    151
    Likes Received:
    241
    CDNA looks like a byproduct of the supercomputer wins. I mean, I don't think AMD will sell products for $300 to support their DL/AI market share...
     
    w0lfram, pharma and xpea like this.
  4. Qesa

    Joined:
    Feb 23, 2020
    Messages:
    4
    Likes Received:
    6
    Given that many OpenCL applications also fail to start or give incorrect results, I suspect the performance has more to do with the state of the drivers than with the hardware.
     
  5. w0lfram

    Newcomer

    Joined:
    Aug 7, 2017
    Messages:
    217
    Likes Received:
    38
    Why would you even say that...?

    Dr. Su herself told us about AMD's divergence on this point and why they sequestered the RDNA team in silence while developing it: because she is a gamer herself and wanted to develop a gaming-only architecture.

    Now AMD has spent those resources and rallied an entire gaming industry behind it (RDNA 2) even before any of us gets to see it. We all got a taste of RDNA 1, but that was a hybrid design, not the full uArch. So now Microsoft, Sony, Samsung & Google have all bought into what RDNA 2 can do...

    You think AMD will fully switch away from what they have been working towards? Two different dGPUs in two different fields, geared/engineered for efficiency with no cross-market inefficiencies...

    [image not preserved]

    That grey area is one architecture for everything = Nvidia.

    RDNA & CDNA are not general purpose; they are specific to their field...
     
  6. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,018
    Likes Received:
    114
    The conclusion makes sense to me too. We know that CDNA in the form of Arcturus is basically GCN (Vega; though I guess the G in the name would be a bit inappropriate here...) with some graphics bits stripped off. So CDNA 2 could really be anything, and IMHO it would make a whole lot of sense if it were really the same as some RDNA version (unless they actually stick with GCN even). Despite the flashy diagram, I don't expect AMD to develop two completely separate architectures. Separate chips, yes (although I have to say I am still somewhat sceptical about the viability of even this approach, but apparently AMD is willing to go there), but there's no real evidence it's really going to be a separate architecture other than in marketing name.
     
  7. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,934
    Likes Received:
    2,263
    Location:
    Germany
    Because (in case you're interested in a serious answer): Vega 20 has 43.4 (edit: 43.2) GFLOPS of compute per mm², while Navi 10 has 40.4, both in their fastest incarnations. And that is with Vega 20's insanely wide memory controllers and half-rate DP, neither of which is free in terms of die space.

    And with Vega, compute applications work, whereas Navi still has issues. You don't want your next supercomputer installation with 100k cards choking on the first day.
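    The density figures above can be reproduced with a one-liner. The peak-TFLOPS and die-area inputs are my assumptions for the fastest SKUs and public die-size figures, chosen to match the quoted numbers, not official data:

```python
# GFLOPS of peak FP32 compute per mm^2 of die. All inputs are assumptions:
#   Vega 20: ~14.30 peak FP32 TFLOPS on a ~331 mm^2 die
#   Navi 10: ~10.14 peak FP32 TFLOPS on a ~251 mm^2 die
def gflops_per_mm2(peak_tflops, die_mm2):
    return peak_tflops * 1000.0 / die_mm2

vega20 = gflops_per_mm2(14.30, 331.0)   # ~43.2 GFLOPS/mm^2
navi10 = gflops_per_mm2(10.14, 251.0)   # ~40.4 GFLOPS/mm^2
```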
     
  8. w0lfram

    Newcomer

    Joined:
    Aug 7, 2017
    Messages:
    217
    Likes Received:
    38
    So AMD's slide is wrong and they will not be utilizing two different graphics architectures...?

    And you are trying to say that with a straight face, even though AMD's CEO was on stage telling us otherwise just a week ago, and then went into the reasoning behind why they are doing it: because it allows AMD to leverage each architecture to fully benefit customers who sit on two different ends of the spectrum, which one uArch can't make happy. The reasoning is pretty simple, so perhaps you didn't understand it, don't care, or are just dismissing it..?

    RDNA = gaming-optimized uArch
    CDNA = compute-optimized uArch


    Really not that hard to understand.
     
  9. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,934
    Likes Received:
    2,263
    Location:
    Germany
    You ever heard of the word „rebrand“?
     
  10. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,070
    Likes Received:
    2,942
    Location:
    Finland
    Fixed that for you.
     
    CeeGee and xpea like this.
  11. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    1,489
    Likes Received:
    233
    Location:
    msk.ru/spb.ru
    I dunno if I'd call RDNA a "GCN fork" really. This implies that the other branch will still be alive for a long time, and I don't see why this would be the case, especially with the alleged perf/watt improvements of RDNA2. The latter will likely destroy GCN in compute workloads just like RDNA1 destroys it in gaming. Perf/watt is very important for HPC space.
     
  12. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,070
    Likes Received:
    2,942
    Location:
    Finland
    Who's to say many of the same improvements can't be applied to the GCN base too? I mean, we have no clue what's being improved and how.
    (edit: also, it helps that in certain contexts RDNA is referred to as GCN 1.5 and RDNA+DLOps as 1.5.1, while for example Vega is 1.4 and Vega 20 is 1.4.1)
     
    #5992 Kaotik, Mar 9, 2020
    Last edited: Mar 9, 2020
  13. w0lfram

    Newcomer

    Joined:
    Aug 7, 2017
    Messages:
    217
    Likes Received:
    38
    So you are saying RDNA is not a new uArch. Therefore, RDNA 2 can't be either...


    edit: Why do you think RDNA 2 is going to have GCN in it, when we were told exactly what RDNA doesn't have: GCN.
     
  14. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    1,489
    Likes Received:
    233
    Location:
    msk.ru/spb.ru
    That's hardly relevant to the underlying h/w though. CUDA, for example, has a "Compute Capability" metric running in a straight line from 1.0 to whatever it is now, going as far back as G80/Tesla. That doesn't mean Turing is a fork of G80.
     
  15. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,363
    Likes Received:
    3,941
    Location:
    Well within 3d
    Some elements that seem likely to benefit CDNA that showed up with Navi are the doubled L0(RDNA)/L1(GCN) bandwidth and an apparently more generous allocation for the scalar register file. The RDNA L1 cache is read-only and it may be that compute loads with a lot of write traffic might be outside its optimum, but on the other hand I'm not sure what Arcturus is doing with subdividing the GPU's broader resources. The larger number of CUs and the lack of a 3d graphics ring might point to it acting more like a set of semi-independent shader engines managed by a subset of the ACEs, and that sort of subdivision might still align with what Navi did with the hierarchy.
    The longer cache lines could be a wrinkle in memory coherence, since RDNA's cache granularity is now out of step with the CPU hierarchy. It can be handled with a little bit of extra tracking, however.
    How the WGP arrangement may help or hinder (outside of bugs) may need further vetting. It seems like WGP mode can help heavier shader types, but per the RDNA whitepaper there are some tradeoffs like shared request queues that might be less helpful in compute.
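    The extra coherence tracking mentioned two paragraphs up could be as simple as per-sector state. The 64-byte and 128-byte line sizes below follow common CPU designs and the RDNA whitepaper respectively; the sector scheme itself is a hypothetical sketch, not a description of the real hardware:

```python
CPU_LINE = 64    # typical CPU cache-line granularity, in bytes (assumption)
GPU_LINE = 128   # RDNA cache-line size per the whitepaper

def sectors_touched(addr, nbytes):
    """64B sectors of the enclosing 128B GPU line touched by an access.

    Assumes the access does not cross a 128B line boundary."""
    first = (addr % GPU_LINE) // CPU_LINE
    last = ((addr + nbytes - 1) % GPU_LINE) // CPU_LINE
    return set(range(first, last + 1))

# A CPU-side invalidation of one 64B line would then only clear the
# matching sector's valid bit instead of dropping the whole 128B GPU line.
```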

    Some elements, like the current formulation of Wave64, may not be full replacements for native 64-wide wavefronts, as there are some restrictions in instances where the execution mask is all-zero, which is a failure case for RDNA. RDNA loses some of the skip modes that GCN has, drops some of the cross-lane options, and drops some branching instructions. On the other hand, it does have some optimizations for automatically skipping instructions that are predicated off.
    CDNA's emphasis on compute, and Arcturus potentially having much more evolved matrix instructions and hardware, could make the case for a different kind of Wave64, or a switch in emphasis where it's preferred to keep the prior architectural width.
    Usually HPC is less concerned with backwards compatibility, but perhaps AMD's tools or existing code may still tend towards the old style?
    Physically, the clock-speed emphasis may be partially blunted. Arcturus-related code commits seem to be giving up some of the opportunistic up-clocking that graphics products use, with the argument being that the broader compute hardware would wind up throttling anyway. If the upper clock range is less likely to be used, perhaps the implementation choices would emphasize leakage and density rather than expending transistors and pipeline stages on the "multi-GHz" range AMD seems to be claiming for RDNA 2.
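    The throttling argument can be put in rough numbers. Under the common approximation that dynamic power scales with C·V²·f, and that voltage rises roughly linearly with frequency near the top of the range, power grows with roughly f³, so a die with more CUs hits a fixed board power at a lower clock. The cubic model and the constant below are illustrative assumptions, not AMD figures:

```python
# Illustrative only: solve k * n_cu * f^3 = power_limit for f, i.e. the
# highest sustainable clock under a fixed board power. k is an arbitrary
# fitted constant, not a real chip parameter.
def max_clock(power_limit_w, n_cu, k=1.0):
    return (power_limit_w / (k * n_cu)) ** (1.0 / 3.0)

# In this model, doubling the CU count scales the sustainable clock by
# 0.5 ** (1/3), i.e. cuts it by roughly a fifth.
```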

    On top of all that, there seem to be errata for Navi that may be particularly noticeable for compute and might have delayed any RDNA-like introduction into the development pipeline for HPC.

    I'm curious what AMD has managed to cull from Arcturus. For example, the driver changes make note of there not being a 3D engine, but there are still references to setting up values for the geometry front ends and primitive FIFOs. Also unclear is what that means for the command processor, since besides graphics it is usually the device the system uses to set up and manage the overall GPU. Losing it doesn't seem to gain much other than a little rectangle in the middle of the chip. If it is gone or somehow re-engineered, perhaps it has more to do with some limitation in interfacing with a much larger number of CUs than with the area cost of a microcontroller.

    One item of note with regard to Vega 20's wide memory controllers: while they are wide, Navi 10's GDDR6 controllers are physically large. From rough pixel counting of Fritzchens Fritz's die shots of the two chips, Navi's memory sections have an area in the same range as Vega 20's, which would have a corresponding impact on FLOPS/mm². At least my initial attempts at measuring seem to indicate Navi 10's is noticeably larger. Vega 20 is a larger chip, which usually means the overhead of miscellaneous blocks and IO tends to be lower versus what smaller dies must contend with.
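    For what it's worth, the pixel-counting estimate works like this; the pixel counts below are made-up placeholders, not my actual measurements:

```python
# Calibrate mm^2-per-pixel from the known die area, then convert a traced
# block's pixel count. All pixel counts here are placeholder values.
def block_area_mm2(block_pixels, die_pixels, die_area_mm2):
    return die_area_mm2 * block_pixels / die_pixels

# e.g. a memory PHY traced to 6% of a ~251 mm^2 die would be ~15 mm^2:
phy = block_area_mm2(60_000, 1_000_000, 251.0)
```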

    If the references to a new architectural register type and matrix hardware are what they seem to me, Arcturus is going to have a large rise in FLOPS/mm2, with the impact dependent on the precision choices and granularity chosen. That wouldn't be an apples to apples comparison, though.

    One thing I did find recently is some discussion of issues that GPU code generation poses for Mesa, with some additional details about some of the bug flags for RDNA.
    https://gitlab.freedesktop.org/mesa...18f4a3c8abc86814143bf/src/amd/compiler/README
    It's not just hardware bugs (it includes some unflattering documentation issues) and not just RDNA, but RDNA has a list of hardware problems. I think some of those would be more objectionable for compute, and maybe one reason why the consoles seem to have gone for a more fully-baked RDNA 2.

    (edit: fixed some grammar)
     
    #5995 3dilettante, Mar 9, 2020
    Last edited: Mar 9, 2020
    CarstenS and no-X like this.
  16. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,018
    Likes Received:
    114
    Not saying the slide is wrong, just that it perhaps doesn't quite tell the full truth.
    From what I can tell from the open-source drivers, CDNA really is GCN (Vega). Yes, there might be some tweaks underneath here and there, but it seems like a stretch to claim this is really a different architecture.
    Of course it could diverge more between graphics and compute products in the future; I'm just not convinced that makes sense, hence IMHO it is more likely CDNA will remain a very close relative of an existing graphics architecture. This does not really contradict anything that was said or presented in the slides.
     
    xpea likes this.
  17. Leovinus

    Newcomer

    Joined:
    May 31, 2019
    Messages:
    113
    Likes Received:
    48
    Location:
    Sweden
    Without hoping to derail too much: for a layman, how is adapting for architectural quirks like this done in practice? Is it dealt with by updating the compiler to handle it and asking developers to keep certain code and asset restrictions in mind, or is it mainly up to engine coders to tailor for specific behaviour?
     
  18. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,363
    Likes Received:
    3,941
    Location:
    Well within 3d
    Some of the public information on the bugs or issues comes from compiler commits, such as for LLVM or Mesa, so some amount of compiler adaptation is occurring.
    How effective the compilers are, or how often some of the dropped features found use, is something I don't know.
    VSKIP was discussed by AMD in the past as getting compiler support, but assembly was also given as a viable option.

    Wave64 versus Wave32 has been discussed primarily as a compiler choice, based on some evolving set of heuristics. I presume the heuristics would pay attention to bug flags and the properties of the code being evaluated when deciding whether to go for one mode or the other, perhaps being conservative if the analysis is incomplete. If such changes are not handled well, that may be corroborated by Navi's woes with compute workloads.
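    Such a heuristic might look something like the following sketch. This is entirely hypothetical; the real LLVM heuristics are different and more involved, and the signal names here are made up for illustration:

```python
# Hypothetical per-shader wave-size choice from a few coarse signals.
# Not the actual compiler logic; just an illustration of combining
# code properties with hardware bug flags.
def pick_wave_size(needs_64_wide_ops, divergence_ratio, wave64_bug=False):
    if wave64_bug:
        return 32          # be conservative when a bug flag is set
    if needs_64_wide_ops:
        return 64          # code relies on 64-wide cross-lane behavior
    # Highly divergent code wastes fewer lanes at the narrower width
    return 32 if divergence_ratio > 0.5 else 64
```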

    Changes like this are part of why assembly can be harder to justify except in fields where performance is paramount and there's already an assumption of significant code optimization. What first comes to mind is HPC, although even then much of it isn't going to go to that extent and AMD is still working to make up the software deficit.
     
    Leovinus likes this.
  19. Radolov

    Newcomer

    Joined:
    Jul 30, 2019
    Messages:
    11
    Likes Received:
    13
    There was a patch for Arcturus mentioning something new called "AccVGPRs". Previously there have been mentions of AGPRs, but to my knowledge it has never been clarified what the "A" stood for. Is it safe to assume it stands for "Accelerator"?

    New update
    https://lists.freedesktop.org/archives/amd-gfx/2020-March/047222.html
    Some previous mentions of AGPRs
    https://github.com/llvm-mirror/llvm/commit/6644a1885fccc43708cf4486b7f31a9168826ca4
    https://github.com/llvm-mirror/llvm/commit/cb57db03360f8247a475e77dc895f7adb573c0b1
     
    Lightman likes this.
  20. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,934
    Likes Received:
    2,263
    Location:
    Germany
    AccVGPRs were mentioned before, for example in the first of your additional links. In fact, they were mentioned as far back as July 2019.
     
    #6000 CarstenS, Mar 14, 2020
    Last edited: Mar 14, 2020
    Radolov likes this.