AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Discussion in 'Architecture and Products' started by ToTTenTranz, Sep 20, 2016.

  1. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,098
    Likes Received:
    2,815
    Location:
    Well within 3d
    As far as the JEDEC standard is concerned, there isn't an HBM/HBM2 split as much as there is a more final and generally acceptable revision to an earlier niche implementation.
    If it were Polaris and linked to a Gen 9 or later IGP, there would be a notable discontinuity in terms of DX12 feature-level support. Vega would be the first to reach a similar level of completeness, although it's not clear how often a system would leverage the two units in a way where it would matter.
     
  2. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,113
    Likes Received:
    1,809
    Location:
    Finland
    We already know that AMD does mix and match as needed, though: the PS4 Pro's APU incorporates parts of Vega into what looks like a Polaris backbone.
     
  3. Digidi

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    217
    Likes Received:
    95
  4. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,098
    Likes Received:
    2,815
    Location:
    Well within 3d
    Given its apparent binary compatibility with the Sea Islands-based PS4, the Pro might have had some constraints on what it could adopt from Vega.
    If the goal is to match the broad DX12 compliance of Gen 9+, conservative rasterization, rasterizer-ordered views, and FP16 minimum precision show up as the three notable differences versus everything prior to GCN5. (There is still a virtual addressing difference, though it's unclear how important it would be in this case.)

    Conservative raster is presumably built into the fixed-function hardware, and the Vega ISA doc indicates that raster-order handling is integrated into the ISA through its Primitive Ordered Pixel Shading hooks. For whatever reason, DX12 doesn't consider Polaris' FP16 treatment equivalent to Gen 9's.

    The first two features seem to pull in the guts of Vega, with FP16 potentially also having some kind of refinement that Vega has over prior versions.


    It's an introduction mostly to the idea of primitive culling for the GPU. Triangles can often be out of view, facing the wrong direction, or too small to be rendered. Cutting those triangles out as soon as possible can help with efficiency and performance, such as by not polluting on-die caches with excess vertex data.
    AMD claims that their GPUs can wind up discarding half of the triangles submitted to them, and that the geometry front end could avoid losing a cycle per culled primitive and avoid thrashing its vertex parameter cache.

    How much of that half of submitted triangles isn't already compensated for in the hardware, and how much that fraction shows up in overall performance for Vega and pre-Vega GPUs, is not clear.
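    To make the three categories (out of view, facing the wrong way, too small) concrete, here is a minimal Python sketch of per-triangle tests on clip-space vertices. The function names, the counter-clockwise front-face convention and the area threshold are assumptions for illustration only; real hardware and drivers use their own conventions and precision, and a proper "too small" test checks sample coverage rather than raw area.

```python
from typing import Optional, Sequence, Tuple

# A post-projection vertex in homogeneous clip space: (x, y, z, w).
Vec4 = Tuple[float, float, float, float]


def outside_frustum(tri: Sequence[Vec4]) -> bool:
    """Trivial reject: all three vertices lie beyond the same side plane."""
    planes = (
        lambda v: v[0] < -v[3],  # beyond the left plane   (x < -w)
        lambda v: v[0] > v[3],   # beyond the right plane  (x >  w)
        lambda v: v[1] < -v[3],  # beyond the bottom plane (y < -w)
        lambda v: v[1] > v[3],   # beyond the top plane    (y >  w)
    )
    return any(all(beyond(v) for v in tri) for beyond in planes)


def ndc_signed_area(tri: Sequence[Vec4]) -> float:
    """Signed area after the perspective divide (assumes w > 0 for every vertex)."""
    (x0, y0), (x1, y1), (x2, y2) = ((v[0] / v[3], v[1] / v[3]) for v in tri)
    return 0.5 * ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))


def cull_reason(tri: Sequence[Vec4], min_area: float = 1e-6) -> Optional[str]:
    """Return why a triangle can be discarded, or None if it must be rasterized."""
    if outside_frustum(tri):
        return "outside frustum"
    area = ndc_signed_area(tri)
    if area < 0.0:  # clockwise: back-facing under the assumed CCW front face
        return "back-facing"
    if area < min_area:  # crude stand-in for "too small to hit any sample"
        return "too small"
    return None
```

    The frustum test is only a trivial reject: a triangle that straddles two planes while still lying outside the view is kept, which is safe because the rasterizer clips it later.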
     
    el etro and Digidi like this.
  5. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    315
    Likes Received:
    81
    Though we can take a good guess from the optional (software) GPU culling in Wolfenstein 2, which is faster on AMD hardware including Vega but slower on Nvidia, and say "not enough". Vega was/is supposed to have improved culling in the form of "Primitive Shaders", but the whole thing, while supposedly present on Vega, doesn't work. Whether it could work in the future and there's just trouble implementing it at the driver level (entirely unsurprising if true), or whether the feature is somehow faulty on Vega, is unknown to anyone outside AMD at the moment.
     
  6. Bondrewd

    Regular Newcomer

    Joined:
    Sep 16, 2017
    Messages:
    492
    Likes Received:
    213
    They've provided some (albeit vague) numbers for NGG in the whitepaper. It probably works, at least to some extent.
     
  7. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,703
    Likes Received:
    2,431
    pharma likes this.
  8. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    The feature is called "Primitive Shader", implying that it is a fully programmable, shader-based culling system, not fixed-function hardware. If I understood properly, the developer needs to write these primitive shaders to improve the culling rate. So far no graphics API exposes this feature. There's also discussion about the possibility of AMD autogenerating these primitive shaders in future drivers. Without the full spec of the system, I can't really say whether this is possible in the general case, or how hard a problem it is to solve.
    GPU culling was originally designed to avoid GCN2 hardware bottlenecks (consoles). GCN3, GCN4 and GCN5 all reduced geometry-related bottlenecks, making techniques like these slightly less useful. There are still lots of GCN1/GCN2 cards around. For example, the R7 360, R9 390 and 390X were GCN2-based. Only the 380, 380X and Fury used GCN3. Also, in the 400 series, everything below the RX 460 is based on GCN2 or GCN1.

    GCN's geometry bottleneck is mostly visible when you have a high triangle-per-pixel density. Consoles usually render at 900p or 1080p, which is roughly 1.8x-2.6x fewer pixels than 1440p. At 900p the GPU executes significantly fewer pixel shader instances per triangle on average than at 1440p. This results in significantly worse GPU utilization when geometry is the bottleneck, so culling gives a significant advantage.
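    The resolution arithmetic behind this can be checked with a few lines of Python. The one-million-triangle scene is a hypothetical figure, and "pixels per triangle" here ignores overdraw, MSAA and quad helper lanes, so it is only a rough proxy for pixel shader work per triangle.

```python
# Standard 16:9 render resolutions (width, height).
RESOLUTIONS = {
    "900p": (1600, 900),
    "1080p": (1920, 1080),
    "1440p": (2560, 1440),
}


def pixel_count(name: str) -> int:
    width, height = RESOLUTIONS[name]
    return width * height


def pixel_ratio(high: str, low: str) -> float:
    """How many times more pixels `high` has than `low`."""
    return pixel_count(high) / pixel_count(low)


def pixels_per_triangle(name: str, visible_triangles: int) -> float:
    """Average pixels per visible triangle (rough proxy for pixel shader
    invocations), ignoring overdraw, MSAA and quad helper lanes."""
    return pixel_count(name) / visible_triangles


if __name__ == "__main__":
    for low in ("900p", "1080p"):
        print(f"1440p has {pixel_ratio('1440p', low):.2f}x the pixels of {low}")
    # Hypothetical scene with one million visible triangles at every resolution.
    for name in RESOLUTIONS:
        print(f"{name}: {pixels_per_triangle(name, 1_000_000):.2f} pixels per triangle")
```

    The ratios come out at about 2.56x (900p) and 1.78x (1080p). With the triangle count held fixed, each triangle at 900p feeds well under half the pixel work it would at 1440p, so any fixed per-triangle front-end cost is amortized over much less shading.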

    GPU culling is still a very good technique, but the biggest impact can be seen on GCN1/GCN2 hardware and/or at lower rendering resolutions. These benchmarks are at 1440p on GCN5. Apparently the game doesn't have enough triangle density or depth complexity (occlusion) to see a benefit from GPU culling in this scenario. We don't know exactly what their algorithm is doing. I am assuming it is similar to Frostbite's: https://www.slideshare.net/gwihlidal/optimizing-the-graphics-pipeline-with-compute-gdc-2016. Frostbite's algorithm was designed for GCN2, but it still shows significant gains on GCN3 (Fury X), especially when culling is done using async compute. The culling cost could easily be reduced by removing some culling steps that modern GPUs handle efficiently.
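    The general shape of compute-based triangle culling can be illustrated with a serial Python sketch: a per-triangle visibility flag, an exclusive prefix sum that assigns output slots, and a compacted index buffer whose length a later indirect draw would consume. This is not Frostbite's code; the function names, the single back-face test and the counter-clockwise convention are assumptions, and on a GPU each pass would run in parallel in a compute shader.

```python
from itertools import accumulate
from typing import Callable, List, Sequence, Tuple

Vec2 = Tuple[float, float]  # screen-space position


def front_facing(tri: Sequence[Vec2]) -> bool:
    """Counter-clockwise screen-space winding counts as front-facing (assumed)."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    return (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0) > 0.0


def cull_and_compact(indices: Sequence[int], positions: Sequence[Vec2],
                     keep: Callable[[Sequence[Vec2]], bool]) -> List[int]:
    tri_count = len(indices) // 3
    # Pass 1: one "thread" per triangle writes a 0/1 survival flag.
    flags = [int(keep([positions[i] for i in indices[3 * t:3 * t + 3]]))
             for t in range(tri_count)]
    # Pass 2: an exclusive prefix sum gives every survivor its output slot.
    slots = [0] + list(accumulate(flags))[:-1]
    out = [0] * (3 * sum(flags))
    # Pass 3: survivors scatter their three indices into the compacted buffer.
    for t in range(tri_count):
        if flags[t]:
            out[3 * slots[t]:3 * slots[t] + 3] = indices[3 * t:3 * t + 3]
    # len(out) is the index count an indirect draw would consume.
    return out
```

    Additional tests (frustum, small primitive, occlusion) slot into `keep`; dropping the ones a given GPU already handles cheaply in fixed function is the kind of cost reduction suggested above.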
     
    Kej, tinokun, Heinrich4 and 8 others like this.
  9. Bondrewd

    Regular Newcomer

    Joined:
    Sep 16, 2017
    Messages:
    492
    Likes Received:
    213
    Which is part of their next-gen geometry pipeline (NGG).
    That should be fairly obvious.
    Wasn't that confirmed by @Rys?
     
  10. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    These might be obvious things to you, but apparently some people seem to believe that Primitive Shaders are broken. I simply see a lack of software. I will believe shader autogeneration is feasible to implement when I see it.

    My personal opinion is that primitive shaders are exactly what we (GPU-driven rendering early adopters) asked for. Just give us an API to write the primitive shaders ourselves. This shader stage is definitely a better solution to a real problem than geometry/hull/domain shaders (which are all struggling to find any real use).
     
    #4570 sebbbi, Nov 7, 2017
    Last edited: Nov 7, 2017
    dogen, tinokun, Heinrich4 and 2 others like this.
  11. Bondrewd

    Regular Newcomer

    Joined:
    Sep 16, 2017
    Messages:
    492
    Likes Received:
    213
    Well yeah, but @Rys was pretty concrete about it.
    Makes me wonder what caused the delays that resulted in the current state of Vega.
     
  12. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,703
    Likes Received:
    2,431
    The game is light on geometry complexity indeed; almost all characters and objects have fairly low polygon counts. Areas are limited in scope as well.
     
  13. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    Not fixed function, but still using some of the same hardware. It was described as akin to writing assembly in order to keep it efficient. So there is likely some setup work required, or AMD would just use compute shaders and call it a day. The four shader engines (SEs) are likely still in play, with some synchronization work required.

    I'm not sure they are delayed so much as that standardizing what could be the start of a next-generation graphics pipeline takes time. The automatic part would be invisible. Some driver optimizations may already make use of them, but there's no way of really knowing short of a driver update that significantly improves performance.
     
  14. Bondrewd

    Regular Newcomer

    Joined:
    Sep 16, 2017
    Messages:
    492
    Likes Received:
    213
    Yes, but it's been 4 months since the FE launch, along with several new Vega-based products and driver updates.
     
  15. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    Even if released, devs would still need to use them. The automatic method may very well be what we've already seen, just that AMD uses it for limited internal optimizations.
     
  16. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    sebbbi,
    Do you see the whole concept of primitive shaders more as a way to free programmers from the geometry pipeline straitjacket, or as something that has the potential to significantly increase overall rendering performance?
    Jawed already argued that it can fix significant bottlenecks, but from your latest comments, it seems to me that AMD had already fixed quite a bit of those in the fixed pipeline?

    Or is it a circular thing: the lack of freely programmable pipeline (and its performance impact for some cases) makes programmers currently avoid techniques because of their potential impact?

    While I understand how badly resource issues can impact performance (as pointed out by Jawed), I don't have a feel for how much this really matters today. Maybe the hopes for the primitive shader as a magic performance fix for Vega in today's (!) games are just not justified.
     
  17. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,098
    Likes Received:
    2,815
    Location:
    Well within 3d
    Earlier disclosures from AMD had the opposite impression.

    For space reasons, I will just try to summarize the following sequence of posts:
    https://forum.beyond3d.com/posts/1997692/
    Link to tweet indicating primitive shaders are meant for the most part to be automatic. (Later tweet says more control might be considered, but is not promised).

    https://forum.beyond3d.com/posts/1997699/
    Confirmed that at the time primitive shaders were disabled. Developer API not ready, automatic generation inoperative.

    https://forum.beyond3d.com/posts/1997709/
    At least at that point, it was unclear how to reasonably expose primitive shaders to devs. Apparently, it is not straightforward to implement (like assembly) and difficult to realize gains over the automatic generation (driver's general level of generation is supposedly high).


    The automatic generation path seems like it could have a conceptual link to Sony's triangle sieve optimization, a compilation flag that stripped all but the position calculations from a vertex shader, after which frustum, facing and coverage tests could be used to cull. Integrating what was originally a separate invocation into one of the front-end shaders doesn't seem like it would hold any mystery.
    How AMD can claim that the driver is able to spit out highly optimal primitive shader code but then keeps not using it is curious.
    That a manual path is proving difficult to expose may point to specific glass jaws or internal quirks that could leave dev-written code error-prone or fragile in the face of changing conditions or devices.
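    The sieve idea described above (a position-only pass, culling on positions alone, then full shading only for what survives) can be sketched in Python to show the data flow. The shader functions, the vertex dictionaries and the back-face-only test are hypothetical stand-ins; this is not Sony's or AMD's implementation.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vec2 = Tuple[float, float]


def front_facing(tri: Sequence[Vec2]) -> bool:
    """Counter-clockwise winding counts as front-facing (assumed convention)."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    return (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0) > 0.0


def position_only(vertex: dict) -> Vec2:
    """Hypothetical sieved shader: only the position output is computed."""
    return vertex["pos"]


def full_vertex_shader(vertex: dict) -> dict:
    """Hypothetical 'expensive' shader producing position plus other attributes."""
    x, y = vertex["pos"]
    return {"pos": (x, y), "attr": 0.5 * (x + y)}


def sieve_draw(vertices: Sequence[dict], indices: Sequence[int],
               keep: Callable[[Sequence[Vec2]], bool]) -> Tuple[List[int], Dict[int, dict]]:
    # Phase 1: cheap position-only pass over every vertex.
    positions = [position_only(v) for v in vertices]
    # Phase 2: cull triangles using positions alone (facing here; frustum/coverage also fit).
    surviving: List[int] = []
    for t in range(0, len(indices) - 2, 3):
        tri = indices[t:t + 3]
        if keep([positions[i] for i in tri]):
            surviving.extend(tri)
    # Phase 3: run the full shader only for vertices referenced by surviving triangles.
    shaded = {i: full_vertex_shader(vertices[i]) for i in sorted(set(surviving))}
    return surviving, shaded
```

    In a two-triangle example where one triangle is back-facing and owns a vertex no other triangle uses, only three of the four vertices reach `full_vertex_shader`. The saving grows with how expensive the non-position work is relative to the position math.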

    I haven't thought too hard about this, but one area I was curious about was culling triangles that do not hit a sample point. For MSAA, that was one area optimized for in Polaris with the primitive discard accelerator. Not knowing how that element was implemented, wouldn't having the GPU's sampling behavior available statically allow specialized hardware/instructions to improve on code size and per-clock work versus the general ISA?
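    One way to illustrate culling triangles that miss every sample point is a conservative test: if a triangle's screen-space bounding box contains no sample position from a fixed, known pattern, the triangle cannot cover any sample and can be discarded. The 4x offsets below are an assumption chosen to resemble a rotated-grid pattern, and nothing here claims to describe how Polaris' primitive discard accelerator actually works.

```python
import math
from typing import Sequence, Tuple

Vec2 = Tuple[float, float]  # screen-space position in pixels

# Per-pixel sample offsets from the pixel center; the 4x values are assumed.
SAMPLES_1X = [(0.0, 0.0)]
SAMPLES_4X = [(-2 / 16, -6 / 16), (6 / 16, -2 / 16), (-6 / 16, 2 / 16), (2 / 16, 6 / 16)]


def interval_contains_lattice(lo: float, hi: float, phase: float) -> bool:
    """Is there an integer k with lo <= k + phase <= hi?"""
    return math.ceil(lo - phase) <= math.floor(hi - phase)


def misses_all_samples(tri: Sequence[Vec2], offsets: Sequence[Vec2]) -> bool:
    """Conservative cull: True only if the bounding box holds no sample point."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    for dx, dy in offsets:
        # Samples for this offset sit at (px + 0.5 + dx, py + 0.5 + dy) for integer px, py.
        if (interval_contains_lattice(min(xs), max(xs), 0.5 + dx)
                and interval_contains_lattice(min(ys), max(ys), 0.5 + dy)):
            return False  # a sample lies inside the box; the triangle might cover it
    return True
```

    Because the pattern is known ahead of time, the per-offset phases are constants, which is the kind of static knowledge a specialized unit could bake in. The same sliver can be cullable at 1x yet must be kept at 4x, since one of the extra sample positions falls inside its bounding box.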
     
    Kej, Lightman, DavidGraham and 5 others like this.
  18. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    315
    Likes Received:
    81
    The 1080p benchmarks bear this out, with an RX 480 gaining nearly ten percent in minimum framerate: https://wccftech.com/wolfenstein-ii-deferred-rendering-and-gpu-culling-performance-impact/

    Anyway, AMD promised that the initial implementation of Primitive Shaders wouldn't need to be touched by developers to work, and apparently they're still disabled on current Vega hardware. Raja leaving AMD for Intel seems to support the notion that Vega was an overall failure in achieving its stated goals, and those responsible may have been let go as a result. That's speculation on "inside baseball", perhaps, but Vega certainly didn't do a whole lot over Polaris in most game performance versus clock speed, and he's the second major engineer on Vega to seek work elsewhere.
     
  19. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    Or frustrated with a lack of resources. That letter did indicate more funding going towards RTG. It wouldn't make much sense to saddle Intel with a flawed design and then leave to allegedly go work on it either.
     
    el etro and BRiT like this.
  20. giannhs

    Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    37
    Likes Received:
    40
    Would you trust devs such as those behind the Batman games or Project CARS with a specific tool that would make AMD cards run far better than they're supposed to? I can see why AMD is trying to make this automatic in their drivers, because I'm pretty sure the usual suspects would use it in such a way that they could tank AMD performance once more (unless AMD is able to force it via async regardless of what the devs do).
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.