AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Discussion in 'Architecture and Products' started by ToTTenTranz, Sep 20, 2016.

  1. yuri

    Newcomer

    Joined:
    Jun 2, 2010
    Messages:
    178
    Likes Received:
    147
    Sure, using only frequency was a simplification. The whole picture is much more complicated. The question is, when you tune Vega10 to ~Fiji's power level, how much is the resulting perf. delta between these two?
     
  2. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    OTOH why would anyone reasonably expect that a secondary optimization would have more impact than the ones in the slides?

    Perf and perf/W improvements of 10% for some cases are already quite impressive. GPUs have long passed the point where a single feature has an outsized impact.
     
    pharma and Lightman like this.
  3. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    Because if many of the mechanisms it relied upon weren't implemented, and last I checked they still haven't been, then it would be sub-optimal at best. The same could be said for async compute originally implemented with Hawaii. The hardware was there, but implementation was largely non-existent until years later. Those original DSBR numbers would more than likely just be the intelligent workgroup distributor being enabled. The fetch and shade once may not even be a part of that. What bandwidth/power gains were achieved could easily be the result of higher cache utilization from the tiling. Out of order rasterization across tiles essentially.

    That doesn't mean a dev can't code a path slightly differently and break a feature. Or in the case of the FP16 mishap, completely disable the path until the compiler gets updated. TBDR has the potential to have a large impact if fully implemented. Especially for the mobile platforms where it is more common. Mobile Vega seems rather popular and the Intel deal is also interesting, so Intel may know more than we do. That move is either Intel defending against Ryzen APUs or an offensive against ARM mobile devices. The former seems odd with AMD as Intel doesn't stand to sell many more CPUs that way or benefit from GPU sales. Current products don't appear to be in direct competition either. AMD at low end with Raven, Intel low-mid with KabyG, and the mid tiers a bit of an unknown still, but I wouldn't be surprised if AMD had one for SIMD workloads with 4/8 memory channels feeding it. The offensive against ARM makes more sense, unless Intel just wants to stick it to Nvidia. So far Intel seems to want ultra-thin mobiles with higher graphics performance, where neither AMD nor Nvidia really competes. Back to the original point, that would require a bit more efficiency to keep processing power down. A more efficient DSBR would do that, along with the packed math as FP16 is more common with mobile.

    Then there is the VGPR Indexing that last I checked was still disabled and would likely be associated with that virtual vector register file patent that was recently linked. The one with Koduri and Mantor listed as inventors is likely significant to one of the architectures. Having been already published would suggest Vega or perhaps a software/compiler implementation? We haven't seen any mGPU implementations either which would effectively multiply apparent cache for the purpose of binning.
     
  4. yuri

    Newcomer

    Joined:
    Jun 2, 2010
    Messages:
    178
    Likes Received:
    147
    Yes, Intel may know more than we do. There were multiple hints the GFX IP inside Intel's chip is not a GFX9.x but the good old GFX8 + various tweaks. Was there any confirmation of a "Vega feature" besides HBCC? I mean NGG, DSBR, NCU, etc.?
     
  5. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    Large as in 10%? I consider that large. And that’s what AMD sees in some cases.

    I think your performance improvement expectations of some individual features are wildly optimistic.
     
    DavidGraham and pharma like this.
  6. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    That would be unfortunate since AMD markets the work distributor as part of the new geometry path rather than the pixel engine that the DSBR is under.

    Is this back to equating the Draw Stream Binning Rasterizer to actual Tile-Based Deferred Rendering implementations?

    Given the alleged roadmap of Intel's going to an EMIB-based Gen 12 and 13, it's also possible that AMD's custom chip is meant to temporarily fill a gap in Intel's product line due to the 10nm blowup sinking Intel's internally-sourced graphics efforts for 1-2 major product cycles.
    Outside competition would factor into this, but also Intel's need to get something out even versus itself.

    That would seem to run counter to the "virtual" component of the register file patent, and the timing appears off for it being applicable. VGPR indexing is software-visible, as it is used by the shader code--whose view of the register file is being spoofed by the virtual register file scheme.

    The initial filing for what appears to have become the DSBR was 2013. There's a lot of games that could be played with when disclosures are filed, but there's a multi-year gap that does seem consistent with these two techniques being part of different designs.
     
  7. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,797
    Likes Received:
    2,056
    Location:
    Germany
    Try this interpretation:
    It's overbuilt because with only so few clients, IF is not yet pushed to its limits.
     
  8. Locuza

    Newcomer

    Joined:
    Mar 28, 2015
    Messages:
    45
    Likes Received:
    101
    Fritzchens Fritz did a follow up with clear PS4 Pro shots:


    There is the open question of whether the PS4 Pro has 64 ROPs because it has four Shader Engines and deactivates two for backwards compatibility.

    ----

    But speaking of pure Vega, according to Marc Sauter (y33H@) from the German IT site Golem, AMD said in a breakout session shortly before CES 2018 that the implicit driver path for primitive shaders was cancelled and it will be only up to developers with explicit control to make use of it.
    Now the waiting game shifts to when AMD will provide direct control for developers and after that, when a game will actually utilize it.
    https://www.forum-3dcenter.org/vbulletin/showthread.php?p=11610696#post11610696
    https://www.forum-3dcenter.org/vbulletin/showthread.php?p=11611522#post11611522
     
    Lightman, yuri, pharma and 5 others like this.
  9. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,172
    Location:
    La-la land
    Wut?
    So no primitive shaders support then.

    Why?
     
  10. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    1,151
    Likes Received:
    571
    Location:
    France
    Too complicated IMO... So, with Primitive Shaders out, what's left to enable? Is FP16 working in compute and pixel shaders? DSBR? NGG fast path? Or nothing and Vega is performing like an OC Fiji and that's it?
     
    yuri and A1xLLcqAgt0qc2RyMz0y like this.
  11. Bondrewd

    Regular Newcomer

    Joined:
    Sep 16, 2017
    Messages:
    513
    Likes Received:
    234
    Looks like that.
     
    yuri likes this.
  12. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    Really depends on the game with more potential in lesser optimized or immediate mode titles. The improvements are based on how much overdraw currently exists in various titles. Not to mention the bandwidth savings Nvidia likes to tout with their compression or more appropriately, better culling implementation. Maxwell got a bit more than 10% there.
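
    To illustrate why the gain depends on existing overdraw and submission order, here is a minimal sketch for a single pixel. It assumes opaque geometry only, and the function names are hypothetical; it is not a model of any vendor's actual hardware.

```python
def shaded_fragments_early_z(depths):
    """Count fragments shaded at one pixel with early depth testing.

    `depths` lists fragment depths in submission order (smaller = closer).
    A fragment is shaded only if it is closer than everything drawn so far.
    """
    shaded = 0
    nearest = float("inf")
    for d in depths:
        if d < nearest:
            nearest = d
            shaded += 1
    return shaded


def shaded_fragments_shade_once(depths):
    """Idealized 'shade once': only the final visible fragment is shaded."""
    return 1 if depths else 0
```

    With back-to-front submission every fragment is shaded under early-Z, so a shade-once scheme saves the most; with front-to-back submission early-Z already shades only one fragment and there is nothing left to save.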

    Until FP16/RPM is more widely functional and primitive shaders become documented, I wouldn't consider the features fully functional. Just working at a limited capacity and not necessarily synergizing. RPM should improve primitive shaders, PS improve culling, culling improving binning with less clutter, binning limiting overdraw and fragment dispatch. It just seems there is a lot of software work to be done still.

    Not equating, but moving much of the culling/overdraw savings from fragment shaders into the primitive pipeline as earlydepthstencil in less than optimal rendering orders. Deferred Draw Stream Binning Rasterizer with Tile Based Rendering might make more sense. With async compute, having an execution gap between the primitive pipeline and fragment shaders is less of an issue. Only catch is needing async compute, which is still somewhat limited. It's possible the effects only work well with DX12/Vulkan, but my thinking is a new toolchain is required and that's the huge rewrite that is still occurring. At least the public commits aren't what I would call stable with major functionality being added. A couple of months ago even FP16 wasn't working across the entire product stack.

    That's possible, but wouldn't necessarily explain AMD not attaching a similar chip to Ryzen. AMD appears to have made APUs smaller and possibly larger, but not in direct competition. An 8 core Ryzen with 32 CU Vega and HBM2 and big 120mm cooler would dominate right now. In part because discrete parts became scarce.

    Not counter as much as attacking the problem from different angles. VGPR spilling technically allows the larger register file size, just with unacceptable performance in most cases. The virtual RF would address that with a renaming and paging mechanism that should be transparent to the shader or DSBR model. It would be transparent to the original design as it would be on par with simply providing a larger cache or register file and relaxing the bin size requirements. Only begin raster on a bin when hitting a context limit, running out of geometry, or hinting from a prior frame all geometry is present. Actual bin size would be more complex to model as register pressure could vary significantly based on the shader.
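
    The flush policy described above can be sketched roughly as follows. This is a toy model under stated assumptions: `TILE`, `bins_for`, `bin_and_flush` and `batch_limit` are hypothetical names, the "context limit" is reduced to a simple primitive count, and the prior-frame hint is not modeled at all.

```python
from collections import defaultdict

TILE = 32  # hypothetical bin size in pixels


def bins_for(bbox, tile=TILE):
    """Return the (tx, ty) tiles overlapped by a primitive's bounding box.

    bbox = (x0, y0, x1, y1) in inclusive pixel coordinates.
    """
    x0, y0, x1, y1 = bbox
    return [(tx, ty)
            for ty in range(y0 // tile, y1 // tile + 1)
            for tx in range(x0 // tile, x1 // tile + 1)]


def bin_and_flush(primitives, batch_limit):
    """Yield batches of {tile: [primitive ids]}.

    A batch is flushed to raster when `batch_limit` primitives have been
    buffered (standing in for a context limit) or when geometry runs out.
    """
    bins = defaultdict(list)
    pending = 0
    for prim_id, bbox in primitives:
        for t in bins_for(bbox):
            bins[t].append(prim_id)
        pending += 1
        if pending == batch_limit:
            yield dict(bins)
            bins = defaultdict(list)
            pending = 0
    if pending:
        yield dict(bins)
```

    A larger `batch_limit` stands in for the relaxed bin size requirement: more primitives are held per tile before raster starts, so more hidden surfaces can be discarded within a batch.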

    That would imply IF is a fixed configuration and there are nodes on the network that don't attach to anything or are only active in server/pro scenarios. That could be the case if physically dividing the CUs into separate virtual hardware devices, but that runs counter to what AMD has been advertising. Where the ACEs allow load-balancing many clients in a secure fashion. That's why I think the network was enlarged to accommodate additional IO for Vega10 in server/pro parts. Extra space in the form of larger/additional PHYs with internal routing for growing the network like Epyc. 32 PCIe lanes on a gaming part would be largely wasted, but practical on SSG, duo, or APU if using the same part.
     
    DeeJayBump likes this.
  13. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    All the literature says "Vega", so GFX9.x seems more likely. The overview mentions "Vega Pixel Engine" along with HBCC, so it should be full Vega. Pixel Engine in the whitepaper covers DSBR at least, so more than likely it's all Vega.
     
  14. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    6,973
    Likes Received:
    3,050
    Location:
    Pennsylvania
    Linux only I believe right now?
     
  15. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,797
    Likes Received:
    2,056
    Location:
    Germany
    No, it would only imply that IF has a certain amount of overhead that starts to amortize only beyond the requirement and current count of the clients.
     
    Lightman likes this.
  16. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,749
    Likes Received:
    2,516
    AMD said on multiple occasions that DSBR is only useful for SKUs with limited resources, and now their own testing confirms it. At Ultra settings, most games gain about 5%, or less, and likely in very specific scenarios too. And now with the cancellation of driver primitive shaders, I think it's time you abandon your theory of 30% more performance than a TitanXP through unicorn drivers. It was never a good theory to begin with. The writing was on the wall that Vega was missing several things when the features were never enabled at launch or a few months after.
     
  17. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,166
    Likes Received:
    1,836
    Location:
    Finland
    How do you read it like that? The support is (will be) there if the dev decides to build the game using them, it's just not the first advertised automatic conversion from vertex+geometry or whatever
     
  18. Despoiler

    Newcomer

    Joined:
    Oct 1, 2015
    Messages:
    17
    Likes Received:
    15
    This is the first time I've ever used this. (╯°□°)╯︵ ┻━┻ This seems like gross mismanagement of RTG by Raja. PS stated as dev tool (Raja), to PS as driver magic (Raja), now back to the original plan of PS being dev tool (Su).
     
    A1xLLcqAgt0qc2RyMz0y likes this.
  19. Bondrewd

    Regular Newcomer

    Joined:
    Sep 16, 2017
    Messages:
    513
    Likes Received:
    234
    Raja is whatever, why did @Rys say it was driver magic?
     
    yuri and A1xLLcqAgt0qc2RyMz0y like this.
  20. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,832
    Likes Received:
    4,452
    Because it was supposed to be...

    I guess it doesn't make much sense to spend resources on that now because few gamers have Vega cards and no gamer is buying AMD cards to play games, much less Vega chips.

    It does bother me that driver development for games is seemingly decelerating, but high-end PC gaming as a whole is actually dying, and it might die really fast.
    I wonder what will happen to PC game sales after a year of severe drought of performance graphics cards on the shelves.


    The IHVs need to come up with a solution fast. AMD needs to move up the launch of high-performance gaming APUs with HBM as much as they can.
     