AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Discussion in 'Architecture and Products' started by ToTTenTranz, Sep 20, 2016.

  1. troyan

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    120
    Likes Received:
    181
For me it's not about "ignoring". Feature Level 12_1 has existed since September 2014 with Maxwell v2, and Futuremark still has nothing that uses it. The same goes for nVidia's VR-specific features like "Multi-Res Shading", which isn't supported in VRMark even though a few games use it.

It's just strange to me. I don't really care about it, but it casts a shadow over their neutrality between vendors.

/edit: It reminds me a little of the Jon Peddie report that AMD sponsored to showcase that AMD powers most "gaming devices".
     
  2. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,489
    Likes Received:
    400
    Location:
    Varna, Bulgaria
    NV30 called, it wants its FP16 back. :-?
     
    Kej, egoless, xpea and 4 others like this.
  3. HKS

    HKS
    Newcomer

    Joined:
    Apr 26, 2007
    Messages:
    31
    Likes Received:
    14
    Location:
    Norway
Tegra X2 (with a Pascal-based GPU) also has "fast" FP16.
     
  4. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
I'm wondering where all this goes. Maybe we can put the pieces together. The most obvious ones:
- 4 MiB L2 cache
- 64× 64 KiB LDS
- 64× 4 KiB scalar RF
- 64× 4× 64 KiB vector RF
- 64× 16 KiB L1 cache

If I'm not mistaken, that's 25,856 KiB just from those.
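A quick back-of-the-envelope check of that tally (the per-block sizes are the ones listed above, not confirmed specs):

```python
# Tally of the obvious on-die SRAM pools listed above (sizes per the post,
# not confirmed specifications).
pools_kib = {
    "L2 cache":  4 * 1024,       # 4 MiB
    "LDS":       64 * 64,        # 64 CUs x 64 KiB
    "scalar RF": 64 * 4,         # 64 CUs x 4 KiB
    "vector RF": 64 * 4 * 64,    # 64 CUs x 4 SIMDs x 64 KiB
    "L1 cache":  64 * 16,        # 64 CUs x 16 KiB
}
total = sum(pools_kib.values())
print(total, "KiB =", total / 1024, "MiB")  # 25856 KiB = 25.25 MiB
```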
     
    Kej, T1beriu, Lightman and 4 others like this.
  5. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    I haven't seen a reference for this. The count of geometry engines is listed as four, and those are distributed at one per engine.
     
  6. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
One of the slides shows groupings of eight CUs, but without explicitly mentioning eight shader engines.
     
  7. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,985
    Likes Received:
    4,570
You mean like 3DMark Vantage, which had nVidia PhysX GPU acceleration contributing to the final score? At least FP16 calculations aren't vendor-specific.

Rest assured though, they mentioned "demo", not benchmark.


According to our own Intel graphics resident, it's on every Gen8 (Broadwell) iGPU and newer.
And if FP16 is heavily used in post-processing calculations, I could see Intel GPUs taking the post-processing effects all to themselves when a weaker dGPU is detected (say, laptops with Polaris 11/12 or GP107/108). After all, we're looking at a minimum of 800 GFLOPS FP16 even in the broadly available GT2 models. That could make quite a difference on <2 TFLOPS mobile GPUs like the GTX 1050 and the MacBook's Radeon Pro 460.
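For what it's worth, that ~800 GFLOPS figure checks out as a rough estimate. A sketch, assuming a Gen8/Gen9 GT2 part with 24 EUs, two SIMD-4 FMA pipes per EU, double-rate FP16, and a ~1.15 GHz boost clock:

```python
# Rough FP16 peak for an Intel GT2 iGPU. All figures are assumptions
# for illustration: 24 EUs, 2 SIMD-4 FMA pipes per EU, ~1.15 GHz boost.
eus = 24
fp32_per_eu_per_clk = 2 * 4 * 2        # 2 pipes x SIMD-4 x FMA (2 ops)
clock_ghz = 1.15

fp32_gflops = eus * fp32_per_eu_per_clk * clock_ghz
fp16_gflops = 2 * fp32_gflops          # double-rate FP16
print(round(fp32_gflops), round(fp16_gflops))  # ~442 FP32, ~883 FP16
```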

That AMD powers most gaming devices should be common sense to everyone, since the 2013 consoles alone account for well over half of active gamers worldwide (the ones who pay $50 for their games, not the F2P/<$1 mobile crowd, of course). AMD simply hired a consultant to tell them exactly how far ahead of everyone else they were. Those are numbers they can now throw at developers who might be hesitant about optimizing for their GPU architecture.
That market research may have been partially responsible for the Bethesda deal, and for this new Ubisoft deal.

Buffers for HBCC?
     
    no-X, BRiT and BacBeyond like this.
  8. Genotypical

    Newcomer

    Joined:
    Sep 25, 2015
    Messages:
    38
    Likes Received:
    11
There is nothing wrong with Futuremark supporting FP16. They should offer it in a way that allows comparing FP32 to FP16, maybe with Time Spy. It's a neutral feature, just like DX12. Of course, how much anyone cares is another matter. They still don't support Vulkan, and Time Spy was a bit meh. FP16 would actually seem like more of a core feature to support if it became relevant.

I wouldn't care as much if it were just Far Cry; I'm not that interested in the series. But Wolfenstein has been looking really good: great character, and the visuals are so sweet. I would start working heavily towards getting Vega if the above comes to fruition. Vulkan, FP16, intrinsic shaders? Pascal does not support fast FP16 <3<3. That game should run beautifully on Vega at max settings.

On the competitive side, Pascal does not support this, so if FP16 is a "simple enough" addition for developers to support and it does produce significant performance gains, that would be the biggest thing for Vega. If it gets anywhere near a real 25 TFLOPS GPU, it should be well above a 1080 Ti.
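That "real 25 TFLOPS" lines up with Vega 64's packed-math peak. A rough sketch, assuming 4096 ALUs, FMA counted as two ops, double-rate packed FP16, and a ~1.55 GHz boost clock:

```python
# Peak-throughput estimate for a Vega 64-class GPU; clock is an assumption.
alus = 4096                  # stream processors
boost_ghz = 1.55             # approximate boost clock
fp32_tflops = alus * 2 * boost_ghz / 1000   # FMA = 2 ops per clock
fp16_tflops = 2 * fp32_tflops               # 2x rate via packed FP16
print(round(fp32_tflops, 1), round(fp16_tflops, 1))  # ~12.7, ~25.4
```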

What do the developers familiar with FP16 and FP32 performance think, though? I'd imagine that if the performance gains are there, this could spread like wildfire.

Another question I have is whether consoles are using this. It was said the PS4 Pro supports doing two FP16 operations at once. Adoption in the console space that transfers easily to PC would help.
     
    #3268 Genotypical, Jul 31, 2017
    Last edited: Jul 31, 2017
    BacBeyond likes this.
  9. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
GCN can subdivide the CUs within a shader engine. There is a shader array ID bit tracked by a wavefront in the Southern Islands ISA, and even though AMD stopped documenting the context register it lives in for later revisions, the driver patches have maintained it. It has almost always been documented as one shader array per engine, but perhaps Vega's shader engines have two.
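For reference, a minimal sketch of how that per-wavefront bit is exposed, assuming the HW_ID field layout from the Southern Islands ISA manual (SH_ID at bit 12, SE_ID at bits 14:13; a wave reads the register via s_getreg_b32):

```python
# Decode the shader array (SH_ID) and shader engine (SE_ID) fields from a
# HW_ID value, per the bit layout in the Southern Islands ISA manual.
def decode_hw_id(hw_id: int) -> dict:
    return {
        "wave_id": (hw_id >> 0) & 0xF,   # bits 3:0
        "simd_id": (hw_id >> 4) & 0x3,   # bits 5:4
        "cu_id":   (hw_id >> 8) & 0xF,   # bits 11:8
        "sh_id":   (hw_id >> 12) & 0x1,  # bit 12: shader array within an engine
        "se_id":   (hw_id >> 13) & 0x3,  # bits 14:13: shader engine
    }

print(decode_hw_id(0x3312))
# {'wave_id': 2, 'simd_id': 1, 'cu_id': 3, 'sh_id': 1, 'se_id': 1}
```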
     
    Cat Merc likes this.
  10. Clukos

    Clukos Bloodborne 2 when?
    Veteran Newcomer

    Joined:
    Jun 25, 2014
    Messages:
    4,462
    Likes Received:
    3,793
FP16 might actually be more than just a gimmick used in benchmarks. The PS4 Pro supports double-rate FP16, after all. If developers optimize for that, why not run the same optimizations on RX Vega?
     
    Silent_Buddha and Heinrich4 like this.
  11. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,781
    Likes Received:
    2,568
Yep, I don't think miners will even target the Vega 64 Liquid; it's Vega 56 they should be worried about.
In the context of the Vega launch? Cherry-picking? Like testing games where they know NV has trouble with DX12 (like BF1 SP), or testing at less than max settings (Sniper 4, Far Cry Primal, Doom, Deus Ex, Ashes). Also, COD Infinite Warfare has a bug in the current NV driver where erratic fps drops are frequent (it literally drops to 30 fps no matter the card or settings). And Fury X having better min fps than all NV cards was a dead giveaway.

There is also clever product placement: for example, in this slide the GTX 1080 has better minimums than Vega in several titles, but is placed on the left to create the illusion of being consistently behind. (Also observe COD Infinite Warfare, where the 980 Ti and 1080 have the same minimums due to the above-mentioned bug.)
[slide images]
     
    #3271 DavidGraham, Jul 31, 2017
    Last edited: Jul 31, 2017
    pharma likes this.
  12. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    The full slide deck appears to have a fuzzy wafer shot for Vega.
    A fair number of my earlier guesses from the artistic rendering of the die were off. It's actually more standard in terms of layout than I had thought. The CU blocks, caches, and RBEs are where they usually are. Visually, the caches and register files light up where one would expect, and the center strip shows a decent amount of storage as well. The L2 seems to be subdivided into 8 blocks on a side.

    That does leave the HBCC area in AMD's diagram, which doesn't appear to be hit by the light to really show off any arrays. However, the diagram and the shadowed regions of the wafer shot show a region of pretty high regularity that could contain a goodly amount of storage. If it's buffering multiple memory pools and tracking irregular page residency and use history, it could add up to a fair amount.

Then there are the memory controllers and the coupled Infinity Fabric, which should have some pretty deep buffers. The extent of the fabric's presence is still not clear, although, as noted, a lot of the GPU proper looks rather standard.
     
    xpea and Cat Merc like this.
  13. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
You are missing the ROP caches. These caches are likely much larger in Vega because of the tiled rasterizer. Also the parameter cache (for vertex shader -> pixel shader interpolants): the vertex interpolants need to be stored for a much longer time because of the tiled rasterizer, which might require quite a bit of extra storage.

So my guess is that the "draw stream binning rasterizer" is the biggest reason for the added SRAM.
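To illustrate why binning inflates parameter storage, here's a toy sketch (a conceptual model, not AMD's DSBR): each triangle's interpolants must stay buffered until every tile that references the triangle has been rasterized, instead of being consumed immediately as in an immediate-mode pipeline.

```python
# Toy tile binner: a triangle's vertex interpolants stay live until every
# bin referencing it is drawn. Conceptual sketch only, not AMD's DSBR.
TILE = 32  # tile size in pixels

def bin_triangles(triangles):
    bins = {}  # (tx, ty) -> list of triangle indices
    for i, (bbox, interpolants) in enumerate(triangles):
        x0, y0, x1, y1 = bbox
        for ty in range(y0 // TILE, y1 // TILE + 1):
            for tx in range(x0 // TILE, x1 // TILE + 1):
                bins.setdefault((tx, ty), []).append(i)
        # 'interpolants' cannot be freed here: every bin above still
        # references triangle i until that tile is rasterized.
    return bins

# One large triangle covering a 256x256 area keeps its attributes live
# for all 64 tiles it touches.
tris = [((0, 0, 255, 255), {"color": (1.0, 0.0, 0.0)})]
print(len(bin_triangles(tris)))  # 64
```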
     
    tinokun, Kej, Alexko and 8 others like this.
  14. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
The reason I'm asking about the min frame rates is that I doubt they have anything to do with raw calculation power.

If Vega is a GP102-class device (as seen in SPECviewperf) with some unexplained calculation bottleneck, its min frame rates should be more or less in line with a Titan Xp.
     
  15. BacBeyond

    Newcomer

    Joined:
    Jun 29, 2017
    Messages:
    73
    Likes Received:
    43
    Maybe not everyone wants to pay 40% more for a 1080 Ti? $700 vs $500 is quite a lot.
     
    RootKit and BRiT like this.
  16. BacBeyond

    Newcomer

    Joined:
    Jun 29, 2017
    Messages:
    73
    Likes Received:
    43
FreeSync supports frame doubling just like G-Sync. Under 48 fps on that monitor, frames would be doubled, resulting in a refresh of 94 Hz and lower.

They are also ignoring the extra cost of a G-Sync vs. FreeSync monitor in that example.
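A quick sketch of the frame-doubling (LFC) arithmetic on a 48-144 Hz panel: when the frame rate drops below the panel minimum, the driver repeats each frame enough times to land the refresh back inside the supported range.

```python
# Low framerate compensation: repeat each frame so the effective refresh
# stays at or above the panel's minimum. A 48-144 Hz range is assumed.
MIN_HZ = 48

def effective_refresh(fps: float) -> float:
    multiplier = 1
    while fps * multiplier < MIN_HZ:
        multiplier += 1
    return fps * multiplier

for fps in (60, 47, 40, 20):
    print(fps, "fps ->", effective_refresh(fps), "Hz")
# 60 -> 60, 47 -> 94, 40 -> 80, 20 -> 60
```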
     
  17. BacBeyond

    Newcomer

    Joined:
    Jun 29, 2017
    Messages:
    73
    Likes Received:
    43
  18. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    SIMD PROCESSING UNIT WITH LOCAL DATA SHARE AND ACCESS TO A GLOBAL DATA SHARE OF A GPU

If I had to guess, per-SIMD LDS? The filing and publication dates on that are only months old.

There was a Linux driver patch the other day adjusting waves per CU from 40 to 16 in no uncertain terms. So this patent is likely Vega, considering it's already published. It seems to be doing some sort of recursive 16:1 reduction involving pixels. At first I thought it was a tensor thing, but "pixels".

That's a distinct possibility, but I can't imagine it's more than a MB. Even 128K should be able to handle 16 GB of RAM plus a few tags and whatever else the controller requires. CPUs have had page tables for a while without anything approaching that scale.
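Rough numbers behind that claim, assuming the HBCC tracks residency at 64 KiB page granularity (the page size is an assumption):

```python
# How far 128 KiB of on-die state goes when tracking a 16 GiB pool,
# assuming 64 KiB pages.
pool_bytes = 16 * 1024**3
page_bytes = 64 * 1024
pages      = pool_bytes // page_bytes    # 262,144 pages
state_bits = 128 * 1024 * 8              # 128 KiB of SRAM, in bits
print(pages, state_bits // pages)        # 262144 pages, 4 bits per page
```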
     
    Cat Merc likes this.
  19. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
The wafer shot does have some portions where the RBE sections are well lit. They seem mostly free of large arrays.
I would think the rasterizer would mostly interact with depth information, which is something like 4 KB per RBE in Hawaii. AMD has been pretty consistent with RBE cache sizes across GCN, and GCN's counts seem consistent with earlier GPUs.
Even if significantly larger, it's starting from a very low point, and AMD may be counting on the L2 to mitigate any capacity needs from now on.

I would have thought the actual binning and rasterizer components and most of their storage would sit nearer the geometry front ends, given how intimately they link back to primitive setup. It might be a reason for the front end to be constrained if it relies heavily on the RBE caches rather than hosting its own storage and a snapshot of the bin information. The RBEs are stretched out further in physical terms, and could be servicing requests from the rasterizer while still under load from the CUs.

    That could explain why AMD's so gung-ho with primitive shaders and programmable methods that try to keep things away from that path, and may align with why Vega's culled and non-culled performance currently seem to be drawing from a similarly limited pool in some of the synthetics--although the new method was supposedly off for those.
     
  20. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
So, I'm not surprised to see that gaming performance with Vega FE is hobbled by both immature power management and an inactive DSBR.

Is there a decent slide deck or white paper anywhere? Is there a full architecture description? Weren't more architectural features due to be revealed at SIGGRAPH? Judging by this thread, I haven't seen any hint of anything new from SIGGRAPH so far.
     