AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Discussion in 'Architecture and Products' started by ToTTenTranz, Sep 20, 2016.

  1. oscarbg

    Newcomer

    Joined:
    Sep 2, 2009
    Messages:
    33
    Likes Received:
    10
    Also some technical questions:
    Does Vega support double-precision global atomics, like the ones Pascal exposed in OpenGL via GL_NV_shader_atomic_float64?
    And what about 64-bit integer global atomics: were those already there with Polaris?
    Also, with the Siggraph GL news plus the Vega details, I expected to see a new EXT or ARB extension for a feature added by Vega that was previously only available on NV GPUs (and Intel iGPUs) and exposed via some NV extensions.
    I'm talking about conservative rasterization (even Intel has an OpenGL extension for it, and it is already implemented in Mesa on Linux).
    NV has:
    GL_NV_conservative_raster
    GL_NV_conservative_raster_dilate
    GL_NV_conservative_raster_pre_snap_triangles (tier2 feature)
    I also hope to see the Vega OpenGL driver support the ARB_fragment_shader_interlock extension now that ROVs are supported. Strangely, AMD already exposed INTEL_fragment_ordering, which should provide equivalent functionality, on pre-Vega cards, although those cards don't support the ROV feature. How?
     
  2. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,797
    Likes Received:
    2,056
    Location:
    Germany
    I don't think IPC has been reduced. FWIW, MAD-latency per clock still seems exactly in line with Fiji.
     
    Ryan Smith and kalelovil like this.
  3. SpaceBeer

    Newcomer

    Joined:
    Apr 15, 2017
    Messages:
    36
    Likes Received:
    14
    Location:
    The Balkans
    Isn't the max OC clock difference between GP107 and GP106 around 10% (1900 MHz vs 2100 MHz)? Even with 10% higher clocks, Vega wouldn't be competitive with GP102. So it's the architecture that makes the large(r) difference.
     
  4. leoneazzurro

    Regular

    Joined:
    Nov 3, 2005
    Messages:
    518
    Likes Received:
    25
    Location:
    Rome, Italy
    Also, it could be quite difficult to compare chips even of the same architecture because, e.g., GP107 may be using a lower number of metal layers than GP106 for cost saving, being an ultra-low-budget part. It would be interesting to know these details.
     
  5. seahawk

    Regular

    Joined:
    May 18, 2004
    Messages:
    511
    Likes Received:
    141
    But there is still a good 15% difference in factory Boost settings, and around 25% in practical Boost performance.
     
  6. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    If AMD wants to maintain the 4-cycle instruction cadence, they can't touch the instruction latency. If they break the 4-cycle cadence, they need a new shader compiler as the current one is allowed to assume that all standard instructions have no visible latency (results immediately usable for the next instruction). The 4-cycle cadence with no visible instruction latency was one of the key points of GCN architecture. It simplified shader compiler design radically.

    AMD could have however increased latency of instructions requiring s_waitcnt. These include LDS load, vector memory load, scalar memory load, texture sampling and some cross lane ops. It should also be simple to increase latencies of the L1 and L2 caches. But of course you then need higher occupancy to hide these extra latencies.
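    A rough sketch of the cadence argument above, assuming GCN's usual parameters (64-wide wavefronts, 16-lane SIMDs, four SIMDs per CU visited round-robin). The function names are made up for illustration; this is a toy model, not a description of the actual hardware scheduler.

```python
# Toy model of the GCN issue cadence. The numbers are the commonly
# documented GCN parameters; the helper names are hypothetical.
WAVE_SIZE = 64      # work items per wavefront
SIMD_LANES = 16     # lanes per SIMD
SIMDS_PER_CU = 4    # SIMDs served round-robin by the issue logic


def cycles_per_instruction(wave_size=WAVE_SIZE, lanes=SIMD_LANES):
    """Cycles a SIMD needs to push one wavefront through its lanes."""
    return -(-wave_size // lanes)  # ceiling division


def issue_schedule(num_cycles, simds=SIMDS_PER_CU):
    """Which SIMD gets an issue slot on each cycle (round-robin)."""
    return [cycle % simds for cycle in range(num_cycles)]


def has_visible_latency(pipeline_latency, cadence):
    """A dependent instruction only stalls if the result is not ready
    by the time the same SIMD gets its next issue slot."""
    return pipeline_latency > cadence


cadence = cycles_per_instruction()
print(cadence)                           # cycles per vector instruction
print(issue_schedule(8))                 # SIMD visited on each cycle
print(has_visible_latency(4, cadence))   # result ready at next slot
print(has_visible_latency(6, cadence))   # compiler would have to care
```

    In this model a pipeline longer than the cadence is exactly the case where the compiler can no longer assume results are immediately usable.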
     
  7. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,012
    Likes Received:
    112
    Why do you even want float atomics?
    GCN1 supported float and double atomics (cmpswap, min, max) - nvidia can't do any of that (per this extension) but can do atomic add.
    But GCN3 tossed out all float and double atomics (well you can do swap...).
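    For what it's worth, a float atomic add can be built from a compare-and-swap on the raw bit pattern, which is the usual fallback when hardware lacks the native op. A minimal single-threaded sketch follows; the Memory class and function names are hypothetical, and a Python list is of course not really atomic.

```python
import struct


def f32_to_bits(x):
    """Reinterpret a float as its 32-bit IEEE-754 pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(b):
    """Reinterpret a 32-bit pattern as a float."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


class Memory:
    """Toy word-addressed memory exposing an integer compare-and-swap."""

    def __init__(self, words):
        self.words = list(words)

    def cmpswap(self, addr, expected, new):
        # Returns the old word; the store happens only on a match.
        old = self.words[addr]
        if old == expected:
            self.words[addr] = new
        return old


def atomic_add_f32(mem, addr, value):
    """Emulate a float atomic add with a CAS retry loop.

    Comparing bit patterns (not floats) keeps the loop well defined
    even when the stored value is a NaN. Returns the previous value.
    """
    while True:
        old_bits = mem.words[addr]
        new_bits = f32_to_bits(bits_to_f32(old_bits) + value)
        if mem.cmpswap(addr, old_bits, new_bits) == old_bits:
            return bits_to_f32(old_bits)


mem = Memory([f32_to_bits(1.5)])
previous = atomic_add_f32(mem, 0, 2.25)
print(previous, bits_to_f32(mem.words[0]))
```

    The retry loop is why a native atomic add is nicer under contention: every failed cmpswap is a wasted round trip.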
     
  8. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    2,910
    Likes Received:
    1,607
    I think nvidia now includes additional functionality with support of OpenGL 4.6.

    https://forum.beyond3d.com/posts/1993984/
    https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_shader_atomic_counter_ops.txt
     
  9. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,012
    Likes Received:
    112
    I don't see any FLOAT type atomic ops there.
     
    pharma likes this.
  10. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    1,151
    Likes Received:
    571
    Location:
    France
    Polaris was kind of a success. But for the high end, yeah, it's a letdown... Being one year late, and with all the architectural changes, they still can't beat nVidia? Having more features and being kind of "future proof" is great, but you have to think about raw performance right now too. I just think they're understaffed and/or less talented; they just can't compete right now. It's just speculation anyway...
     
  11. RecessionCone

    Regular Subscriber

    Joined:
    Feb 27, 2010
    Messages:
    499
    Likes Received:
    177
    Nvidia has supported float atomic add since G84.
     
    pharma likes this.
  12. Neumann

    Newcomer

    Joined:
    Feb 25, 2017
    Messages:
    13
    Likes Received:
    18
    DSBR is enabled for Vega FE in pro applications only in the current driver (17.20?). AMD posted a slide indicating performance uplift with it enabled vs disabled. Clearly it was a best case for them. The lack of similar slides for game workloads implies rather limited gains there. But I have read that power consumption improves a bit with it enabled anyway.

    [Image: AMD slide showing performance with DSBR enabled vs disabled]
     
    Alexko and Lightman like this.
  13. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,012
    Likes Received:
    112
    The claim was that they'd support anything other than atomic add with floats...
     
  14. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    I understand what you're getting at, but I don't think maintaining the 4-cycle instruction cadence is necessary.

    Today's SIMD can be filled with one warp only (as long as there are CU external data fetches). But if they simply increase that number from 1 to 2, they could ping-pong between those two warps and no changes to the compiler would be strictly necessary. Since Nvidia needs that too, I don't think that's a huge problem?
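    To put rough numbers on the ping-pong idea, here is a toy model where each wavefront's next instruction depends on its previous one, and the SIMD issues from whichever wavefront is ready. The function and its parameters are invented for illustration and ignore memory traffic entirely.

```python
def simd_utilization(num_waves, latency, cadence=4, instructions=1000):
    """Fraction of cycles the SIMD spends issuing work.

    Each wavefront runs a chain of dependent instructions: after one
    issues, that wavefront is not ready again for `latency` cycles.
    The SIMD itself is occupied for `cadence` cycles per issue.
    """
    ready_at = [0] * num_waves
    remaining = [instructions] * num_waves
    cycle = 0
    busy = 0
    while any(remaining):
        ready = [w for w in range(num_waves)
                 if remaining[w] and ready_at[w] <= cycle]
        if ready:
            # Issue from the wavefront that has been ready the longest.
            w = min(ready, key=lambda i: ready_at[i])
            remaining[w] -= 1
            ready_at[w] = cycle + latency
            busy += cadence
            cycle += cadence
        else:
            # Nothing ready: idle until the next wavefront wakes up.
            cycle = min(ready_at[w] for w in range(num_waves)
                        if remaining[w])
    return busy / cycle


print(simd_utilization(1, latency=4))  # latency hidden by the cadence
print(simd_utilization(1, latency=8))  # one wavefront, doubled latency
print(simd_utilization(2, latency=8))  # two wavefronts ping-ponging
```

    In this model, doubling the latency halves throughput with one wavefront, and a second wavefront wins it back without the compiler rescheduling anything.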
     
  15. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,853
    Likes Received:
    4,463
    Interview with Chris Hook, with some more info on power consumption:



    Vega Nano - 150W TGP (total graphics power = GPU + Memory)
    Vega 56 - 165W TGP
    Vega 64 - 220W TGP

    He claims power consumption will vary with the complexity of the game being run. Maybe this means Vega drivers will have FRTC enabled by default, as part of Enhanced Sync.
    I have to say I really like FRTC. In my Crossfire setup and using a 74FPS FRTC (in a 40-75Hz Freesync monitor), I sometimes get to save over 300W (at the wall) with no discernible loss in performance.


    For Vega 64, the difference between TGP and TDP is 75W, and for Vega 56 it is 45W (is this much being spent on power conversion, the active fan and I/O?)
    I guess the Nano should be a sub-200W TDP card, maybe with the same 175W as its predecessor.
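    Spelling out the arithmetic behind those figures: adding the quoted gap to each TGP gives the implied board TDP, and dividing gives the share of board power not going to the GPU and memory. The dictionary names are just for illustration, and the inputs are only the numbers quoted in this post.

```python
# TGP figures and TGP-to-TDP gaps as quoted in the post (watts).
tgp_watts = {"Vega 64": 220, "Vega 56": 165}
tgp_to_tdp_gap = {"Vega 64": 75, "Vega 56": 45}

# Implied board-level TDP is simply TGP plus the gap.
implied_tdp = {card: tgp_watts[card] + tgp_to_tdp_gap[card]
               for card in tgp_watts}

# Share of the board budget that is not GPU + memory.
overhead_share = {card: tgp_to_tdp_gap[card] / implied_tdp[card]
                  for card in tgp_watts}

for card in tgp_watts:
    print(f"{card}: TDP {implied_tdp[card]} W, "
          f"non-GPU share {overhead_share[card]:.1%}")
```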
     
    BRiT and BacBeyond like this.
  16. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,797
    Likes Received:
    2,056
    Location:
    Germany
    Remember Polaris and the GPU-z (not the tool's fault!) reports of power consumption, where in fact it was only the GPU, not the whole card. The difference was quite substantial. With Vega and the memory included in this figure, differences will be smaller, but still there.
     
  17. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,057
    Likes Received:
    1,020
    GP106 boost: 1706MHz (1060 6GB)
    GP107 boost: 1392MHz (1050Ti)
    1706/1392= 1.226 or rounded to two significant digits 23% higher clocks for the GP106

    Numbers this time taken from nVidia's own site.
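    The same calculation as a quick script, next to the overclocked figures quoted earlier in the thread (1900 MHz vs 2100 MHz), to show how far apart the two comparisons are. The helper name is made up for illustration.

```python
def clock_advantage_percent(faster_mhz, slower_mhz):
    """How much higher the faster clock is, in percent."""
    return (faster_mhz / slower_mhz - 1) * 100


# Rated boost clocks: GTX 1060 6GB (GP106) vs GTX 1050 Ti (GP107).
stock = clock_advantage_percent(1706, 1392)

# Max overclock figures quoted earlier in the thread.
overclocked = clock_advantage_percent(2100, 1900)

print(f"stock boost: {stock:.1f}%  overclocked: {overclocked:.1f}%")
```

    So the gap depends heavily on whether you compare rated boost clocks or typical overclocking ceilings.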
     
  18. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,853
    Likes Received:
    4,463
    And this is TSMC's 16FF+ (GP106) vs. Samsung's 14LPP (GP107). Although GlobalFoundries' 14LPP (Polaris & Vega) is technically the same implementation as Samsung's, it's still a different foundry in a different place, so there could be some differences there, too.
    In the end, we still have no idea what's coming from TSMC out of that GF payout. Maybe Ryzen+Vega APUs?
     
  19. seahawk

    Regular

    Joined:
    May 18, 2004
    Messages:
    511
    Likes Received:
    141
    I was comparing 1050ti to RX560, where the 1050ti still enjoys a healthy advantage over the RX560.
     
  20. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    They could as long as instruction issue latency does not drop below execution cycle count. They would have to touch something else.
    There are some measures like forwarding or rearranging operand fetch that could help cover a number of gaps, and at least with the probably delayed register writeback, forwarding is already involved in covering for the trailing edge of the current pipeline.

    I've seen some commentary here, and also in places like GPUOpen about flaws in AMD's code generation choices. Is a new compiler necessarily a bad thing?

    I'm not convinced at this point that having instruction latency is an intractable, unsolved, or substantially difficult problem.

    GCN's tidy execution loop strikes me as being trapped in a local minimum. It affects too many architectural parameters to be easily changed, and simultaneously they cannot be adjusted without impacting it and each other.

    GFX9 increases vmcnt by 4x, with the extra two bits placed at the end of the representation to maintain backwards compatibility.
    Vega strives to maintain binary compatibility in other ways. Its old FP16 instruction encoding is maintained for the pre-GFX9 semantics, although those are now renamed as legacy operations. The new FP16 instructions that mostly mirror the old ones inherited their name, which the LLVM patch notes are rather caustic about describing.
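    A sketch of how the split vmcnt field can be packed and unpacked. The bit positions are an assumption based on my reading of the LLVM AMDGPU backend for GFX9 (vmcnt low bits in [3:0], expcnt in [6:4], lgkmcnt in [11:8], vmcnt high bits in [15:14]); check them against the ISA document before relying on this. The function names are hypothetical.

```python
def encode_waitcnt_gfx9(vmcnt, expcnt, lgkmcnt):
    """Pack the three counters into a 16-bit s_waitcnt immediate.

    Assumed GFX9 layout: vmcnt[3:0] -> bits 3:0, expcnt -> bits 6:4,
    lgkmcnt -> bits 11:8, vmcnt[5:4] -> bits 15:14.
    """
    if not (0 <= vmcnt < 64 and 0 <= expcnt < 8 and 0 <= lgkmcnt < 16):
        raise ValueError("counter out of range")
    return ((vmcnt & 0xF)
            | (expcnt << 4)
            | (lgkmcnt << 8)
            | ((vmcnt >> 4) << 14))


def decode_vmcnt_gfx9(simm16):
    """Reassemble the 6-bit vmcnt from its low and high pieces."""
    return (simm16 & 0xF) | (((simm16 >> 14) & 0x3) << 4)


def decode_vmcnt_legacy(simm16):
    """Pre-GFX9 view: only the low four bits are vmcnt."""
    return simm16 & 0xF


# A count below 16 encodes identically under both interpretations.
old_style = encode_waitcnt_gfx9(5, 7, 15)
print(hex(old_style), decode_vmcnt_gfx9(old_style),
      decode_vmcnt_legacy(old_style))

# Counts of 16 and above need the two extra high bits.
wide = encode_waitcnt_gfx9(37, 0, 0)
print(hex(wide), decode_vmcnt_gfx9(wide))
```

    Because the new bits sit above the existing fields, an immediate with vmcnt below 16 has the same bit pattern under either layout.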

    It's what AMD's VLIW GPUs did. The minimum occupancy for basic throughput per SIMD was two wavefronts, an A and B.
    There were some interesting games that could be played with passing register values or allocating within the register file, but they were potentially complex.
    GCN upped it to 4, but since the CU was the new basis it was able to drop it to 1 per SIMD.
     
    silent_guy likes this.