AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Discussion in 'Architecture and Products' started by Deleted member 13524, Sep 20, 2016.

  1. oscarbg

    oscarbg Newcomer

    Also some technical questions:
    Does Vega support double-precision global atomics, like Pascal added to OpenGL via GL_NV_shader_atomic_float64?
    Also, what about 64-bit integer global atomics.. were those already there with Polaris?
    Also, with the Siggraph GL news plus the Vega details, I expected to see a new EXT or ARB extension for a feature added by Vega that was previously only available on NV GPUs (and Intel iGPUs) and exposed via NV extensions..
    I'm talking about conservative rasterization (even Intel has an OpenGL extension for it, and it's already implemented in Mesa on Linux)
    NV has:
    GL_NV_conservative_raster
    GL_NV_conservative_raster_dilate
    GL_NV_conservative_raster_pre_snap_triangles (tier 2 feature)
    I also hope to see the Vega OpenGL driver support the ARB_fragment_shader_interlock extension now that ROVs are supported.. Strangely, AMD already exposes INTEL_fragment_ordering, which should provide equal functionality on pre-Vega cards, although those cards don't support the ROV feature.. how?
     
  2. CarstenS

    CarstenS Legend Subscriber

    I don't think IPC has been reduced. FWIW, MAD-latency per clock still seems exactly in line with Fiji.
     
    Ryan Smith and kalelovil like this.
  3. SpaceBeer

    SpaceBeer Newcomer

    Isn't the max OC clock difference between GP107 and GP106 around 10% (1900 MHz vs 2100 MHz)? Even with 10% higher clocks, Vega wouldn't be competitive with GP102. So it's the architecture that makes the large(r) difference.
     
  4. leoneazzurro

    leoneazzurro Regular

    Also, it could be quite difficult to compare chips even of the same architecture, because e.g. GP107 may use a lower number of metal layers than GP106 for cost savings, being an ultra-low-budget part. It would be interesting to know these details.
     
  5. seahawk

    seahawk Regular

    But there is still a good factory-setting difference of 15% for Boost clocks and around 25% in practical boost performance.
     
  6. sebbbi

    sebbbi Veteran

    If AMD wants to maintain the 4-cycle instruction cadence, they can't touch the instruction latency. If they break the 4-cycle cadence, they need a new shader compiler as the current one is allowed to assume that all standard instructions have no visible latency (results immediately usable for the next instruction). The 4-cycle cadence with no visible instruction latency was one of the key points of GCN architecture. It simplified shader compiler design radically.

    AMD could have however increased latency of instructions requiring s_waitcnt. These include LDS load, vector memory load, scalar memory load, texture sampling and some cross lane ops. It should also be simple to increase latencies of the L1 and L2 caches. But of course you then need higher occupancy to hide these extra latencies.
     
  7. mczak

    mczak Veteran

    Why do you even want float atomics?
    GCN1 supported float and double atomics (cmpswap, min, max); nvidia can't do any of that (per this extension) but can do atomic add.
    But GCN3 tossed out all float and double atomics (well, you can still do swap...).
     
  8. pharma

    pharma Veteran

    I think nvidia now includes additional functionality with support of OpenGL 4.6.

    https://forum.beyond3d.com/posts/1993984/
    https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_shader_atomic_counter_ops.txt
     
  9. mczak

    mczak Veteran

    I don't see any FLOAT type atomic ops there.
     
    pharma likes this.
  10. Rootax

    Rootax Veteran

    Polaris was kind of a success. But for high end, yeah, it's a letdown... Being one year late, and with all the architectural changes, they still can't beat Nvidia? Having more features and being kind of "future proof" is great, but you have to think about "right now raw performance" too. I just think they're understaffed and/or less talented; they just can't compete right now. It's just speculation anyway...
     
  11. RecessionCone

    RecessionCone Regular Subscriber

    Nvidia has supported float atomic add since G84.
     
    pharma likes this.
  12. Neumann

    Neumann Newcomer

    DSBR is enabled for Vega FE in pro applications only in the current driver (17.20?). AMD posted a slide indicating the performance uplift with it enabled vs. disabled. Clearly it was a best case for them. The lack of similar slides for game workloads implies rather limited gains there. But I have read that power consumption improves a bit with it enabled anyway.

     
    Alexko and Lightman like this.
  13. mczak

    mczak Veteran

    The claim was that they'd support anything other than atomic add with floats...
     
  14. silent_guy

    silent_guy Veteran Subscriber

    I understand what you're getting at, but I don't think maintaining the 4-cycle instruction cadence is necessary.

    Today a SIMD can be filled with one wavefront only (as long as there are no CU-external data fetches). But if they simply increased that number from 1 to 2, they could ping-pong between those two wavefronts, and no changes to the compiler would be strictly necessary. Since Nvidia needs that too, I don't think that's a huge problem?
     
  15. Interview with Chris Hook, with some more info on power consumption:



    Vega Nano - 150W TGP (total graphics power = GPU + Memory)
    Vega 56 - 165W TGP
    Vega 64 - 220W TGP

    He claims power consumption will vary with the complexity of the game being run. Maybe this means Vega drivers will have FRTC enabled by default, as part of Enhanced Sync.
    I have to say I really like FRTC. In my Crossfire setup and using a 74FPS FRTC (in a 40-75Hz Freesync monitor), I sometimes get to save over 300W (at the wall) with no discernible loss in performance.


    For Vega 64, the difference between TDP and TGP is 75W, and for Vega 56 it's 45W (this much being spent on power conversion, the active fan and I/O?..)
    I guess the Nano should be a sub-200W TDP card, maybe with the same 175W as its predecessor.
     
    BRiT and BacBeyond like this.
  16. CarstenS

    CarstenS Legend Subscriber

    Remember Polaris and the GPU-Z (not the tool's fault!) reports of power consumption, where it was in fact only the GPU, not the whole card. The difference was quite substantial. With Vega, where the memory is included in this figure, the differences will be smaller, but they'll be there nonetheless.
     
  17. Entropy

    Entropy Veteran

    GP106 boost: 1706MHz (1060 6GB)
    GP107 boost: 1392MHz (1050Ti)
    1706/1392 = 1.226, or rounded to two significant digits, 23% higher clocks for GP106.

    Numbers this time taken from Nvidia's own site.
     
  18. And this is TSMC's 16FF+ (GP106) vs. Samsung's 14LPP (GP107). Although GlobalFoundries' 14LPP (Polaris & Vega) is technically the same implementation as Samsung's, it's still a different foundry in a different place, so there could be some differences there, too.
    In the end, we still have no idea what's coming from TSMC out of that GF payout. Maybe Ryzen+Vega APUs?
     
  19. seahawk

    seahawk Regular

    I was comparing the 1050 Ti to the RX 560, where the 1050 Ti still enjoys a healthy advantage.
     
  20. 3dilettante

    3dilettante Legend Alpha

    They could, as long as instruction issue latency does not drop below the execution cycle count; they would have to touch something else.
    There are measures like forwarding or rearranging operand fetch that could help cover a number of gaps, and at least with the probably delayed register writeback, forwarding is already involved in covering for the trailing edge of the current pipeline.

    I've seen some commentary here, and also in places like GPUOpen about flaws in AMD's code generation choices. Is a new compiler necessarily a bad thing?

    I'm not convinced at this point that having instruction latency is an intractable, unsolved, or substantially difficult problem.

    GCN's tidy execution loop strikes me as being trapped in a local minimum. It affects too many architectural parameters to be easily changed, and simultaneously they cannot be adjusted without impacting it and each other.

    GFX9 increases vmcnt by 4x, with the extra two bits placed at the end of the representation to maintain backwards compatibility.
    Vega strives to maintain binary compatibility in other ways. Its old FP16 instruction encoding is maintained for the pre-GFX9 semantics, although those are now renamed as legacy operations. The new FP16 instructions that mostly mirror the old ones inherited their name, which the LLVM patch notes are rather caustic about describing.

    It's what AMD's VLIW GPUs did. The minimum occupancy for basic throughput per SIMD was two wavefronts, an A and B.
    There were some interesting games that could be played with passing register values or allocating within the register file, but they were potentially complex.
    GCN upped it to 4, but since the CU was the new basis it was able to drop it to 1 per SIMD.
     
    silent_guy likes this.