AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Discussion in 'Architecture and Products' started by Deleted member 13524, Sep 20, 2016.

  1. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    When I first saw that diagram, I did not think of asymmetrical SIMDs but of lane gating, which can help quite a bit if you expect to run into power limits or have a rather aggressive clock boost in place. Since the number of ALUs itself hasn't been AMD's problem even on 28 nm, they surely would be able to invest some of the area saved at 14 nm into more sophisticated power management. Initially, I'd have thought only one or maybe two of the four SIMDs in a CU would sport a feature like this for area reasons, since I honestly have no idea how costly it would be. But the truck diagram, even if maybe not legit, seems to imply otherwise (the trailers are still all there, none completely saved).
     
  2. revan

    Newcomer

    Joined:
    Nov 9, 2007
    Messages:
    55
    Likes Received:
    18
    Location:
    look in the sunrise ..will find me
  3. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
    No mention of lane gating or variable sized wavefronts (so probably not coming or at least not heavily advertised).
    New geometry pipeline with tile bins and HSR (one could call it "partially deferred"?) increasing the throughput to up to 11 triangles/clock with 4 geometry engines and helping the bandwidth and energy efficiency.
    No more ROP caches (handled by L2 now).
    NCU can handle 128 FP32 instructions per clock instead of 64. Just larger? And NCU is optimized for higher clock speeds.
    HBC is not HBM.
     
  4. kalelovil

    Regular

    Joined:
    Sep 8, 2011
    Messages:
    568
    Likes Received:
    104
  5. CUs now have 128 ALUs, or maybe they are clocked at twice the frequency. I can't really tell from this slide.
    Geometry performance seems to be substantially enhanced over Polaris, which was already seemingly on par with Pascal.
    ROPs using L2 cache probably means there's a lot more L2 cache in it. It could also mean Vega is even less dependent on VRAM bandwidth.
     
  6. Anarchist4000

    Veteran

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    • "Storage Network" on the cache controller along with NVRAM and System DRAM
    • "NCU is optimized for higher clock speeds and higher IPC" (Looks like 2x packed math for higher IPC, no idea on optimizations)
    • 512 8bit ops/clock, Double Precision is "Configurable" (No idea how, but seemingly different from the 8bit, 16bit, and 32bit pow2s or it would have been shown)
    Not much on CU configuration beyond apparently twice as many lanes as before. The CU to NCU graphic is kind of strange. Regular/packed math at seemingly twice the frequency. No explanation how the clocks increased.

    No mention of the scalar, but I'm guessing there has to be more than one now? Without ridiculously higher clocks, beyond the SIMD increase, I doubt it could keep pace with 8 SIMDs generating just flow control. Design would seemingly have half as many CUs or the quoted FLOPs would be substantially more with double ALUs and higher clocks. Would also correspond to half as many nodes on that mesh.

    For the deferred stuff with that binning it should be substantially less dependent. Lots more cache seems likely for the bins and driving twice as many SIMD lanes as before, so that should help too.
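The packed-math figures quoted above can be sanity-checked with a little arithmetic. This is only a sketch: the 128 FP32 and 512 8-bit numbers are the ones quoted from the slides in this thread, while the rule that narrower operands pack linearly into 32-bit lanes (giving 256 for 16-bit) is an assumption, and `packed_ops_per_clock` is a hypothetical helper name.

```python
# Illustrative only: 128 FP32 ops/clock and 512 8-bit ops/clock are the figures
# quoted from the slides; the linear packing rule below is an assumption.
FP32_OPS_PER_CLOCK = 128  # per NCU, as quoted in the thread
BASE_WIDTH_BITS = 32


def packed_ops_per_clock(width_bits: int) -> int:
    """Ops per clock per NCU if narrower operands pack into 32-bit lanes."""
    if width_bits not in (8, 16, 32):
        raise ValueError("only 8-, 16- and 32-bit operands are modeled")
    return FP32_OPS_PER_CLOCK * (BASE_WIDTH_BITS // width_bits)


for width in (32, 16, 8):
    print(f"{width:>2}-bit: {packed_ops_per_clock(width)} ops/clock")
```

Under that assumption the 8-bit case lands on the quoted 512. Double precision is deliberately left out, since the slide only calls it "configurable".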
     
  7. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
    Both. Twice the throughput per clock (128 fp32 ops) and higher clocked.
    Polaris isn't on par with Pascal. The tile binning rasterizer could propel its performance past a Pascal with the same number of rasterizers, depending on details like how much on-chip storage there is for the tile bin cache and how well the work distribution ties in with the framebuffer tile cache in L2 (as the specialized ROP caches may be gone, one needs way more bandwidth there, or the L2 cache just backs the still existing ROP caches [the slide also has an L1 for the pixel pipeline backed by the L2]).
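For readers unfamiliar with tile binning, here is a minimal sketch of the general idea, not of AMD's hardware. Triangles are sorted into screen-space tiles by their bounding boxes, so each tile can later be rasterized with its working set kept on chip. The function name `bin_triangles`, the 32-pixel tile size and the screen dimensions are all hypothetical, and hidden surface removal within a bin is not modeled.

```python
from collections import defaultdict

TILE_SIZE = 32  # hypothetical tile edge in pixels; the real bin size is not given


def bin_triangles(triangles, width, height, tile_size=TILE_SIZE):
    """Map each tile (tx, ty) to the indices of triangles whose bounding box touches it."""
    bins = defaultdict(list)
    max_tx = (width - 1) // tile_size
    max_ty = (height - 1) // tile_size
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Skip triangles that lie entirely off screen.
        if max(xs) < 0 or max(ys) < 0 or min(xs) >= width or min(ys) >= height:
            continue
        # Clamp the bounding box to the tile grid (conservative: may include
        # tiles the triangle itself does not actually cover).
        tx0 = max(0, int(min(xs)) // tile_size)
        ty0 = max(0, int(min(ys)) // tile_size)
        tx1 = min(max_tx, int(max(xs)) // tile_size)
        ty1 = min(max_ty, int(max(ys)) // tile_size)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(index)
    return dict(bins)


triangles = [
    [(2, 2), (20, 4), (10, 25)],     # fits inside a single tile
    [(30, 30), (70, 30), (40, 60)],  # spans several tiles
]
bins = bin_triangles(triangles, width=128, height=64)
for tile in sorted(bins):
    print(tile, bins[tile])
```

How much on-chip storage the bins need grows with the number of triangles per tile, which is why the size of the tile bin cache matters for the argument above.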
     
  8. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Gipsel, why are you sure HBC is not HBM? I did not see that from the leaked slides.
     
  9. lanek

    Veteran

    Joined:
    Mar 7, 2012
    Messages:
    2,469
    Likes Received:
    315
    Location:
    Switzerland
  10. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,245
    Likes Received:
    4,465
    Location:
    Finland
  11. Locuza

    Newcomer

    Joined:
    Mar 28, 2015
    Messages:
    45
    Likes Received:
    101
    Just look again :)
    http://cdn.videocardz.com/1/2017/01/AMD-VEGA-VIDEOCARDZ-34.jpg

    There is a new High-Bandwidth-Cache (HBC) connected to the High-Bandwidth-Cache-Controller (HBCC).
    I wonder how the caching works.
     
  12. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    And where does it say anything about HBC NOT being HBM(gen2)? Please do not only go by fancy names printed on a marketing slide.

    Remember Xeon Phi? What is one of the applications for its HMC? DRAM caching.
     
  13. Locuza

    Newcomer

    Joined:
    Mar 28, 2015
    Messages:
    45
    Likes Received:
    101
    It's clearly a new cache on the chip and it would be news to me if you would need one for HBM(2) integration.
     
  14. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,245
    Likes Received:
    4,465
    Location:
    Finland
    No, it's not. It's the HBM2, which is on package, on interposer, but not on chip.

    edit: just for the sake of it, do you think it's a coincidence that HBM2-slides are right after the HBC-slide and before HBCC?
     
  15. Locuza

    Newcomer

    Joined:
    Mar 28, 2015
    Messages:
    45
    Likes Received:
    101
    I stand corrected, it really might be just HBM2 with a certain cache configuration.
     
  16. Anarchist4000

    Veteran

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    The only way to get twice the FP32 throughput per clock is if the SIMDs got chained together and executed consecutive instructions in a single cycle. I wouldn't call simply doubling the number of lanes "twice the throughput"; that phrase should mean each ALU doubled its throughput. The alternative is that they are quoting against native FP64 performance, which doesn't appear to be the case.

    Worth mentioning it followed the slide about "Introducing the world's most scalable GPU memory architecture"
     
  17. New slides have been coming up:

    http://cdn.videocardz.com/1/2017/01/AMD-VEGA-VIDEOCARDZ-37.jpg

    11 polygons with 4 geometry engines?
    Wut?
     
  18. Anarchist4000

    Veteran

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    Second thought on that 128 ALUs, what if MADD was counting as two ops or they could dual issue certain FP32 instructions? In that case we're looking at the traditional SIMD count, but with the packed math adding some functionality.

    EDIT: This could also be the FMA4 instructions and new GPU specifics. Extra operands would be useful with the packed math.
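The two readings of the "128 per clock" figure can be compared with a quick calculation. This is a sketch under stated assumptions: the GCN-style layout of 4 SIMDs with 16 lanes each per CU is assumed rather than taken from the slide, and `fp32_ops_per_clock` is a hypothetical helper.

```python
# Assumed GCN-style CU layout: 4 SIMDs x 16 lanes (not stated on the slide).
SIMDS_PER_CU = 4
LANES_PER_SIMD = 16
LANES_PER_CU = SIMDS_PER_CU * LANES_PER_SIMD


def fp32_ops_per_clock(lanes: int, ops_per_instruction: int) -> int:
    """Ops per clock when every lane issues one instruction per clock."""
    return lanes * ops_per_instruction


# Reading 1: traditional lane count, a multiply-add counted as two ops.
madd_counted_twice = fp32_ops_per_clock(LANES_PER_CU, 2)

# Reading 2: twice as many lanes, each instruction counted once.
doubled_lanes = fp32_ops_per_clock(2 * LANES_PER_CU, 1)

print(madd_counted_twice, doubled_lanes)
```

Both readings give the same headline number, so the slide figure alone cannot tell them apart.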
     
    #539 Anarchist4000, Jan 5, 2017
    Last edited: Jan 5, 2017
  19. seahawk

    Regular

    Joined:
    May 18, 2004
    Messages:
    511
    Likes Received:
    141
    #540 seahawk, Jan 5, 2017
    Last edited: Jan 5, 2017