AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Discussion in 'Architecture and Products' started by ToTTenTranz, Sep 20, 2016.

  1. sir doris

    Regular

    Joined:
    May 9, 2002
    Messages:
    643
    Likes Received:
    102
    Surely 13b vs 21b transistors wouldn't be comparable regardless of the process used?
     
  2. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,370
    Likes Received:
    787
    Same market, yes, but a similar price would be very surprising.
     
  3. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,285
    Likes Received:
    1,434
    Maybe for evaluating price and cost (which are really far less important aspects to discuss in a tech forum), but for performance, architectural efficiency, power efficiency and scalability on different nodes, not so much.

    You have 13b transistors consuming 300W on 7nm, vs 21b transistors doing the same thing on 12nm (basically 16nm). How is that irrelevant?
     
    #5683 DavidGraham, Nov 8, 2018
    Last edited: Nov 8, 2018
    A1xLLcqAgt0qc2RyMz0y and pharma like this.
  4. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    39,065
    Likes Received:
    8,914
    Location:
    Under my bridge
    Watts don't matter. It's flops/watt (and flops/mm², etc), or rather, usable work per watt that matters. If a 300W part on 7nm with 21 gigatrannies can render 50 megayums of graphical lovelies, and a 300W part on 16nm with 13 gigatrannies can render 40 megayums of graphical lovelies, the 7nm part is more effective.

    Assuming you are choosing a part based on power efficiency. You may choose output per $.

    Regardless, watts consumed means very little without useful benchmarks to compare workload.
     
    no-X, Lightman, Anarchist4000 and 2 others like this.
  5. Love_In_Rio

    Veteran

    Joined:
    Apr 21, 2004
    Messages:
    1,394
    Likes Received:
    79
    Right, and that's why at 7nm an Nvidia chip at 150 watts would trounce an AMD part at 150 watts; you would get 2/3 more fps. And if Nvidia were different, its 150 watt chips would be in both Sony and MS consoles.
     
    pharma and DavidGraham like this.
  6. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,285
    Likes Received:
    1,434
    The more transistors you can put in a chip in a given TDP, the more performance and features you can have out of the chip. Volta is large because it's on an old node, and because it has extra Tensor Cores, which enable it to have vastly more AI performance than MI60. It's also vastly more powerful in rasterization. It has more ROPs, TMUs and polygon throughput than Vega 20. Even a Titan V (which is a cut-down chip) is at least 50% faster in rasterization.

    Seeing this situation, NVIDIA can double the RTX hardware on 7nm, considerably increase their rasterization hardware, maintain chip size, and have extra features all while staying on the same power envelope.
    Agreed, that's why I included various performance aspects into that discussion.
     
    #5686 DavidGraham, Nov 8, 2018
    Last edited: Nov 9, 2018
    A1xLLcqAgt0qc2RyMz0y likes this.
  7. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,125
    Likes Received:
    3,762
    Maybe.

    But first it would have to exist.
     
  8. keldor

    Newcomer

    Joined:
    Dec 22, 2011
    Messages:
    51
    Likes Received:
    62
    I was using the Volta numbers. IIRC, Turing cut down on the Tensor core count by 1/2. It's a tradeoff - fewer Tensor cores means more of something else.
     
  9. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,426
    Likes Received:
    357
    Except that metric is largely useless because of power curves. Really need to equalize for mm2 to compare architectures, and even that can be tricky. SRAM for example is transistor dense so transistors may not be valid for comparison. Fabs and process can vary comparisons a bit.

    Practically any sized chip can be made to consume 150W and the chip with more area and lower clocks/voltages will almost always be more efficient on a similar node for parallel processing. Doubling processors will double performance at roughly twice the power. Double clocks and power explodes as the curve is exponential.

    With the die size in that comparison, you're probably looking at 3-4 Vega 20s versus a single Volta. Equalize power and the Vegas may be more efficient and offer far more bandwidth.
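
    The "wide and slow versus narrow and fast" argument can be illustrated with the textbook dynamic-power relation P ≈ C·V²·f. The sketch below is illustrative only: it assumes voltage scales linearly with clock over the range considered (which makes power grow roughly with the cube of the clock rather than truly exponentially), ignores static leakage and any voltage floor, and uses made-up normalized units, not measurements of any Vega or Volta part.

```python
def dynamic_power(units: float, clock: float, volts_per_clock: float = 1.0) -> float:
    """Relative dynamic power, P ~ C * V^2 * f.

    Assumptions (illustrative, not measured): switched capacitance C scales
    with the number of processing units, and voltage scales linearly with clock.
    """
    voltage = volts_per_clock * clock
    return units * voltage ** 2 * clock


def throughput(units: float, clock: float) -> float:
    """Relative throughput for a perfectly parallel workload."""
    return units * clock


# (units, clock) pairs in normalized, hypothetical units
configs = {
    "baseline": (1.0, 1.0),     # 1x units at 1x clock
    "wide_slow": (2.0, 0.5),    # twice the units at half the clock
    "narrow_fast": (1.0, 2.0),  # same units at double the clock
}

for name, (units, clock) in configs.items():
    perf = throughput(units, clock)
    power = dynamic_power(units, clock)
    print(f"{name}: perf={perf:.2f} power={power:.2f} perf/W={perf / power:.2f}")
```

    Under these assumptions the wide-and-slow configuration matches the baseline throughput at a quarter of the power, while doubling the clock doubles throughput at eight times the power. Real parts differ in the constants (architecture, node, leakage), so this only shows the shape of the curve.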
     
    Lightman likes this.
  10. Ext3h

    Regular Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    320
    Likes Received:
    266
    It's still the same number of cores. Respectively what they could fit within the power budget, assuming a primarily one-sided workload stressing only one core type to the limit simultaneously.
    Then there has also been the T4 announcement, which runs at full core count, but only half the clock, for half the Tensor Core throughput of the consumer cards.

    So to sum that topic up, for the fastest chip in each generation (Vega64, MI60, Tesla V100, Quadro RTX 6000). All according to published data sheets, this time.

    Matrix-multiplication, with FP16 input, FP16 accumulator:
    Vega10: ~28 Tflops
    Vega20: ~30 Tflops
    Pascal: ~22 Tflops
    Volta: ~30 Tflops on FP32 core OR ~120 Tflops on Tensor Core
    Turing: ~33 Tflops on FP32 core OR ~130 Tflops on Tensor Core

    Matrix-multiplication, with FP16 input, FP32 accumulator:
    Vega10: ~14 Tflops
    Vega20: ~30 Tflops (to be confirmed, but likely)
    Pascal: ~11 Tflops
    Volta: ~15 Tflops on FP32 core OR ~120 Tflops on Tensor Core
    Turing: ~16 Tflops on FP32 core OR ~130 Tflops on Tensor Core (only 57 Tflops on GeForce)

    Matrix-multiplication, with FP32 input, FP32 accumulator:
    Vega10: ~14 Tflops
    Vega20: ~15 Tflops
    Pascal: ~11 Tflops
    Volta: ~14 Tflops
    Turing: ~16 Tflops
     
    #5690 Ext3h, Nov 9, 2018
    Last edited: Nov 9, 2018
  11. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,285
    Likes Received:
    1,434
    Before you do any die size comparison, you should equalize the node first.
    Volta offers 900GB/s of HBM2 bandwidth vs 1TB/s for Vega 20, not a huge difference. Volta PCIe also requires only 250W, 50W less than Vega 20. It's more power efficient even on the 12/16nm node. At 7nm it would consume far less power at the same clocks (125W?) and have its size shrunk from 815 to at most 600mm2.
    That's an inaccurate generalization. Vega 10 is wider, has a bigger area and lower clocks, yet it consumes far more power than GP104, which is clocked to the max. You are excluding power efficiency for a given architecture, which is really the more determining factor.
    We are not talking about a situation where the clock is doubled here.
    I am assuming you are using the lower clocked PCIE V100, right?
    If you base your numbers on the NVLink version of V100, the Volta value should be 15.7 TF.
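
    For reference, headline FP32 figures like these fall out of a simple peak-rate formula: shader cores × 2 FLOPs per clock (one fused multiply-add) × boost clock. The sketch below assumes 5120 FP32 cores for V100 and boost clocks of 1380 MHz (PCIe) and 1530 MHz (NVLink); those inputs are recalled from the public spec sheets and are assumptions, not numbers quoted in this thread.

```python
def peak_tflops(cores: int, boost_mhz: float, flops_per_core_per_clock: int = 2) -> float:
    """Theoretical peak throughput in TFLOPS; one FMA counts as 2 FLOPs."""
    return cores * flops_per_core_per_clock * boost_mhz * 1e6 / 1e12


V100_CORES = 5120          # assumed FP32 core count
V100_PCIE_BOOST_MHZ = 1380  # assumed boost clock, PCIe card
V100_NVLINK_BOOST_MHZ = 1530  # assumed boost clock, NVLink module

v100_pcie = peak_tflops(V100_CORES, V100_PCIE_BOOST_MHZ)
v100_nvlink = peak_tflops(V100_CORES, V100_NVLINK_BOOST_MHZ)

print(f"V100 PCIe:   {v100_pcie:.1f} TFLOPS")
print(f"V100 NVLink: {v100_nvlink:.1f} TFLOPS")
```

    With those inputs the PCIe card lands near 14 TF and the NVLink version near 15.7 TF, which is consistent with the two values being discussed here.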
     
    #5691 DavidGraham, Nov 9, 2018
    Last edited: Nov 9, 2018
  12. SpaceBeer

    Newcomer

    Joined:
    Apr 15, 2017
    Messages:
    28
    Likes Received:
    14
    Location:
    The Balkans
    If the 14nm -> 7nm shrink didn't bring significant power savings for AMD, it won't for nVidia either, especially if TSMC's 16/12nm is already better than GloFo's 14nm. I.e. if nVidia clocks their 7nm chips 10-15% higher than Volta/Turing, they will also consume 250-300W.
     
    w0lfram likes this.
  13. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,426
    Likes Received:
    357
    Which is exactly why I stated just that in my post.

    Except the comparison is against multiple Vegas with 2-3x or more aggregate bandwidth. Your whole power argument is plain silly because of the curves and parallel nature of the workload. Could probably make 10 Vegas use less power than Volta with more performance if you wanted. Doubt that's cost effective unless memory bandwidth crucial.

    Again, power curves which was the entire point of my post. Dial it back a bit and more FLOPs for less power. Not even accounting for node or other factors. Which again as I stated in my post need to be taken into account.

    Sure we are when you consider power curves and the theoretical model. Cut the clocks in half and power plummets, with perf/watt skyrocketing in the process. Did I mention power curves? Just want to make sure the premise of my post wasn't misunderstood.
     
  14. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,285
    Likes Received:
    1,434
    Of course it does bring savings to Vega 20; the problem is that these savings are offset by the clock increase, IO additions and extra chip features. Things Volta already paid for on 12nm.
    We are not arguing theoreticals here, but actual implementations. I can claim you can make 20 Voltas consume 100W. Doesn't mean it's true, or that it's doable in any practical or useful manner.
    And that's your problem right there: you are talking in a vacuum. Consider other factors like architectural power efficiency, nodes, features, etc., and your power curve point suddenly becomes moot, as it actually applies to all architectures.
     
  15. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,426
    Likes Received:
    357
    If the clocks were adjusted, that would be the implementation. No requirement to run the cards at stock settings. 20 Voltas probably won't work with the power floor, but yes, the same analogy would work if Voltas were significantly cheaper than the Vegas. Last I checked they weren't, as we're comparing a die a fraction of the size.

    I did consider them in the last two posts I made and explicitly pointed it out. Not sure I see the problem other than you missing the comparison.
     
  16. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    7,731
    Likes Received:
    1,457
    Location:
    Finland
    Vega 10 is clocked and using voltages far closer to the actual limit of the chip (which is a combination of architecture and process) than any model of GP104. They don't use equal processes either.
    For what it's worth, many users actually achieve higher performance and lower consumption with their Vegas by simply lowering the voltage a tad.
     
    w0lfram likes this.
  17. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,285
    Likes Received:
    1,434
    That's chip lottery. Also it doesn't really guarantee 100% stability across all workloads.
    On the other end of the spectrum, many GP104 users are running their cards @2.1GHz, getting extra performance while still consuming far less power than Vega.
    I don't get that sentence, please elaborate.
     
  18. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    1,364
    Likes Received:
    226
    Location:
    USA, CA
    Tensor core FP32 accumulation on Volta and Quadro Turing is full speed. The 2080 Ti is half speed, and that's probably where the confusion comes from. Turing also has 8bit and 4bit tensor cores.

    https://devblogs.nvidia.com/nvidia-turing-architecture-in-depth/
     
  19. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,125
    Likes Received:
    3,762
    Some interesting tidbits from this video, about Vega 20.

    Bandwidth is better explained at ~22m00s
    GPU-to-GPU total bandwidth is 200GB/s from the IF link, plus 64GB/s through the 16 PCIe 4.0 lanes, so a total of 264GB/s.
    He claims the latency for Vega 20 A to access Vega 20 B's memory pool is 60-70ns, and one "complete loop" between neighboring GPUs is 140-170ns.
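
    The 264GB/s total is simple addition, and the PCIe share can be sanity-checked from the link parameters. The sketch below assumes the standard PCIe 4.0 figures of 16 GT/s per lane with 128b/130b encoding and counts both directions; the 200GB/s IF figure is taken from the post as-is.

```python
def pcie_bandwidth_gbs(lanes: int, gt_per_s: float = 16.0, encoding: float = 128 / 130) -> float:
    """Usable PCIe bandwidth in GB/s for one direction.

    Defaults assume PCIe 4.0: 16 GT/s per lane, 128b/130b line encoding.
    """
    return lanes * gt_per_s * encoding / 8  # 8 bits per byte


IF_LINK_GBS = 200.0  # GPU-to-GPU Infinity Fabric figure quoted in the post

pcie_one_way = pcie_bandwidth_gbs(16)
pcie_both_ways = 2 * pcie_one_way
total = IF_LINK_GBS + pcie_both_ways

print(f"PCIe 4.0 x16, one direction:   {pcie_one_way:.1f} GB/s")
print(f"PCIe 4.0 x16, both directions: {pcie_both_ways:.1f} GB/s")
print(f"IF link + PCIe:                {total:.1f} GB/s")
```

    With encoding overhead included the PCIe part comes out near 63GB/s rather than 64GB/s, so the quoted total is best read as a raw-rate figure.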
     
    BRiT and Lightman like this.
  20. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    7,731
    Likes Received:
    1,457
    Location:
    Finland
    Every chip has a limit on how high it can clock. Once you get closer to it, the voltage requirements start to ramp up rapidly and the power consumption grows exponentially. The limit depends on both the architecture and the process the chip is built on (as well as individual variation between each chip). (Also, GP104 and Vega 10 are not built on the same or equal processes.)

    Vega models are clocked and are using voltages far closer to the actual limits of the chip, chosen so that every card should be able to meet the advertised clockspeeds in a pre-set situation even if it means using higher voltage on many of the cards. Being close to its limits, lowering the voltage and/or clocks a bit makes a huge difference in power consumption here.
    This behaviour is nicely demonstrated for example in TPU's Vega 64 review (https://www.techpowerup.com/reviews/AMD/Radeon_RX_Vega_64/)
    Using the (primary) Balanced profile has the GPU consumption limit at 220W, the Turbo profile at 253W and the Power Saver profile at 165W. Vega being already so close to its limits, you can reach about 1 % higher performance with 15 % higher consumption. On the other hand, for the very same reason, using the Power Saver profile cuts your performance by only 4 % while your power limit goes down by 25 % (in other words, losing 4 % performance gives you about 28 % higher energy efficiency). No card running in a "comfortable range" would have such extreme differences between the profiles.
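
    The profile numbers above can be turned into relative efficiency with a couple of divisions. This is only a sketch: it treats each profile's power limit as the actual consumption, and the performance deltas (+1 %, -4 %) are the ones quoted in this post rather than values re-read from the review.

```python
BALANCED_W = 220.0  # power limit of the Balanced profile, from the post

# profile name -> (power limit in W, performance relative to Balanced)
profiles = {
    "Power Saver": (165.0, 0.96),
    "Balanced": (220.0, 1.00),
    "Turbo": (253.0, 1.01),
}


def relative_efficiency(power_w: float, rel_perf: float, base_power_w: float = BALANCED_W) -> float:
    """Performance per watt relative to the Balanced profile (1.0 = same)."""
    return rel_perf / (power_w / base_power_w)


for name, (power_w, rel_perf) in profiles.items():
    eff = relative_efficiency(power_w, rel_perf)
    print(f"{name}: {power_w:.0f} W, perf x{rel_perf:.2f}, perf/W x{eff:.2f}")
```

    On these inputs Power Saver comes out roughly 28 % more efficient than Balanced, and Turbo roughly 12 % less efficient.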

    NVIDIA on the other hand has had the luxury to be more moderate with their clocks and thus voltage, they had a lot of headroom on both the clocks and the voltage to go higher, but they didn't need to. Being in more comfortable, dare I even say optimal clockrange for the chip, lowering the voltage and/or clocks a bit makes a smaller difference here.

    Just like GP104 performance even out of the box, but I don't think I've heard a single one that didn't benefit from lowering voltage.
     
    #5700 Kaotik, Nov 9, 2018
    Last edited: Nov 9, 2018
    beyondtest, Ethatron and AlBran like this.


  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.