Nvidia Ampere Discussion [2020-05-14]

Discussion in 'Architecture and Products' started by Man from Atlantis, May 14, 2020.

  1. bdmosky

    Newcomer

    Joined:
    Jul 31, 2002
    Messages:
    172
    Likes Received:
    32
    Marble Madness 2020 confirmed!
     
  2. del42sa

    Newcomer

    Joined:
    Jun 29, 2017
    Messages:
    184
    Likes Received:
    107
  3. gamervivek

    Regular Newcomer

    Joined:
    Sep 13, 2008
    Messages:
    747
    Likes Received:
    244
    Location:
    india
    Looking at AT's article, single precision is 19.5TF, up from 15.7TF on V100, and double precision is 9.7TF, up from 7.8TF. The boost clock is down by about 100 MHz. How different would the gaming chip have to be, considering these changes look anemic for the node jump?
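For reference, the generational uplift implied by those figures can be worked out directly. This is a minimal sketch that uses only the numbers quoted in the post; the variable and function names are illustrative.

```python
# Peak throughput figures quoted in the post, in TFLOPS.
v100 = {"fp32": 15.7, "fp64": 7.8}
a100 = {"fp32": 19.5, "fp64": 9.7}


def uplift_percent(new, old):
    """Relative increase of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


# Per-precision uplift of A100 over V100.
uplift = {key: uplift_percent(a100[key], v100[key]) for key in v100}

for key, pct in uplift.items():
    print(f"{key}: {v100[key]} -> {a100[key]} TF (+{pct:.1f}%)")
```

Both precisions come out at roughly 24%, which is the basis for the "anemic" remark.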
     
    Tarkin1977 likes this.
  4. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,359
    Likes Received:
    3,732
    You are looking at the wrong metrics. They expanded Tensor functionality to 32-bit, where they achieved 156TF/312TF. This is an AI-optimized chip and should be treated accordingly.
     
  5. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,515
    Likes Received:
    441
    Location:
    Varna, Bulgaria
    So, the current crop of A100 GPUs disables one of the eight GPCs (plus a few extra SMs) to improve yields, and that's why MIG virtualization is limited to 7 partitions?!
     
  6. gamervivek

    Regular Newcomer

    Joined:
    Sep 13, 2008
    Messages:
    747
    Likes Received:
    244
    Location:
    india
    You are looking at the wrong comment, then. I'm talking about the gaming chip, which is supposedly Ampere too. Unless tensor cores are being used in the normal shader pipeline, the gaming Ampere chip would have to be drastically different.
     
  7. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    16,826
    Likes Received:
    16,611
    nnunn and Man from Atlantis like this.
  8. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    2,220
    Likes Received:
    1,113
    Location:
    Earth
    Looks like BMW has chosen Nvidia's robotics platform for use in its factories.
    bmw.png

     
    Lightman likes this.
  9. CarstenS

    Legend Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,102
    Likes Received:
    2,572
    Location:
    Germany
    Possibly one memory partition as well, since there are six symmetric sites for HBM2 stacks, one of them being a dummy, given the 40 GBytes per SXM.
     
  10. xpea

    Regular Newcomer

    Joined:
    Jun 4, 2013
    Messages:
    421
    Likes Received:
    461
    Official Ampere deep dive from Nvidia:
    https://devblogs.nvidia.com/nvidia-ampere-architecture-in-depth/
     
    pharma, tinokun, nnunn and 4 others like this.
  11. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,238
    Likes Received:
    3,182
    Location:
    Finland
    Can you test the GPU before assembly when using CoWoS? If not, there might not be any dummies; instead they may just bin the fully working ones to release them at a later date.
     
  12. CarstenS

    Legend Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,102
    Likes Received:
    2,572
    Location:
    Germany
    So, is no one else feeling a little baffled about the die size and transistor count? 54 bln. x-tors in 826 mm² is more than 1.5x the density AMD gets with the (admittedly much smaller) Navi 10 and Vega 20.

    Here, Jensen mentions 70% more transistors (indirectly referencing Volta), which would put it at 35.7 bln x-tors and a much more plausible 43.2M x-tors/mm².
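The density arithmetic behind both readings can be checked with a few lines. This is a rough sketch: the A100 figures come from the post, while the Navi 10 and Vega 20 transistor counts and die areas are commonly cited public figures that are not stated in this thread and are treated here as assumptions.

```python
A100_AREA_MM2 = 826  # die area quoted in the post


def density_mtr_per_mm2(transistors_bln, area_mm2):
    """Transistor density in millions of transistors per mm^2."""
    return transistors_bln * 1000.0 / area_mm2


# The two candidate transistor counts discussed in the post.
announced = density_mtr_per_mm2(54.0, A100_AREA_MM2)
alternative = density_mtr_per_mm2(35.7, A100_AREA_MM2)

# Volta transistor count implied by "70% more" yielding 35.7 bln.
implied_volta_bln = 35.7 / 1.7

# ASSUMPTION: AMD figures (bln transistors, mm^2) are not from the thread.
navi10 = density_mtr_per_mm2(10.3, 251)
vega20 = density_mtr_per_mm2(13.2, 331)

print(f"A100 at 54 bln:   {announced:.1f} M/mm^2")
print(f"A100 at 35.7 bln: {alternative:.1f} M/mm^2")
print(f"Implied Volta:    {implied_volta_bln:.1f} bln")
print(f"vs Navi 10: {announced / navi10:.2f}x, vs Vega 20: {announced / vega20:.2f}x")
```

Under these assumptions the 54 bln figure works out to about 65M x-tors/mm², which is where the "more than 1.5x" comparison comes from.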
     
    Kej, Lightman, hurleybird and 4 others like this.
  13. Benetanegia

    Regular Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    343
    Likes Received:
    308
    I really think he's talking about the new tensor cores though.

    The density of the entire chip doesn't surprise me one bit, as I've discussed several times in the forums. It's what I would expect from a node that is claimed to be 3x denser... Before 7nm it was always relatively close to what the foundry claimed, so why would it be different now? It's always been AMD's density that didn't make any sense.
     
    pharma, Qesa, disco_ and 1 other person like this.
  14. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,359
    Likes Received:
    3,732
    Interestingly, RT cores are omitted from the Tesla Ampere variants, the same way display connectors and the NVENC encoder are omitted. This means NVIDIA will heavily customize their GPUs this time around.

    Yes indeed, I expect them to ditch a lot of the tensor cores in the gaming chips. FP64 will be gone too, in addition to a lot of the HPC silicon. RT cores will be back.
     
  15. pcchen

    pcchen Moderator
    Moderator Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    2,850
    Likes Received:
    285
    Location:
    Taiwan
    625TFLOPS per GPU is actually its FP16 tensor core number (edit: with sparse matrix optimization).
     
    pharma likes this.
  16. ShaidarHaran

    ShaidarHaran hardware monkey
    Veteran

    Joined:
    Mar 31, 2007
    Messages:
    4,007
    Likes Received:
    60
    Thinking ahead to consumer parts, obviously the FP64 cores will go bye-bye, but I don't see how they can cut back on the tensor cores with this SM architecture in a way that saves die space. It does look like load/store throughput and L1 cache have doubled compared to the Turing SM, so that should lead to some IPC gains.

    I'm guessing we'll see in the range of 84-90 SMs (5376-5760 FP32 CUDA cores) for GA102,
    a 320-384-bit crossbar memory controller with 20-24GB GDDR6,
    most of the NVLink connections ditched and RT cores added in,
    which should give us a die size in the 600-650mm^2 range.
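The core-count and memory figures in that guess follow from simple per-unit multipliers. A minimal sketch: the 64 FP32 cores per SM is implied by the post's own numbers (5376 / 84 = 64), the 32-bit interface per GDDR6 device is the standard width, and the 2 GB (16 Gb) device capacity is an assumption chosen because it reproduces the 20-24GB range.

```python
FP32_PER_SM = 64           # implied by the post: 5376 / 84 = 5760 / 90 = 64
BITS_PER_GDDR6_CHIP = 32   # standard GDDR6 device interface width
GB_PER_CHIP = 2            # ASSUMPTION: 16 Gb devices, one per 32-bit channel


def cuda_cores(sm_count):
    """FP32 CUDA cores for a given number of SMs."""
    return sm_count * FP32_PER_SM


def memory_gb(bus_width_bits):
    """Total memory capacity for a given bus width, one chip per 32 bits."""
    chips = bus_width_bits // BITS_PER_GDDR6_CHIP
    return chips * GB_PER_CHIP


for sms in (84, 90):
    print(f"{sms} SMs -> {cuda_cores(sms)} FP32 CUDA cores")

for bus in (320, 384):
    print(f"{bus}-bit bus -> {memory_gb(bus)} GB GDDR6")
```

With 1 GB devices the same bus widths would give 10-12 GB instead, so the capacity guess depends on that chip-density assumption.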
     
    #37 ShaidarHaran, May 14, 2020
    Last edited: May 14, 2020
    pharma and nnunn like this.
  17. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,515
    Likes Received:
    441
    Location:
    Varna, Bulgaria
    Double the L/S units, but still just one TMU quad. That SM layout is probably not indicative of the consumer parts, particularly regarding the increased RT performance.
     
  18. ShaidarHaran

    ShaidarHaran hardware monkey
    Veteran

    Joined:
    Mar 31, 2007
    Messages:
    4,007
    Likes Received:
    60
    GA100 SM doesn't even have RT cores, so GA102 will need to incorporate them.
     
  19. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,359
    Likes Received:
    3,732
    Tensor cores occupy a third of the SM now, which is a massive increase over Volta and Turing, so they will be restructuring the SM to add in RT cores and minimize tensor space for the consumer chips.
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.