Nvidia Ampere Discussion [2020-05-14]

Discussion in 'Architecture and Products' started by Man from Atlantis, May 14, 2020.

Tags:
  1. bdmosky

    bdmosky Newcomer

    Marble Madness 2020 confirmed!
     
  2. del42sa

    del42sa Newcomer

  3. gamervivek

    gamervivek Regular

Looking at AT's article, single precision is 19.5 TF, up from 15.7 TF on V100, and double precision is 9.7 TF, up from 7.8 TF. The boost clock is down by about 100 MHz. How different would the gaming chip have to be, considering these changes look anemic for the node jump?
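For what it's worth, those figures fall out of the usual cores × clock arithmetic; a quick back-of-envelope sketch in Python, using the publicly listed SM counts and boost clocks (values are approximate, so expect small rounding differences):

```python
# Peak FP32 TFLOPS = SMs * FP32 cores per SM * 2 (FMA = 2 FLOPs) * boost clock (GHz) / 1000
def peak_tflops(sms, cores_per_sm, boost_ghz):
    return sms * cores_per_sm * 2 * boost_ghz / 1000

v100 = peak_tflops(80, 64, 1.53)    # V100: 80 SMs at ~1530 MHz boost
a100 = peak_tflops(108, 64, 1.41)   # A100: 108 SMs at ~1410 MHz boost
print(round(v100, 1), round(a100, 1))   # 15.7 19.5
```

So the gen-on-gen FP32 gain comes entirely from more SMs, despite the roughly 120 MHz lower boost clock.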
     
    Tarkin1977 likes this.
  4. del42sa

    del42sa Newcomer

  5. DavidGraham

    DavidGraham Veteran

You are looking at the wrong metrics. They expanded Tensor Core functionality to 32-bit, where they achieved 156 TF/312 TF. This is an AI-optimized chip and should be treated accordingly.
     
  6. fellix

    fellix Veteran

So, the current crop of A100 GPUs disables one of the eight GPCs (plus a few extra SMs) to improve yields, and that's why MIG virtualization is limited to 7 partitions?!
     
  7. gamervivek

    gamervivek Regular

You are looking at the wrong comment, then. I'm talking about the gaming chip, which is supposedly Ampere too. Unless tensor cores are being used in the normal shader pipeline, the gaming Ampere chip would be drastically different.
     
  8. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■) Moderator Legend Alpha

    nnunn and Man from Atlantis like this.
  9. manux

    manux Veteran

Looks like BMW has chosen Nvidia's robotics platform to be used in its factories.
    bmw.png

     
    Lightman likes this.
  10. CarstenS

    CarstenS Legend Subscriber

Possibly one memory partition as well, since there are six symmetric places for HBM2 stacks, one of them being a dummy, leaving 40 GBytes per SXM.
     
  11. xpea

    xpea Regular

    Official Ampere deep dive from Nvidia:
    https://devblogs.nvidia.com/nvidia-ampere-architecture-in-depth/
     
    pharma, tinokun, nnunn and 4 others like this.
  12. Kaotik

    Kaotik Drunk Member Legend

Can you test the GPU before assembly when using CoWoS? If not, there might not be any dummies; instead they just bin the fully working ones to release at a later date.
     
  13. CarstenS

    CarstenS Legend Subscriber

So, is no one else feeling a little baffled about the die size and transistor count? 54 bln x-tors in 826 mm² is more than 1.5x the density AMD gets with the (albeit much smaller) Navi 10 and Vega 20.

Here, Jensen mentions 70% more transistors (indirectly referencing Volta), which would put it at 35.7 bln x-tors and a much more plausible 43.2M x-tors/mm².
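The arithmetic is easy to check; a quick sketch, using Navi 10's published figures (10.3 bln transistors in 251 mm²) as the comparison point and reading "70% more than Volta" off a ~21 bln baseline:

```python
# Transistor density in millions of transistors per mm², from the figures above
a100_density = 54_000 / 826         # 54 bln x-tors in 826 mm² -> ~65.4 M/mm²
navi10_density = 10_300 / 251       # Navi 10: 10.3 bln in 251 mm² -> ~41 M/mm²
volta_plus_70 = 21_000 * 1.7 / 826  # "70% more than Volta" reading -> ~43.2 M/mm²
print(round(a100_density, 1), round(a100_density / navi10_density, 2), round(volta_plus_70, 1))
```

The headline figures give ~65.4 M/mm², roughly 1.59x Navi 10's density, while the "70% more than Volta" reading would land at ~43.2 M/mm².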
     
    Kej, Lightman, hurleybird and 4 others like this.
  14. Benetanegia

    Benetanegia Regular

    I really think he's talking about the new tensor cores though.

The density of the entire chip doesn't surprise me one bit, as I've discussed several times in the forums. It's what I would expect from a node that is claimed to be 3x denser. Achieved density has always been relatively close to what the foundry claimed, even before 7nm, so why would it be different now? It's always been AMD's density that didn't make any sense.
     
    pharma, Qesa, disco_ and 1 other person like this.
  15. DavidGraham

    DavidGraham Veteran

Interestingly, RT cores are omitted from the Tesla Ampere variants, the same way display connectors and the NVENC encoder are omitted. Which means NVIDIA will highly customize their GPUs this time around.

Yes indeed. I expect them to ditch a lot of the tensor cores in the gaming chips; FP64 will be gone too, along with a lot of the HPC silicon. RT cores will be back.
     
  16. pcchen

    pcchen Moderator Moderator Veteran Subscriber

625 TFLOPS per GPU is actually its FP16 Tensor Core number (edit: with sparse-matrix optimization).
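That matches the per-SM tensor throughput Nvidia published; a quick check, assuming 2048 dense FP16 FLOPs per SM per clock (the A100 whitepaper figure) at the 1410 MHz boost clock:

```python
# A100 FP16 Tensor Core peak: 108 SMs * 2048 FP16 FLOPs/clk/SM * 1.41 GHz
dense_tf = 108 * 2048 * 1.41 / 1000   # ~312 TFLOPS dense
sparse_tf = dense_tf * 2              # 2:4 structured sparsity doubles it -> ~624
print(round(dense_tf), round(sparse_tf))   # 312 624
```

The ~624 TFLOPS sparse figure is presumably what the quoted "625 per GPU" number rounds to.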
     
    pharma likes this.
  17. ShaidarHaran

    ShaidarHaran hardware monkey Veteran


Thinking ahead to consumer parts, obviously the FP64 cores will go bye-bye, but I don't see how they can cut back on the tensor cores with this SM architecture in a way that saves die space. But it looks like load/store throughput and L1 cache have doubled compared to the Turing SM, so that should lead to some IPC gains.

    I'm guessing we'll see in the range of 84-90 SMs (5376-5760 FP32 Cuda cores) for GA102,
    320-384 bit crossbar memory controller with 20-24GB GDDR6,
    ditch most of the NVlink connections and add in RT cores,
    should give us a die size in the 600-650mm^2 range.
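Those speculated numbers follow mechanically from the SM and bus-width arithmetic; a sketch, assuming 64 FP32 cores per SM (as in Turing) and 2 GB GDDR6 devices on 32-bit channels:

```python
# Speculated GA102 configs from the post: FP32 core counts and GDDR6 capacity
def fp32_cores(sms, per_sm=64):          # 64 FP32 cores per SM, as in Turing
    return sms * per_sm

def gddr6_gb(bus_bits, gb_per_chip=2):   # one GDDR6 device per 32-bit channel
    return bus_bits // 32 * gb_per_chip

print(fp32_cores(84), fp32_cores(90))    # 5376 5760
print(gddr6_gb(320), gddr6_gb(384))      # 20 24
```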
     
    Last edited: May 14, 2020
    pharma and nnunn like this.
  18. fellix

    fellix Veteran

Double the L/S units, but still just one TMU quad. That SM layout is probably not representative of the consumer parts, particularly regarding the increased RT performance.
     
  19. ShaidarHaran

    ShaidarHaran hardware monkey Veteran

    GA100 SM doesn't even have RT cores, so GA102 will need to incorporate them.
     
  20. DavidGraham

    DavidGraham Veteran

    Tensors occupy a third of the SM now, which is a massive increase over Volta and Turing, so they will be restructuring the SM to add in RT cores and minimize Tensor space for the consumer chips.
     