Nvidia Ampere Discussion [2020-05-14]

Discussion in 'Architecture and Products' started by Man from Atlantis, May 14, 2020.

  1. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,934
    Likes Received:
    2,263
    Location:
    Germany
    I stopped being a true believer when everyone removed AA samples/s from their spec sheets. [/sarcasm]
     
    Lightman, Rootax and TheAlSpark like this.
  2. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,070
    Likes Received:
    2,942
    Location:
    Finland
    In the big slide from the press deck just above your post?
    I'm fully aware of how they came up with the numbers. Like I said, I should have been more specific and said "in their slides", which at minimum would need some fine print saying "check the details elsewhere, because this is actually BS if you read it at face value".

    But TF32 isn't FP32, that was the point.
    The other comparisons are a little iffy too; to my understanding, the FP64 number holds true only for matrix multiplications (i.e. the kind of work the tensor cores run fast), and the same goes for INT8 (which in addition assumes you can take advantage of the sparsity support).
     
    #322 Kaotik, Jun 23, 2020
    Last edited: Jun 23, 2020
  3. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    475
    Likes Received:
    196
    Which is why they list FP64 as only 10 teraflops and not some huge number. Nvidia gonna Nvidia, whatever.
     
  4. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,070
    Likes Received:
    2,942
    Location:
    Finland
    Except in that slide I posted, where they list it as 19.5 TFLOPS, which is the tensor cores doing tensor work at FP64 precision.

    edit:
    The slide says
    FP32 312 TFLOPS
    INT8 1248 TOPS
    FP64 19.5 TFLOPS

    When it really should say
    FP32 19.5 TFLOPS
    INT8 624 Tensor TOPS (1248 with Sparsity support)
    FP64 9.7 TFLOPS
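
    A quick sanity check on those corrected figures, as a minimal Python sketch (the 6912-core count and ~1.41 GHz boost clock are the published A100 specs; an FMA counts as two ops, per the usual convention):

    ```python
    # Peak-throughput sanity check for the A100 slide numbers.
    # Assumes the published A100 config: 6912 FP32 CUDA cores, ~1.41 GHz boost.

    CORES_FP32 = 6912
    BOOST_GHZ = 1.41

    fp32 = CORES_FP32 * BOOST_GHZ * 2 / 1000   # TFLOPS on the plain CUDA cores
    fp64 = fp32 / 2                            # FP64 rate is half on GA100
    fp64_tensor = fp64 * 2                     # tensor cores double FP64 matrix math
    tf32_tensor = 156                          # published tensor peak; 312 with 2:4 sparsity
    int8_tensor = 624                          # published tensor peak; 1248 with 2:4 sparsity

    print(f"FP32 (CUDA cores): {fp32:.1f} TFLOPS")         # ~19.5
    print(f"FP64 (CUDA cores): {fp64:.1f} TFLOPS")         # ~9.7
    print(f"FP64 (tensor):     {fp64_tensor:.1f} TFLOPS")  # ~19.5
    print(f"TF32 (tensor):     {tf32_tensor} TFLOPS, x2 with sparsity")
    print(f"INT8 (tensor):     {int8_tensor} TOPS, x2 with sparsity")
    ```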
     
  5. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,934
    Likes Received:
    2,263
    Location:
    Germany
    OTOH, that column is headed by „peak“, and the traditional FP32/64 numbers are given for FMAs only as well, so you're just used to those, although they also tell you just another peak value. MUL, ADD or - especially, beware - DIV, SQRT and POW are only loosely connected to it. I'd rather question the identical peaks at a 150-watt-lower power envelope compared to the SXM4 model: those peaks could be massively shorter in duration or, depending on instruction mix, a purely theoretical number.
     
  6. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,336
    Likes Received:
    297
    The "problem" isn't that they are peak numbers, but that those are not FP32 numbers, but TF32 numbers labeled as FP32.
     
    techuse and Kaotik like this.
  7. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,934
    Likes Received:
    2,263
    Location:
    Germany
    For FP32, that comes on top of it, sure. But Kaotik seems to dislike more than just that.
     
    pharma likes this.
  8. Ext3h

    Regular Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    384
    Likes Received:
    389
    Omitting that in most use cases you can neither achieve the conditions for peak performance (due to functional constraints) nor fit sustained operation within the TDP budget; on top of that, wildly re-interpreting terms to include operations-not-performed in the stated peak numbers, and deliberately mislabeling data types.

    It's not as bad as the one time NVidia had the nerve to label bitwise operations on 32-bit operands followed by a popcnt as "33 1-bit FLOPs". But the numbers are useless regardless.
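
    For illustration, a hypothetical sketch of that counting trick (the function name and bookkeeping are mine, not Nvidia's): one 32-bit bitwise instruction gets booked as 32 one-bit operations, and the popcount that follows as the 33rd:

    ```python
    # Hypothetical illustration of the "33 1-bit FLOPs" marketing math:
    # two actual instructions, booked as thirty-three one-bit operations.

    def xnor_popcount(a: int, b: int) -> int:
        """XNOR over 32 bit lanes, then count the matching bits."""
        x = ~(a ^ b) & 0xFFFFFFFF   # one instruction, counted as 32 one-bit ops
        return bin(x).count("1")    # one popcnt, counted as the 33rd op

    marketed_ops = 32 + 1     # what the spec sheet claims per operand pair
    actual_instructions = 2   # what the hardware actually executes
    ```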
     
    Lightman, PSman1700, Krteq and 4 others like this.
  9. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    3,528
    Likes Received:
    2,215
    THE NEW GENERAL AND NEW PURPOSE IN COMPUTING
    June 29, 2020

    https://www.nextplatform.com/2020/06/29/the-new-general-and-new-purpose-in-computing/


    Edit: The article is a good read and has a bit more information.
     
    #329 pharma, Jun 30, 2020
    Last edited: Jun 30, 2020
    PSman1700 and xpea like this.
  10. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    745
    Likes Received:
    317
    Some speculation about the top consumer Ampere GPU.
    The A100 went from the V100's 6 GPCs with 14 SMs/GPC to 8 GPCs with 16 SMs/GPC,
    or from 84 SMs / 5376 FP32 cores to 128 SMs / 8192 FP32 cores.
    Logically the GA102 will also get 8 GPCs; keeping 12 SMs/GPC, that would take
    the TU102's 6 GPCs with 12 SMs/GPC to a GA102 with 8 GPCs and 12 SMs/GPC,
    or from 72 SMs / 4608 FP32 cores to 96 SMs / 6144 FP32 cores.
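
    As a quick sketch of that arithmetic (assuming 64 FP32 cores per SM, as on Volta, Turing and GA100; the GA102 row is of course my speculation):

    ```python
    # FP32 core counts from GPC x SM layouts, at 64 FP32 cores per SM.

    def fp32_cores(gpcs: int, sms_per_gpc: int, cores_per_sm: int = 64) -> int:
        """Total FP32 cores for a given GPC x SM layout."""
        return gpcs * sms_per_gpc * cores_per_sm

    configs = {
        "GV100 (V100)":       (6, 14),  # 84 SMs  -> 5376 cores
        "GA100 (A100)":       (8, 16),  # 128 SMs -> 8192 cores
        "TU102":              (6, 12),  # 72 SMs  -> 4608 cores
        "GA102 (speculated)": (8, 12),  # 96 SMs  -> 6144 cores
    }

    for name, (gpcs, sms) in configs.items():
        print(f"{name}: {gpcs * sms} SMs, {fp32_cores(gpcs, sms)} FP32 cores")
    ```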
     
  11. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,070
    Likes Received:
    2,942
    Location:
    Finland
    I wouldn't draw conclusions that the Volta > Ampere jump relates in any way to the gaming side's Turing > Ampere jump, especially when we take into account that the A100 and the gaming Amperes will be quite different, with the former lacking RT acceleration and most likely dedicating more space to tensors.
     
    DegustatoR likes this.
  12. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    745
    Likes Received:
    317
    You are quite wrong about that: the V100 was a good predictor for the TU102, with the latter gaming GPU dedicating even more space to tensors per SM. Likewise, the A100 will serve as a basis for the GA102.
    I would hate to see those big tensor cores transition from the A100 to the GA102, but given the past it would not be a complete surprise.
     
  13. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,070
    Likes Received:
    2,942
    Location:
    Finland
    Volta and Turing weren't the same architecture; the Amperes should be, even if the implementations differ in a similar way. You can't just pick a few nice indicators, ignore the rest of the differences and call it a day.
     
  14. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    745
    Likes Received:
    317
    I was speculating about the number of GPCs and SMs in a plausible way.
    Besides that, there will be obvious differences like there always have been: no double precision, far fewer NVLink links, a reduced ECC/memory system...
    If 96 SMs for a GA102 is unreasonable to you, argue that point.
     
  15. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,934
    Likes Received:
    2,263
    Location:
    Germany
    I think it all depends on whether or not Gaming-Ampere also gets the beefy tensor cores. Since the chips' specialties in training and inference seem to diverge a bit, I can see Nvidia not offering the training-optimized µarch throughout the product line, thus leaving more space for simpler cores with high throughput only in traditional applications and inferencing.
     
  16. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,070
    Likes Received:
    2,942
    Location:
    Finland
  17. Digidi

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    243
    Likes Received:
    102
    Ampere leak: if it's true, Ampere will come on Samsung 8 nm (which is really an improved Samsung 10 nm).

     
  18. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,070
    Likes Received:
    2,942
    Location:
    Finland
    This again o_O
    Jensen specifically said, sometime after they announced the new Samsung deal, that TSMC is still going to make most of their chips. There's nothing indicating the Samsung deal would be somehow bigger than the last one, where they made low-end Pascals.
     
  19. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,934
    Likes Received:
    2,263
    Location:
    Germany
    Did Jen-Hsun's statement (which I don't recall, hence my question) specify whether he meant the absolute number of produced chips (in millions) or the number of models (GA100, GA102, GA104 etc.)? Samsung could be making GA102/104 while TSMC churns out GA106/107/108.
     
  20. Putas

    Regular Newcomer

    Joined:
    Nov 7, 2004
    Messages:
    425
    Likes Received:
    89
    It is Samsung's fabs that focus on smaller, low-power chips.
     