Nvidia Post-Volta (Ampere?) Rumor and Speculation Thread

Discussion in 'Architecture and Products' started by Geeforcer, Nov 12, 2017.

Thread Status:
Not open for further replies.
  1. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,359
    Likes Received:
    3,732
    Sounds like HPC Ampere is a 128CU part; another chip just showed up with 7936 CUDA cores.

     
    Newguy, xpea, DegustatoR and 2 others like this.
  2. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,359
    Likes Received:
    3,732
    More chips are appearing; this time we have a 118CU chip:



    The 124CU chip @1100MHz achieves a CUDA score of: 222337
    The 118CU chip @1100MHz achieves a CUDA score of: 169368

    So going from 118 to 124 resulted in an increase of 31%!

    Also worth noting that these results are not comparable to CUDA scores for Tesla or Turing, as the tests for Ampere are using insider drivers and CUDA 11, which is yet to be released.
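
    The 31% figure can be checked directly from the two scores quoted in this post. A minimal Python sketch; the function name is illustrative, and it assumes both runs really were at the listed 1100MHz clock:

```python
def relative_gain(new: float, old: float) -> float:
    """Fractional increase of `new` over `old`."""
    return new / old - 1.0

# Scores and CU counts exactly as quoted in the post.
score_124cu = 222337
score_118cu = 169368

score_gain = relative_gain(score_124cu, score_118cu)
cu_gain = relative_gain(124, 118)

print(f"score gain: {score_gain:.1%}")
print(f"CU gain:    {cu_gain:.1%}")
```

    The score rises by about 31% for roughly 5% more CUs, so the CU count alone cannot account for the gap.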

     
    Newguy and pharma like this.
  3. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    7,844
    Likes Received:
    4,016
    Location:
    Pennsylvania
    Yeah because that makes sense. No other factors are involved of course.

    Oooh Nvidiaaaaaaaa!
     
    ethernity likes this.
  4. CarstenS

    Legend Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,102
    Likes Received:
    2,572
    Location:
    Germany
    FWIW, a 2080 Ti @1100 MHz is around 106k in GB5.
     
  5. Benetanegia

    Regular Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    343
    Likes Received:
    308
    I think it's interesting that, if the clocks are not being misrepresented, those benchmark results are pretty consistent with previous rumors/leaks.

    First, the higher-than-expected performance per SM is consistent with the proposed setup of 16xFP32 + 16xINT or 16xFP32 + 16xFP32, and I'd even say by the expected amount. Turing would effectively be ~1.4x warps every 2 cycles (the 36 INT per 100 FP thing), while Ampere would actually be able to do the full 2 warps, which is a 40% perf increase, and that is roughly what we are seeing in those benches.

    Second, 50% higher performance is roughly what we are shown, and I think with such low clocks on 7nm and the aforementioned setup, half the power consumption is pretty realistic, almost a given.
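
    One possible reading of the warp-issue arithmetic in this post, as a Python sketch. The model is an assumption: Turing's FP pipe is treated as always busy while its INT pipe is busy only for the INT share of the mix, and the rumored Ampere setup is treated as keeping both 16-wide pipes full. Function names are illustrative.

```python
# A toy issue-rate model, not a confirmed description of either architecture.
FP_PER_100 = 100   # FP instructions in the quoted mix
INT_PER_100 = 36   # INT instructions per 100 FP, as quoted in the post

def turing_warps_per_2_cycles(fp: int = FP_PER_100, intg: int = INT_PER_100) -> float:
    """FP pipe stays full; INT pipe is busy only for the INT share of the mix."""
    return (fp + intg) / fp

def ampere_warps_per_2_cycles() -> float:
    """Rumored setup: both pipes kept busy, so 2 warps every 2 cycles."""
    return 2.0

turing_rate = turing_warps_per_2_cycles()
speedup = ampere_warps_per_2_cycles() / turing_rate

print(f"Turing: {turing_rate:.2f} warps per 2 cycles")
print(f"Speedup under this model: {speedup:.2f}x")
```

    Under these assumptions Turing lands at 1.36 warps per 2 cycles, which rounds to the ~1.4x quoted, and the modeled speedup falls between 1.4x and 1.5x.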
     
    DavidGraham, xpea and PSman1700 like this.
  6. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    3,726
    Likes Received:
    2,573
    Yea, I know it must hurt so much!
     
  7. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    1,664
    Likes Received:
    476
    Location:
    msk.ru/spb.ru
    Could be just as easily due to memory bandwidth changes.
     
  8. Benetanegia

    Regular Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    343
    Likes Received:
    308
    You think? It's the same memory setup as Volta except for clocks, and perf on Volta vs Turing is more consistent with TFLOPS than it is with Volta's 50% higher memory BW. In this particular benchmark anyway.
     
  9. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    1,664
    Likes Received:
    476
    Location:
    msk.ru/spb.ru
    I'm just saying that it's hard to tell from these results. L2 is 5X+ larger, and memory size varies from 24 to 32 to 48 GB, so it's hard to say if it's even four stacks and not six now, for example.

    Also how does Geekbench count the number of SPs? Shouldn't it detect the proper number if it's 2X per SM now? Or does it just use some fixed number per SM and multiply it by SM count?
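
    The "fixed number per SM" idea can be sketched in Python. The 64-per-SM figure is an assumption carried over from Volta and Turing, and nothing here shows how Geekbench actually derives its core count; the function names are illustrative.

```python
ASSUMED_FP32_PER_SM = 64  # assumption: same per-SM figure as Volta and Turing

def cores_from_sm_count(sm_count: int, per_sm: int = ASSUMED_FP32_PER_SM) -> int:
    """Core count a tool would report if it multiplied SM count by a fixed per-SM number."""
    return sm_count * per_sm

def sm_count_from_cores(cores: int, per_sm: int = ASSUMED_FP32_PER_SM) -> float:
    """Inverse: SM count implied by a reported core count."""
    return cores / per_sm

print(cores_from_sm_count(124))
print(sm_count_from_cores(7936))
```

    The 7936 CUDA cores quoted earlier in the thread equal 124 x 64, which is what a fixed 64-per-SM multiplier would produce for the 124CU chip.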
     
    #589 DegustatoR, Mar 4, 2020
    Last edited: Mar 4, 2020
  10. Benetanegia

    Regular Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    343
    Likes Received:
    308
    Yeah, I get what you mean, but I still think there's more to it.

    As far as I can tell, the bench doesn't count the number of SPs at all. It only reports number of CUs.
     
  11. CarstenS

    Legend Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,102
    Likes Received:
    2,572
    Location:
    Germany
    Since it uses OpenCL and CUDA, it can read it directly from what the driver reports.
     
    Man from Atlantis likes this.
  12. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,771
    Likes Received:
    905
    Location:
    New York
    That would put an Ampere SM at 15% higher IPC than Turing. We're looking at the same ALU config + better caches, I think.
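
    The per-SM comparison can be reproduced from the scores posted in this thread. A Python sketch; the 68-SM count for the 2080 Ti is not stated in the thread and is filled in here as an assumption, and "around 106k" is taken as exactly 106,000.

```python
TURING_2080TI_SMS = 68  # assumption: SM count of the RTX 2080 Ti, not stated in the thread

def score_per_sm(score: float, sm_count: int) -> float:
    """Benchmark score divided by SM count, all runs quoted at 1100MHz."""
    return score / sm_count

turing = score_per_sm(106_000, TURING_2080TI_SMS)  # "around 106k" from the thread
ampere_124 = score_per_sm(222_337, 124)
ampere_118 = score_per_sm(169_368, 118)

ratio_124 = ampere_124 / turing
ratio_118 = ampere_118 / turing

print(f"124CU vs 2080 Ti, per SM: {ratio_124:.2f}x")
print(f"118CU vs 2080 Ti, per SM: {ratio_118:.2f}x")
```

    With these inputs the 124CU result comes out close to the 15% figure, while the 118CU result lands below Turing per SM, so the two leaked runs may not be directly comparable.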
     
    #592 trinibwoy, Mar 4, 2020
    Last edited: Mar 4, 2020
  13. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,771
    Likes Received:
    905
    Location:
    New York
    Those frameworks report the number of SMs, not ALUs, correct?
     
    CarstenS likes this.
  14. w0lfram

    Newcomer

    Joined:
    Aug 7, 2017
    Messages:
    237
    Likes Received:
    40
    AND...? (is it not obvious?)

    Navi10 is faster in games, not equal in games. That is why nVidia released SUPER (which still can't compete in some games). Understand?


    Secondly, navi-10 uses a hybrid design, rdna(1), and the long-held secret of rdna2 (AMD's fully new gaming architecture) has yet to be seen. But we know it is not weighed down by gcn, or by what limited that design... it is free from that.

    If TU-106, equalized (for MHz, transistors, etc.), can't beat hybrid navi, then how will it compete with rdna2's gaming efficiency? We are talking about a uArch that AMD has been working on (in secret) for 3 years and that clients, once shown it years ago, jumped on board with. Both the new Xbox & PlayStation will be using rdna2, not to mention we've seen some of the specs. Xbox might have the gaming performance of the rtx2080, using rdna2.

    You might want to stop and let that sink in.


    Thirdly, Ampere is not new; its architecture is 100% based off of Turing, just further refined. You are fabricating. And yes, it is on a full node shrink, which is totally new to nVidia. They will have growing pains.

    But you still have not refuted the fact that rdna(1) is more powerful (at gaming) than the turing architecture. And more of nvidia's design (turing 2.0?) is not going to help in games, because (again) it's based on an antiquated design.

    I don't want a bigger 2080 shrunk down, I want a revolutionary one. And this ampere, as we know it, is only nvidia's next volta business-sector dGPU.
     
  15. naenrda

    Joined:
    May 21, 2019
    Messages:
    4
    Likes Received:
    5
    Navi isn’t faster in games; it’s roughly on par with Turing while having fewer features. Also, please don’t peddle this “Navi/RDNA is a hybrid” line that came from the lowest of the low tech publications...

    RDNA2 will be a solid improvement but nothing revolutionary.
     
    Cuthalu and PSman1700 like this.
  16. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    1,629
    Likes Received:
    1,001
    Location:
    France
    w0lfram keeps raising the bar... I'm waiting for "and oh BTW, Vega was trashing Turing too, you know".
     
  17. w0lfram

    Newcomer

    Joined:
    Aug 7, 2017
    Messages:
    237
    Likes Received:
    40
    Full stop. I just proved my case above, there is no argument here.
    If navi10 and TU-106 are the same size and navi-10 is on average +15% faster... how is it not more efficient? More performance ("ipc"), using fewer transistors? No matter how you scale it, rdna(1) comes out on top for freq vs output.

    And, I am not peddling anything. rdna2 is different, period!
     
  18. JasonLD

    Regular

    Joined:
    Apr 3, 2004
    Messages:
    415
    Likes Received:
    57
    The fact that a 445mm2 12nm GPU is competing against a 251mm2 7nm one, not only in performance but in efficiency, pretty much ends the argument, period. It's AMD that needs to catch up, not the other way around.

    You are completely ignoring the 7nm advantage vs 12nm for the same number of transistors. Without that advantage in performance/power savings, Navi wouldn't look pretty against Turing.
     
    #598 JasonLD, Mar 6, 2020
    Last edited: Mar 6, 2020
  19. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,771
    Likes Received:
    905
    Location:
    New York
    Not really. You're just repeating unfounded statements over and over. That doesn't amount to proof.

    How did you calculate the transistor count? How many transistors did you allocate to tensors and RT?

    Instead of random guessing, why don't you compare the 5700xt and 2070 super? They literally have the same specs.
     
    Cuthalu, pharma and Rootax like this.
  20. techuse

    Regular Newcomer

    Joined:
    Feb 19, 2013
    Messages:
    354
    Likes Received:
    190
    I don't necessarily agree with w0lfram overall, but he's correct in terms of transistor counts. The 5700xt is 10.3 billion. The 2070 super is 13.6 billion.
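
    For scale, the ratio between the two transistor counts quoted here, as a short Python sketch. Variable names are illustrative, and the raw totals say nothing about how many transistors go to tensor or RT hardware, which was the question raised earlier in the thread.

```python
# Transistor counts in billions, exactly as quoted in this post.
rx_5700xt_bn = 10.3
rtx_2070_super_bn = 13.6

transistor_ratio = rtx_2070_super_bn / rx_5700xt_bn
extra_fraction = transistor_ratio - 1.0

print(f"2070 super / 5700xt transistor ratio: {transistor_ratio:.2f}")
print(f"extra transistors on the 2070 super:  {extra_fraction:.0%}")
```

    On these figures the 2070 super carries roughly a third more transistors than the 5700xt.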
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.