Nvidia Ampere Discussion [2020-05-14]

Discussion in 'Architecture and Products' started by Man from Atlantis, May 14, 2020.

  1. ethernity

    Newcomer

    Joined:
    May 1, 2018
    Messages:
    100
    Likes Received:
    240
    Glad a lot of memes were put to rest:
    - Gaming Ampere 7nm
    - NVCache
    - RT Coprocessor on the back
    - DLSS 3.0
    - Tensor memory compression
     
    disco_, Lightman, no-X and 2 others like this.
  2. Erinyes

    Regular

    Joined:
    Mar 25, 2010
    Messages:
    808
    Likes Received:
    276
    Call me skeptical, but I doubt tensor cores can offer the same decompression performance as dedicated fixed-function units. Not to mention the varying performance across the GPU lineup (how will GA106/107 do, especially?).

    But yes, independent benchmarks are required of course.
     
  3. PSman1700

    Legend Newcomer

    Joined:
    Mar 22, 2019
    Messages:
    5,043
    Likes Received:
    2,242
    No idea why there's a need to get personal.
    There's a rather huge uplift going from a 2080 to a 3080: an 80% increase in performance in pure rasterization, as per DF testing. Which means the doubling of TF from 2080 to 3080 fits. That's aside from all the other improvements, and they also didn't go haywire with the prices (even though I still think they're too high).
    With a range of 20 to 36TF of performance, I think that's quite the leap; even a 2080Ti looks kinda old by now.
     
    #1083 PSman1700, Sep 2, 2020
    Last edited: Sep 2, 2020
  4. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    8,700
    Likes Received:
    3,200
    Location:
    Guess...
    Yeah, Moore's Law Is Dead really got shown up on this one. Virtually everything he predicted was wrong. I still hold out hope for a DLSS 3.0 at some point, though, that will be more game agnostic. There's no real reason why it would need to be tied to the hardware launch, and it might actually be more useful to unveil at a later date. Say, a couple of days before the RDNA2 launch. I do realise of course that this is likely just wishful thinking.
     
  5. xpea

    Regular Newcomer

    Joined:
    Jun 4, 2013
    Messages:
    439
    Likes Received:
    508
    And for what result?
    When this gen of consoles hits the market, they will be equivalent to an entry/mid-level gaming PC (RTX3060 + Zen3). In 2 years, they will barely match an entry-level PC (RTX40 + Zen4). And in 4 years / mid-life of this gen, everyone will complain about how slow these consoles are and how they cripple PC gaming with low-tech ports and outdated tech... Rinse and repeat until Sony/MS announce next-gen consoles...
    The truth is that consoles can't create a tech leap big enough to stay technically relevant for their entire lifetime. PC gaming is always ahead, no matter how you look at it (except when consoles are announced and compared to last-gen hardware).
     
  6. xpea

    Regular Newcomer

    Joined:
    Jun 4, 2013
    Messages:
    439
    Likes Received:
    508
    For the record, DLSS 3.0 is on the way, I saw it. NV just launched the DLSS 2.1 SDK because some 3.0 features are not ready yet. But DLSS 3.0 is alive and well, and coming.
    8/7nm is a crazy story. Until last week I was still seeing NV launch slides from their sales team with 7nm on them!!!
     
    Lightman, LeStoffer, sonen and 5 others like this.
  7. PSman1700

    Legend Newcomer

    Joined:
    Mar 22, 2019
    Messages:
    5,043
    Likes Received:
    2,242
    The 3070 sits at 20TF. No idea where the 3060 will be, but most likely well above the consoles, aside from the DLSS/RT advantages.
    Zen3 seems interesting; AMD does a great job with their CPUs now.
     
  8. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    8,700
    Likes Received:
    3,200
    Location:
    Guess...
    Assuming it is done on the tensor cores (which makes a lot of sense) you're looking at over 50 TFLOPS of FP16 and over 100 TOPS of INT8 on an RTX 2060 alone. I don't see it as unrealistic that that would enable it to match or exceed what is likely a cheap hardware decompression block.

    Also, since Jensen specifically said they could exceed the output of a 7GB/s NVMe drive, falling short of that would mean he was outright lying, as opposed to just presenting some slightly misleading material.
     
    pharma, DavidGraham and PSman1700 like this.
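
A rough sketch of the arithmetic behind that argument: dividing the quoted peak tensor throughput by the 7GB/s drive rate gives the operation budget per streamed byte. The function name and constants are illustrative; the peak figures are the lower bounds from the post, and nothing here says how much of that peak a real decompression kernel could actually use.

```python
# Back-of-the-envelope: tensor-core operations available per byte streamed
# from the drive. Peak rates are the lower bounds quoted in the post; a real
# decompression kernel would not reach peak throughput (assumption).

def ops_per_byte(peak_ops_per_s: float, stream_bytes_per_s: float) -> float:
    """Peak arithmetic budget for each byte of the input stream."""
    return peak_ops_per_s / stream_bytes_per_s

NVME_BYTES_PER_S = 7e9     # 7 GB/s NVMe drive
INT8_OPS_PER_S = 100e12    # "over 100 TOPS of INT8" on an RTX 2060
FP16_FLOPS_PER_S = 50e12   # "over 50 TFLOPS of FP16" on an RTX 2060

int8_budget = ops_per_byte(INT8_OPS_PER_S, NVME_BYTES_PER_S)
fp16_budget = ops_per_byte(FP16_FLOPS_PER_S, NVME_BYTES_PER_S)

print(f"INT8 budget: {int8_budget:,.0f} ops per byte")
print(f"FP16 budget: {fp16_budget:,.0f} FLOPs per byte")
```

Even at a small fraction of peak, that leaves thousands of operations per input byte, which is the point being made about a cheap fixed-function block.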
  9. Janne Kylliö

    Newcomer

    Joined:
    Oct 10, 2019
    Messages:
    53
    Likes Received:
    44
    I don't know... It seems that the RTX 3080 FE has roughly 3 times the theoretical performance of the RTX 2080 FE (30TF vs. 10.6TF), but only manages to outperform it by 80% in real gaming benchmarks. Maybe it's a bandwidth issue, maybe a scheduling issue, who knows?

    But it also means that if the XSX GPU has a performance advantage over the 2080, the performance difference between the XSX and the 3080 won't be close to 2X either. And the rumoured 80 CU Big Navi might actually match or exceed the 3080's performance (non-DLSS, non-RTX).

    However, it remains to be seen whether AMD will provide functionality similar to DLSS 2.0 or have RTX performance as good. If not, then NV is the clear winner here.
     
    London Geezer likes this.
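
The gap described in that post can be put into numbers. This is a minimal sketch using only the figures quoted there (30TF vs. 10.6TF, 80% measured uplift); the function name is made up for illustration, and "performance per peak FLOP" is just one way to express the ratio.

```python
# How much of the paper-FLOPS increase shows up in games, using the
# figures from the post above. Names are illustrative only.

def relative_perf_per_flop(old_tf: float, new_tf: float,
                           measured_speedup: float) -> float:
    """Measured speedup divided by the ratio of peak FP32 throughput.

    1.0 means the new card turns peak FLOPS into frames as well as the
    old one; below 1.0 means each peak FLOP delivers less.
    """
    peak_ratio = new_tf / old_tf
    return measured_speedup / peak_ratio

peak_ratio = 30.0 / 10.6                              # RTX 3080 FE vs. RTX 2080 FE
rel = relative_perf_per_flop(10.6, 30.0, 1.80)        # 80% faster in games

print(f"peak FP32 ratio:      {peak_ratio:.2f}x")
print(f"measured speedup:     1.80x")
print(f"perf per peak FLOP:   {rel:.2f} of the 2080's")
```

So on these numbers each Ampere peak FLOP delivers roughly two thirds of what a Turing peak FLOP did, which is the effect the following replies call "IPC" or "utilization".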
  10. troyan

    Regular Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    352
    Likes Received:
    688
    IPC doesn't go down. Ampere can process more instructions than Turing...
     
    Scott_Arm, pharma and PSman1700 like this.
  11. Leoneazzurro5

    Newcomer

    Joined:
    Aug 18, 2020
    Messages:
    230
    Likes Received:
    259
    The right term here may be "utilization"
     
    BRiT, pharma, DegustatoR and 4 others like this.
  12. Qesa

    Newcomer

    Joined:
    Feb 23, 2020
    Messages:
    27
    Likes Received:
    46
    I'm assuming "IPC" was intended to mean "realised performance per peak fp32 throughput"
     
  13. Geeforcer

    Geeforcer Harmlessly Evil
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,320
    Likes Received:
    525
    It seems that finally, after all these years, Nvidia may have the price/performance cards capable of taking on their greatest adversary that has haunted and thwarted them all this time...
    ...
    ...
    ...
    2018 $350 cryptomine liquidation 1080TI.
     
    Lightman likes this.
  14. troyan

    Regular Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    352
    Likes Received:
    688
    Yes, but does it matter? Transistors are cheap, power consumption isn't. Why not fill the whole die with fp32 units?
     
    neckthrough likes this.
  15. Qesa

    Newcomer

    Joined:
    Feb 23, 2020
    Messages:
    27
    Likes Received:
    46
    I didn't mean to put any value judgement on whether decreasing "IPC" (or utilisation, a far better term, as Leoneazzurro5 suggests) is good or bad. I was simply trying to point out why the added SIMD would make it go down.
     
  16. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    2,405
    Likes Received:
    1,941
    Location:
    msk.ru/spb.ru
    That's doubtful as well so far.
    "AMD flops" had inherent utilization issues in how GCN scheduling worked; no matter how much math you were pushing at them, there were issues with wavefront widths and context switching bubbles.
    Ampere flops may look kinda similar from a utilization point of view on older s/w, but the actual reason can be that the s/w is limited by some other part of the pipeline (rasterization, bandwidth, etc.), not that the Ampere multiprocessors have issues keeping the FP32 units utilized. This means that Ampere's FP32 utilization may be the same as on Turing and RDNA1/2 on code which is predominantly FP32 limited - and this will likely be exactly the type of code where performance will matter the most.
    We have to see the details on the reorganized Ampere SMs and how they reach the 2x FP32 on them.

    I'm sure that DLSS 3.0 is coming. I'm also sure that it won't be anything like what MLID was talking about.
     
    sonen, Krteq and PSman1700 like this.
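
One way to picture the "limited by some other part of the pipeline" argument is a simple roofline-style model, sketched below. This is an illustration, not a measurement: the peak TF figures come from earlier in the thread, while the memory bandwidth numbers (448 GB/s for the RTX 2080, 760 GB/s for the RTX 3080) and the arithmetic intensities are assumptions brought in for the example.

```python
# Roofline-style sketch: attainable throughput is capped by either peak
# compute or memory bandwidth times arithmetic intensity (FLOPs per byte).
# Bandwidth figures and intensities are assumptions, not from the thread.

def attainable_tflops(peak_tflops: float, bandwidth_tb_s: float,
                      flops_per_byte: float) -> float:
    """min(compute roof, bandwidth roof) for a given arithmetic intensity."""
    return min(peak_tflops, bandwidth_tb_s * flops_per_byte)

RTX_2080 = {"peak": 10.6, "bw": 0.448}   # TFLOPS, TB/s
RTX_3080 = {"peak": 30.0, "bw": 0.760}   # TFLOPS, TB/s

def speedup(flops_per_byte: float) -> float:
    """3080 over 2080 at the same arithmetic intensity."""
    old = attainable_tflops(RTX_2080["peak"], RTX_2080["bw"], flops_per_byte)
    new = attainable_tflops(RTX_3080["peak"], RTX_3080["bw"], flops_per_byte)
    return new / old

for intensity in (10.0, 100.0):          # hypothetical workloads
    print(f"{intensity:>5.0f} FLOP/byte -> {speedup(intensity):.2f}x")
```

Under these assumptions a bandwidth-bound workload only sees the bandwidth ratio (about 1.7x), while an FP32-bound one sees the full peak ratio (about 2.8x), without any change in how well the FP32 units themselves are kept busy.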
  17. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,584
    Likes Received:
    4,310
    Seems NVIDIA caught wind of the Xbox Series X methodology for calculating RT performance. Microsoft said that the RT acceleration of Series X is equivalent to 13TF of compute, for a total of 25TF of compute across both the shaders and RT cores.

    Jensen took the hint and declared that according to that Xbox Series X methodology, the 2080 RT cores have the equivalent of 34TF of compute, in addition to another 11TF of compute, for a total of 45TF while ray tracing, which is 80% faster than Series X.

    For Ampere, the 3080 alone delivers the equivalent of 58TF from the RT cores, not taking into account the other 30TF of regular compute, which amounts to a crazy 88TF while ray tracing.
     
    #1097 DavidGraham, Sep 2, 2020
    Last edited: Sep 2, 2020
    egoless, disco_, function and 3 others like this.
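
The sums in that post are easy to check. A small sketch using only the figures quoted there; the 12TF shader figure for Series X is derived here as 25TF minus 13TF rather than stated directly in the post, and the dictionary layout is just for illustration.

```python
# "RT-equivalent" compute per the Series X methodology described above:
# total = shader TF + RT-core-equivalent TF. All inputs are from the post,
# except the Series X shader figure, which is derived as 25 - 13.

cards = {
    "Xbox Series X": {"shader_tf": 25 - 13, "rt_equiv_tf": 13},
    "RTX 2080":      {"shader_tf": 11,      "rt_equiv_tf": 34},
    "RTX 3080":      {"shader_tf": 30,      "rt_equiv_tf": 58},
}

totals = {name: c["shader_tf"] + c["rt_equiv_tf"] for name, c in cards.items()}
baseline = totals["Xbox Series X"]

for name, total in totals.items():
    print(f"{name:<14} {total:>3} TF total, {total / baseline:.2f}x Series X")
```

This reproduces the 45TF and 88TF totals and the 80% advantage of the 2080 over Series X. Whether an "RT-equivalent TF" is a meaningful unit is a separate question that the arithmetic does not settle.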
  18. Love_In_Rio

    Veteran

    Joined:
    Apr 21, 2004
    Messages:
    1,627
    Likes Received:
    226
    Well, now we have 13.45 TFLOPS (2080Ti) behaving like 20 TFLOPS (3070). So flops are not equal, and a possible 20 TFLOPS RDNA2 will now trounce a 20 TFLOPS 3070.
     
  19. chris1515

    Legend Regular

    Joined:
    Jul 24, 2005
    Messages:
    6,605
    Likes Received:
    7,134
    Location:
    Barcelona Spain
    A dev, CorralX, on another forum estimates the efficiency at only around 10% with DX12 and Vulkan on the PC side.

    https://twitter.com/Corralx
     
  20. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    2,405
    Likes Received:
    1,941
    Location:
    msk.ru/spb.ru
    2080Ti actual boost flops are closer to 16.5 TFs.
    3070 is said to be "faster" than 2080Ti, not "like" it.
    So yeah flops may well end up being equal and the seemingly lower utilization may be a result of bandwidth or some other limitations coming into play on older s/w.
    Will a 20 tflops RDNA2 card "trounce" the 20 tflops 3070? Possibly, sometimes. Universally? Doubtful. And I'm not even accounting for DLSS here.
    Note that I expect Navi 21 to be higher than 20 tflops in actual shipping products. This one will likely be universally faster than 3070, of course.
     
    pharma and PSman1700 like this.
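
The difference between the 13.45 and ~16.5 TF figures for the 2080Ti comes down to which clock goes into the usual peak formula (2 FLOPs per FMA, per core, per clock). A minimal sketch: the core counts and reference boost clocks below are taken from public spec sheets, not from this thread, so treat them as assumptions, and the helper names are illustrative.

```python
# Peak FP32 = 2 FLOPs (one FMA) x shader cores x clock.
# Core counts and reference boost clocks are spec-sheet values assumed
# for illustration; they are not quoted anywhere in the thread.

def peak_fp32_tflops(cores: int, clock_ghz: float) -> float:
    """Theoretical FP32 throughput in TFLOPS."""
    return 2 * cores * clock_ghz / 1000.0

def clock_for_tflops(cores: int, tflops: float) -> float:
    """Clock in GHz needed to reach a given peak TFLOPS figure."""
    return tflops * 1000.0 / (2 * cores)

TI_2080_CORES, TI_2080_BOOST_GHZ = 4352, 1.545   # assumed reference spec
RTX_3070_CORES, RTX_3070_BOOST_GHZ = 5888, 1.725 # assumed reference spec

print(f"2080Ti at rated boost: {peak_fp32_tflops(TI_2080_CORES, TI_2080_BOOST_GHZ):.2f} TF")
print(f"clock implied by 16.5 TF: {clock_for_tflops(TI_2080_CORES, 16.5):.2f} GHz")
print(f"3070 at rated boost:   {peak_fp32_tflops(RTX_3070_CORES, RTX_3070_BOOST_GHZ):.2f} TF")
```

On these assumed specs, 16.5 TF corresponds to the 2080Ti sustaining about 1.9 GHz in practice rather than its rated boost, which narrows the apparent per-FLOP gap to the 3070.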
