Nvidia Ampere Discussion [2020-05-14]

Discussion in 'Architecture and Products' started by Man from Atlantis, May 14, 2020.

  1. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    832
    Likes Received:
    505
    From this video, an interesting table from the (unreleased?) white paper:
    [Attached image: Untitled.png]
     
  2. Man from Atlantis

    Regular

    Joined:
    Jul 31, 2010
    Messages:
    961
    Likes Received:
    855
    So Nvidia disabled a complete GPC of GA102 for the RTX 3080 and almost doubled the L1$/shared memory compared to Turing.

    [IMG]
     
    #1562 Man from Atlantis, Sep 16, 2020
    Last edited: Sep 16, 2020
    Lightman, Jawed and pharma like this.
  3. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    P100: 64+64 KiByte L1/SMEM
    V100: 128 KiByte L1/SMEM
    TU10x: 96 KiByte L1/SMEM
    A100: 192 KiByte L1/SMEM
    GA10x: 128 KiByte L1/SMEM

    edit: Per SM.
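A quick sketch of how those per-SM figures compare, taking the numbers listed above at face value. The dictionary name and the `ratio` helper are made up for illustration, and summing P100's "64+64" into one 128 KiB pool is an assumption made only so the chips can be compared on one scale.

```python
# L1/shared-memory capacity per SM in KiB, copied from the list above.
# Assumption: P100's "64+64" is summed into a single figure for comparison.
L1_SMEM_KIB_PER_SM = {
    "P100": 64 + 64,
    "V100": 128,
    "TU10x": 96,
    "A100": 192,
    "GA10x": 128,
}


def ratio(new: str, old: str) -> float:
    """Return how many times larger the per-SM capacity of `new` is than `old`."""
    return L1_SMEM_KIB_PER_SM[new] / L1_SMEM_KIB_PER_SM[old]


if __name__ == "__main__":
    # Print every chip relative to Turing (TU10x).
    for chip, kib in L1_SMEM_KIB_PER_SM.items():
        print(f"{chip:>6}: {kib:3d} KiB per SM, {ratio(chip, 'TU10x'):.2f}x TU10x")
```

On these figures GA10x comes out at about 1.33x TU10x per SM and A100 at 2x; per-chip totals would also depend on SM count, which this sketch ignores.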
     
    Lightman, Malo, Krteq and 2 others like this.
  4. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,058
    Likes Received:
    3,116
    Location:
    New York
    FP16 rate is the same as FP32. Does that mean each pipeline can no longer do double-rate FP16, or that one pipeline can and the other doesn't do FP16 at all?
     
  5. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,242
    Likes Received:
    3,405
    AFAIK FP16 ran on the tensor cores on Turing, not on the main FP32 SIMDs, so the change is likely due more to changes in the TCs than in the FP32 SIMDs.
     
  6. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,058
    Likes Received:
    3,116
    Location:
    New York
    The white paper says "non-Tensor" for both FP32 and FP16. Not sure if that has anything to do with which ALUs they run on.
     
  7. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,242
    Likes Received:
    3,405
    It's non-tensor but AFAIK all FP16 - including non-tensor math - was running on TC hardware on Turing. I dunno how it is now with Ampere.
     
  8. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
    Not unreleased, it's available to us in the media. Nothing in the whitepaper is NDA'd, but you're not allowed to release the whole whitepaper as is (I guess they'll bring it out for everyone later).
     
  9. arandomguy

    Regular Newcomer

    Joined:
    Jul 27, 2020
    Messages:
    256
    Likes Received:
    364
    If you look at it, FP16 TFLOPS is at the same 1/4 ratio to FP16 Tensor TFLOPS on both Turing and Ampere.

    It also looks like the Tensor Cores in RTX 3080 might be only 1/2 (and 1/4) rate compared to the ones in A100 versus 1:1 (and 1/2) for Turing Gaming vs. Pro/V100.
     
  10. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,242
    Likes Received:
    3,405
    Yeah, so it's quite possible that execution of FP16 vector math hasn't changed and it still runs on the TCs, but because gaming Ampere now has half as many of them, it ends up at the same speed as FP32.

    I wonder if this will even affect anything in practice, really. FP16 RPM was hyped to hell back at the Vega and PS4 Pro launches and hasn't really shown up much in actual performance since then.
     
    PSman1700 likes this.
  11. FP16 wasn't hyped at all for the PS4 Pro. IIRC the only mention of RPM in the Pro from Sony you'll ever find is Cerny casually mentioning it during a DF interview.

    For Vega it was indeed hyped, though that happened during Raja's reign, when RTG marketing was... different.
     
    Lightman and egoless like this.
  12. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,242
    Likes Received:
    3,405
    Yeah, and this led to people everywhere saying that the PS4 Pro is actually twice the teraflops and such stupid stuff.
     
    PSman1700, egoless and BRiT like this.
  13. It is, though. At maximum FP16 throughput.
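For what it's worth, the "twice the teraflops" arithmetic can be sketched like this. The `peak_tflops` helper is hypothetical, and the PS4 Pro figures (2304 shader lanes at 911 MHz) are assumed from commonly quoted specs rather than taken from this thread. It only gives the theoretical peak, which is reached only if every instruction is a packed FP16 FMA.

```python
def peak_tflops(lanes: int, clock_ghz: float, packing: int = 1) -> float:
    """Theoretical peak in TFLOPS: lanes x 2 FLOPs per FMA x packing x clock."""
    return lanes * 2 * packing * clock_ghz / 1000.0


# Assumed PS4 Pro figures (not from this thread): 36 CUs x 64 lanes, 911 MHz.
PS4_PRO_LANES = 36 * 64
PS4_PRO_CLOCK_GHZ = 0.911

fp32 = peak_tflops(PS4_PRO_LANES, PS4_PRO_CLOCK_GHZ)
# Packed FP16 (RPM) issues two half-precision ops per lane per clock.
fp16 = peak_tflops(PS4_PRO_LANES, PS4_PRO_CLOCK_GHZ, packing=2)

print(f"FP32 peak: {fp32:.2f} TFLOPS, packed FP16 peak: {fp16:.2f} TFLOPS")
```

So roughly 4.2 vs 8.4 TFLOPS on paper; how much of that a real shader sees depends on how much of its math can actually be packed.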
     
  14. arandomguy

    Regular Newcomer

    Joined:
    Jul 27, 2020
    Messages:
    256
    Likes Received:
    364
    There's a lack of software uptake for it I believe? I'm not sure how many games currently go to that level of optimization.

    Offhand, I think id Software was one of the early adopters, and the games using it did show relatively higher gains for cards that had 2xFP16 (Turing/Vega/etc.) over ones that didn't (Pascal). But I believe they also leverage other techniques that aren't present on the older gens either, so it's tricky to isolate.

    In terms of Ampere specifically, I'd speculate it wouldn't be an issue. If you really think about it, it's also a matter of perspective: you could look at it as gaining 2xFP32 rather than not having 2xFP16, since the FP16 rate hasn't actually gone down against Turing at each "tier." Also, in Ampere's case, I believe the Tensor cores can now run simultaneously, which might mean concurrent FP16 ops, unlike Turing, so real throughput might be higher than it seems. But in general I'd think Ampere already sits relatively high in resources for FP operations compared to everything else, to the point of diminishing returns.

    If we go by the leaked Doom Eternal numbers (the game uses FP16 optimizations, I believe), it seems to sit at the higher end of gains over Turing anyway.
     
    Lightman, Putas and PSman1700 like this.
  15. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,242
    Likes Received:
    3,405
    Does it? I skimmed through this yesterday but can't say that I remember FP16 being mentioned there at all.
     
  16. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    832
    Likes Received:
    505
    Reviews popping up...
     
  17. I remember id Software games (Doom and Prey maybe?) and Far Cry 5.

    It's extremely hard for AMD to push any type of new technology into the PC market. nVidia doesn't just have over 80% of the discrete GPU market; their infiltration into dev teams is also something AMD doesn't have and can't match.
     
  18. gamervivek

    Regular

    Joined:
    Sep 13, 2008
    Messages:
    805
    Likes Received:
    320
    Location:
    india
    All hail the new thermi :lol:
     
  19. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    15,134
    Likes Received:
    7,679
    I'd guess the margins at 4K are bigger because games at 4K are more ALU-limited.
     
    Lightman likes this.