Nvidia Ampere Discussion [2020-05-14]

Discussion in 'Architecture and Products' started by Man from Atlantis, May 14, 2020.

  1. PSman1700

    Legend

    Joined:
    Mar 22, 2019
    Messages:
    7,121
    Likes Received:
    3,093
    As nv said, biggest leap ever. On all fronts.
     
    pharma and Cyan like this.
  2. Cyan

    Cyan orange
    Legend

    Joined:
    Apr 24, 2007
    Messages:
    9,734
    Likes Received:
    3,460
    That's a nice discovery, 'cos right now MSI Afterburner has auto OC, and it is working well for me. MSI Afterburner seems to use some kind of undervolt. But if the Nvidia tool adapts to your computer's PSU (dunno how), that's another reason to leave MSI Afterburner, maybe.
     
    pharma and PSman1700 like this.
  3. Cyan

    Cyan orange
    Legend

    Joined:
    Apr 24, 2007
    Messages:
    9,734
    Likes Received:
    3,460
    to sum some things up.

    - CUDA cores: INT32 and FP32 operations are performed interchangeably by a CUDA core. That's HUGE, as shown by the graph below: there are no separate CUDA cores for INT32 and FP32 operations. That means no idle INT32 or FP32 cores, because every CUDA core can perform either floating-point or integer operations. (Games perform around 20-30% of their operations in INT32 format.)

    - Where you will notice the generational leap is in the fact that the new GPUs perform much better raytracing denoising.

    - IMO, where series 3000 destroys previous generations is in RT games, which is the way to go.

    - The 3070 performs like the 2080 Ti despite having 20 teraflops vs 13 teraflops. This shows how different the architectures are now. IMHO this means, again, that where they truly shine is in RT games.

    [IMG]
     
  4. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    15,134
    Likes Received:
    7,680
    Yah, one way people undervolt is to let OC Scanner in Afterburner run and generate a curve. Then they know what frequency is stable at each voltage. So you kind of flatten the curve out from whatever frequency you have as your target. I would not be surprised if you can push things a little bit more manually, but it's a good starting point. You should be able to do the same with the power limit adjusted up or down. This Nvidia tool looks like it can do the same thing, but if it requires GeForce Experience I'm not sure that I want it.
     
    Pete and BRiT like this.
  5. techuse

    Veteran

    Joined:
    Feb 19, 2013
    Messages:
    1,429
    Likes Received:
    910
    Not true. Supported modes are only 128+0 or 64+64.
     
  6. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Not sure about that. Isn't it really per SM partition? It should be at least. 32+0 or 16+16. I'm splitting hairs like this, because that's leaving the pure FP32 block idle only when you have a 1:2 ratio of FP vs. INT (and underutilized at anything less than 1:1).

    In other news, and because it was mentioned earlier: this makes paper TFlops IMO actually more comparable between Nvidia and AMD('s current gen), since now INT ops go against the TFlops budget on both sides, whereas they were kind of free on Nvidia with Turing, making it overperform wrt its TFlops rating.
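To put rough numbers on the FP vs. INT ratio point above, here is a minimal throughput sketch in Python. It assumes the per-partition layout described in this thread (16 FP32-only lanes plus 16 lanes that can do FP32 or INT32 on Ampere, and 16 FP32 plus 16 INT32 on Turing), that FP work fills the FP32-only lanes first, and that nothing else (issue rate, memory, latency) limits throughput. The function names are made up for illustration and this is not a description of how the hardware actually schedules.

```python
from fractions import Fraction

FIXED_FP_LANES = 16  # FP32-only lanes per SM partition (layout as described in the thread)
MIXED_LANES = 16     # lanes that can run FP32 or INT32 (Ampere, per the thread)


def ampere_clocks(fp_ops, int_ops):
    """Idealized clocks for one Ampere-style partition to retire the given lane-ops."""
    total = fp_ops + int_ops
    # All 32 lanes share the total work, but INT32 can only use the 16 mixed lanes.
    return max(Fraction(total, FIXED_FP_LANES + MIXED_LANES),
               Fraction(int_ops, MIXED_LANES))


def turing_clocks(fp_ops, int_ops):
    """Idealized clocks for a Turing-style partition: 16 FP32 lanes and 16 INT32 lanes."""
    return max(Fraction(fp_ops, 16), Fraction(int_ops, 16))


def fixed_fp_utilization(fp_ops, int_ops):
    """Share of the FP32-only lanes kept busy, assuming FP work fills them first."""
    clocks = ampere_clocks(fp_ops, int_ops)
    if clocks == 0:
        return Fraction(0)
    capacity = clocks * FIXED_FP_LANES
    return min(Fraction(fp_ops), capacity) / capacity


if __name__ == "__main__":
    # (FP ops, INT ops): pure FP, 3:1, 1:1 and 1:2 instruction mixes.
    for fp, integer in [(1024, 0), (768, 256), (512, 512), (256, 512)]:
        speedup = turing_clocks(fp, integer) / ampere_clocks(fp, integer)
        util = fixed_fp_utilization(fp, integer)
        print(f"FP:INT {fp}:{integer}  Ampere/Turing speedup {float(speedup):.2f}  "
              f"FP-only lane utilization {float(util):.2f}")
```

Under these assumptions the gain over a Turing-style partition is 2x for pure FP32, shrinks as the INT share grows, and is gone at a 1:1 mix; once INT exceeds FP, the FP32-only lanes start to sit partly idle.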
     
    Lightman, Pete, iroboto and 1 other person like this.
  7. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,059
    Likes Received:
    3,119
    Location:
    New York
    This picture is inaccurate. The GTX 1080 should be twice as fast as shown. Its execution units are 32-wide vs the 16-wide units in Ampere and Turing.
     
    Lightman likes this.
  8. agent_x007

    Newcomer

    Joined:
    Dec 2, 2014
    Messages:
    25
    Likes Received:
    3
    I do it like this :

    The main problem with the TDP slider is inefficiency: it doesn't lower the voltage as far as the GPU could tolerate while staying stable, so the performance drop under maximum load is also larger than it should be.
     
    BRiT likes this.
  9. agent_x007

    Newcomer

    Joined:
    Dec 2, 2014
    Messages:
    25
    Likes Received:
    3
    Shouldn't that split be like this :
    128:0 = 4x16 [FP32 fixed] + 4x16 FP32 ["mixed" units]
    64:64 = 4x16 [FP32 fixed] + 4x16 INT32 ["mixed" units]
    ?
     
  10. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,059
    Likes Received:
    3,119
    Location:
    New York
    No, instruction dispatch is handled independently each clock in each of the 4 SM partitions. Each partition has 16 FP32 fixed + 16 FP32/INT32 mixed pipelines. The mix of instructions each clock can be different between partitions in the same SM.
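As a small illustration of the per-partition dispatch described above, the Python sketch below enumerates the SM-level FP32+INT32 lane mixes that become possible in a single clock if each of the 4 partitions independently picks either 32 FP32 or 16 FP32 + 16 INT32. The two-mode list is a simplification taken from this thread, and the names are hypothetical.

```python
from itertools import product

PARTITIONS_PER_SM = 4
# Per-clock modes for one partition as (FP32 lanes, INT32 lanes), as described in the thread.
MODES = [(32, 0), (16, 16)]


def sm_level_mixes():
    """Return every distinct (FP32, INT32) lane total across the 4 partitions of one SM."""
    mixes = set()
    for choice in product(MODES, repeat=PARTITIONS_PER_SM):
        fp_lanes = sum(mode[0] for mode in choice)
        int_lanes = sum(mode[1] for mode in choice)
        mixes.add((fp_lanes, int_lanes))
    return sorted(mixes, reverse=True)


print(sm_level_mixes())
# [(128, 0), (112, 16), (96, 32), (80, 48), (64, 64)]
```

So if the partitions really do choose independently, 128+0 and 64+64 are just the two extremes, with 112+16, 96+32 and 80+48 in between.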
     
    BRiT and pjbliverpool like this.
  11. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Maybe this is normalized to the same number of ALUs?
     
    pjbliverpool likes this.
  12. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Each SM partition (i.e. a block consisting of 64 KiB RF, 16x INT32, 16x FP32, 4x L/S, 4x SFU) has its own scheduler and dispatch, so they should be able to operate independently.
     
  13. techuse

    Veteran

    Joined:
    Feb 19, 2013
    Messages:
    1,429
    Likes Received:
    910
    I wasn't sure how granular the split could be, but Computerbase.de states it's either 128 FP or 64 FP + 64 INT. I figured it was a scheduling limitation of some type and would help explain the performance scaling deficit. With finer-grained scheduling and higher utilization I would have expected the dramatic increase in core counts to result in a bigger performance increase, even with other possible bottlenecks.
     
    #1313 techuse, Sep 7, 2020
    Last edited: Sep 7, 2020
  14. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,248
    Likes Received:
    3,417
    Bandwidth, CPU, etc. limits are still there. The more shading-limited Ampere is, the closer it gets to its peak flops.
     
    PSman1700 likes this.
  15. techuse

    Veteran

    Joined:
    Feb 19, 2013
    Messages:
    1,429
    Likes Received:
    910
    What is the level of granularity for FP and INT scheduling per SM?
     
  16. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,248
    Likes Received:
    3,417
    As others said: either 16+16 FP32 or 16+16 FP32+INT.
     
    techuse and PSman1700 like this.
  17. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Not sure why Nvidia would invest in more control logic (scheduler, dispatch, compared to Pascal) if not for more fine-grained control. I haven't been in the briefings, so this is just me making assumptions, though.
     
  18. marifire

    Newcomer

    Joined:
    May 13, 2007
    Messages:
    46
    Likes Received:
    41
    BTW I think TFlops was a somewhat fair indicator of gaming performance between Turing and Navi 10, but Ampere is definitely another kind of beast. Nvidia said 14.2 TFlops for the 2080 Ti FE at 1635 MHz boost, and AMD 9.7 for the 5700 XT at 1905 MHz, but the 2080 Ti FE's real median gaming clock is 1830 MHz and the 5700 XT's is 1890 MHz, so it should be around 16 TFlops and 9.5 TFlops. In the latest TPU performance chart, the 2080 Ti is 50% faster at 4K gaming, so it looks like they are neck and neck, or even that AMD's TFlops are already a bit ahead there.
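The arithmetic behind that comparison can be sketched in Python as below. The shader counts (4352 for the 2080 Ti, 2560 for the 5700 XT) are not in the post and are filled in here as assumptions, as is the usual convention of 2 FLOPs per shader per clock; the clocks and the 50% performance gap are the figures quoted above.

```python
def fp32_tflops(shaders, clock_mhz):
    """Paper FP32 TFLOPS, counting one fused multiply-add (2 FLOPs) per shader per clock."""
    return shaders * 2 * clock_mhz / 1e6


# Assumption: shader counts are not given in the post.
RTX_2080_TI_SHADERS = 4352
RX_5700_XT_SHADERS = 2560

nv_rated = fp32_tflops(RTX_2080_TI_SHADERS, 1635)   # rated boost clock from the post
nv_real = fp32_tflops(RTX_2080_TI_SHADERS, 1830)    # median gaming clock from the post
amd_real = fp32_tflops(RX_5700_XT_SHADERS, 1890)    # median gaming clock from the post

flops_ratio = nv_real / amd_real
perf_ratio = 1.5  # "2080 Ti is 50% faster" at 4K, per the post

print(f"2080 Ti: {nv_rated:.2f} TFLOPS rated, {nv_real:.2f} TFLOPS at gaming clock")
print(f"5700 XT: {amd_real:.2f} TFLOPS at gaming clock")
print(f"TFLOPS ratio {flops_ratio:.2f} vs performance ratio {perf_ratio:.2f}")
print(f"Relative gaming performance per TFLOP (5700 XT vs 2080 Ti): {flops_ratio / perf_ratio:.2f}")
```

With these inputs the 5700 XT figure comes out a little above the "around 9.5" quoted in the post, but the conclusion is the same: the TFLOPS gap is slightly larger than the performance gap, so Navi 10 gets a bit more gaming performance per paper TFLOP.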
     
  19. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Which tests demonstrate this?
     
  20. Erinyes

    Regular

    Joined:
    Mar 25, 2010
    Messages:
    808
    Likes Received:
    276
    Are the die sizes and transistor counts from Nvidia directly? I haven't seen the numbers mentioned anywhere else. And the RTX 3070 uses 16 Gbps GDDR6, doesn't it?
     