Nvidia Ampere Discussion [2020-05-14]

Discussion in 'Architecture and Products' started by Man from Atlantis, May 14, 2020.

  1. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,242
    Likes Received:
    3,405
    OpenCL on NV h/w has never been very good, so we still need more data.
     
  2. Love_In_Rio

    Veteran

    Joined:
    Apr 21, 2004
    Messages:
    1,627
    Likes Received:
    226
    Aren't AMD's ALUs now technically more complete than Nvidia's? All of them can do int ops, whereas in Ampere only half can.
     
  3. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    15,134
    Likes Received:
    7,679
    I think the problem on Turing was that a lot of the INT32 ALUs sat idle. I believe the CUs in AMD would be the same, where they have dedicated FP and INT units, no?
     
  4. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    15,134
    Likes Received:
    7,679
    I think we kind of expected this type of utilization though, because they doubled the FP32 ALUs without doubling anything else. I would not expect this to improve to 90% or anything like that.
     
    Lightman and CarstenS like this.
  5. Cyan

    Cyan orange
    Legend

    Joined:
    Apr 24, 2007
    Messages:
    9,734
    Likes Received:
    3,460
    What do you mean? Sorry, I don't understand. Do you mean that if a CUDA core is going to perform both INT32 and FP32 ops at the same time, the resources of that CUDA core are halved? Or that it isn't as efficient?

    AFAIK, native integer cores first appeared in Turing, which is why only Turing and newer graphics cards support the famous Integer Scaling.

     
  6. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,058
    Likes Received:
    3,116
    Location:
    New York
    Yes, same as Pascal. I think the prevailing assumption is that those combined FP and INT pipelines share transistors to some extent so one pipeline isn’t completely idle. No idea if that’s true though.

    Nvidia clearly thought it made sense to split the FP and INT functionality into separate pipelines for whatever reason.
     
    BRiT likes this.
  7. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    FP32+FP32+INT32 is physically bigger than FP32+[FP32|INT32]: it needs another real datapath inside the SM, and the three units could not be scheduled at the same time with the current scheduler. Remember, the FP32-INT32 SIMD is not physically separated but combined in one SIMD.
     
    Lightman, Scott_Arm and BRiT like this.
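To put rough numbers on the FP32+[FP32|INT32] arrangement discussed above, here is a minimal Python sketch of an idealized per-SM throughput model. It assumes 64 lanes per datapath per SM, perfect scheduling, and no other bottlenecks (bandwidth, registers, issue limits). The function names and the model itself are illustrative assumptions, not anything taken from Nvidia documentation.

```python
# Idealized steady-state model: ops retired per clock per SM for a given
# instruction mix. All figures and names here are assumptions for illustration.

LANES = 64  # assumed lanes per datapath per SM


def split_pipes_ops_per_clock(int_fraction):
    """Dedicated FP32 pipe plus dedicated INT32 pipe (Turing-style layout).

    The busier pipe sets the pace, so the other one idles part of the time.
    """
    busiest_share = max(int_fraction, 1.0 - int_fraction)
    return LANES / busiest_share


def shared_pipe_ops_per_clock(int_fraction):
    """FP32-only pipe plus a pipe that does FP32 or INT32 (Ampere-style layout).

    All INT work must go through the shared pipe; FP work can use either.
    """
    if int_fraction == 0:
        return 2.0 * LANES
    return min(2.0 * LANES, LANES / int_fraction)


if __name__ == "__main__":
    for f in (0.0, 0.25, 0.5, 1.0):
        print(f"INT share {f:.2f}: "
              f"split {split_pipes_ops_per_clock(f):6.1f}  "
              f"shared {shared_pipe_ops_per_clock(f):6.1f}")
```

With one INT op for every three FP ops (`int_fraction = 0.25`), this model gives about 85 ops/clock for the split layout and 128 for the shared layout. At a 50/50 mix the two are equal, and real workloads will land below these ceilings.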
  8. techuse

    Veteran

    Joined:
    Feb 19, 2013
    Messages:
    1,426
    Likes Received:
    909
    I was wrong. It can be any number of INT instructions in multiples of 16, if I understood the replies correctly. So 112+16, 80+48, etc. The Computerbase article had me thinking it was either 128 FP or 64 FP + 64 INT, with no other combinations.
     
    Cyan likes this.
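The "multiples of 16" point above can be enumerated with a short Python sketch. It assumes an SM made of 4 partitions, each with one 16-lane FP32-only datapath and one 16-lane datapath that runs either FP32 or INT32 in a given clock. Those counts and all names below are assumptions chosen to match the 128-lane total mentioned in the thread.

```python
# Enumerate the per-clock FP/INT lane splits of one SM under the stated
# assumptions. Constants and names are illustrative, not official figures.

LANES_PER_DATAPATH = 16   # assumed width of each datapath
PARTITIONS_PER_SM = 4     # assumed number of partitions per SM


def fp_int_splits():
    """Return (fp_lanes, int_lanes) for every count of partitions doing INT."""
    total_lanes = 2 * PARTITIONS_PER_SM * LANES_PER_DATAPATH
    splits = []
    for int_partitions in range(PARTITIONS_PER_SM + 1):
        int_lanes = int_partitions * LANES_PER_DATAPATH
        splits.append((total_lanes - int_lanes, int_lanes))
    return splits


if __name__ == "__main__":
    for fp_lanes, int_lanes in fp_int_splits():
        print(f"{fp_lanes} FP + {int_lanes} INT")
```

Under these assumptions the list runs from 128 FP + 0 INT down to 64 FP + 64 INT in steps of 16, which includes the 112+16 and 80+48 cases quoted in the post.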
  9. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,213
    It helps a lot in ray tracing, even in software ray tracing. This is the reason GTX Turing performs better than Pascal in ray tracing despite having fewer resources, and it is also the reason Turing surpasses both RDNA 1 and Pascal in software ray tracing (such as CryEngine and World of Tanks).
     
    PSman1700, pharma and LeStoffer like this.
  10. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,714
    Likes Received:
    2,135
    Location:
    London
    This might be cost optimisation: the fixed overheads of an SM partition such as the instruction decoder can be scaled up for relatively little cost while getting a large increase in theoretical throughput. Presumably the fixed overheads grew due to the addition of tensor cores, so then the step to make these two instructions co-issue in Turing seems low cost (well, then there's the compiler). The addition of tensor cores forced them to add overhead and so normal shader instruction processing saw a big change?
     
    #1350 Jawed, Sep 8, 2020
    Last edited: Sep 8, 2020
  11. pharma

    Veteran

    Joined:
    Mar 29, 2004
    Messages:
    4,891
    Likes Received:
    4,539
    https://www.techpowerup.com/271907/corsair-working-on-direct-12-pin-nvidia-ampere-power-cable
     
    Lightman and PSman1700 like this.
  12. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,058
    Likes Received:
    3,116
    Location:
    New York
    That's a reasonable take. End result seems quite unbalanced though.
     
  13. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Quick question: What would happen if there were more schedulers feeding more of [FP32|FP32+INT32|TC|L/S|SFU] per clock (i.e. 3 schedulers/dispatch), with the occasional [RT|TMU] thrown into the mix?
    Right, Powaaahhhh! ;)
     
  14. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    Fermi will happen. :p
     
    Kej, Lightman, PSman1700 and 3 others like this.
  15. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,058
    Likes Received:
    3,116
    Location:
    New York
    There probably wouldn't be any real benefit. The TMU, SFU + L/S pipelines execute over many clocks.
     
  16. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    I don't understand what's so unbalanced then?
     
  17. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,058
    Likes Received:
    3,116
    Location:
    New York
    Flops vs everything else (bandwidth, geometry, RT)
     
  18. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    Kepler: hold my underfed 192 FMA lanes...
     
  19. pharma

    Veteran

    Joined:
    Mar 29, 2004
    Messages:
    4,891
    Likes Received:
    4,539
    Here we go again ...

    Ethereum Miners Eye NVIDIA’s RTX 30 Series GPU as RTX 3080 Offers 3-4x Better Performance in Eth

    https://www.hardwaretimes.com/ether...x-3080-offers-3-4x-better-performance-in-eth/
     
    Silent_Buddha, Lightman, Cyan and 5 others like this.
  20. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    20,511
    Likes Received:
    24,411
    Soon you'll look at that $1500 price tag as being a bargain compared to the $2400 or higher... Le Sigh.
     
    eloyc likes this.
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.