Speculation: GPU Performance Comparisons of 2020 *Spawn*

Discussion in 'Architecture and Products' started by eastmen, Jul 20, 2020.

Thread Status:
Not open for further replies.
  1. Kaotik

    Kaotik Drunk Member
    Legend

    Joined: Apr 16, 2003
    Messages: 9,595
    Likes Received: 3,711
    Location: Finland
    Microsoft demoed a DirectML-based super-resolution solution back in 2018 already, so clearly they've been working on theirs for quite some time.
     
  2. DegustatoR

    Veteran

    Joined: Mar 12, 2002
    Messages: 1,947
    Likes Received: 1,072
    Location: msk.ru/spb.ru
    They demoed an NV solution running through DML on NV h/w. Not sure this qualifies as MS working on their own solution for quite some time.
     
  3. Jay

    Jay
    Veteran Regular

    Joined: Aug 3, 2013
    Messages: 3,346
    Likes Received: 2,629
    Although I agree with the sentiment that MS has probably been working on it, the demo was based on Nvidia models, and it's the models that are the core of any ML solution.
     
  4. troyan

    Regular Newcomer

    Joined: Sep 1, 2015
    Messages: 297
    Likes Received: 591
    And it is not the same as DLSS. The Shield TV supports AI upscaling, and with the latest update it handles up to 1080p/60 fps -> 4K/60 fps.
     
  5. JoeJ

    Veteran Newcomer

    Joined: Apr 1, 2018
    Messages: 1,053
    Likes Received: 1,239
    I'm curious about your optimism.
    In the past, 1 AMD TF was always worth more to me than 1 NV TF, but the difference became pretty small over the years. Now I'm no longer up to date on concurrent float/int ALUs on NV.
    I wouldn't be surprised if AMD misses the goal of competing at the high end. But we'll see, and I'm not sure how justified this new 'high end' is at all for the masses.
     
    PSman1700 likes this.
  6. Bondrewd

    Veteran Newcomer

    Joined: Sep 16, 2017
    Messages: 1,042
    Likes Received: 441
    Their marketing has joined the shitlord championship too.
    We're back to good old uncharted waters now; prepare your funny marketed-FLOPS vs. actual-gaming-performance charts, because we'll all need them.
    They're here to win.
    High-end thingies are usually made for PR purposes.
    Being the best in a set of metrics is very nice, and it gives every other product in your lineup the prestigious halo of dominance.
     
    JoeJ likes this.
  7. chris1515

    Legend Regular

    Joined: Jul 24, 2005
    Messages: 5,968
    Likes Received: 6,084
    Location: Barcelona Spain
    Because in real game performance, the 30 TFLOPS 3080 is not three times more powerful than the 10.07 TFLOPS 2080, but only up to two times more performant. Maybe that is not important for your work, but this is the message from Nvidia.

    At least this is the current state, before driver improvements.
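    A quick back-of-envelope check of where those headline numbers come from (a minimal sketch in Python, assuming the reference boost clocks and shader counts; one FMA counts as 2 FLOPs):

        # Theoretical FP32 TFLOPS = ALUs x clock (GHz) x 2 FLOPs per FMA.
        # Assumed reference specs: RTX 3080 = 8704 ALUs @ 1.71 GHz,
        # RTX 2080 = 2944 ALUs @ 1.71 GHz.
        def tflops(alus, clock_ghz):
            return alus * clock_ghz * 2 / 1000

        print(f"RTX 3080: {tflops(8704, 1.71):.2f} TFLOPS")   # ~29.77
        print(f"RTX 2080: {tflops(2944, 1.71):.2f} TFLOPS")   # ~10.07
        print(f"ratio:    {tflops(8704, 1.71) / tflops(2944, 1.71):.2f}x")  # ~2.96x on paper, ~2x in games

    The gap between the ~3x paper ratio and the ~2x measured one is exactly the point being argued here.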
     
  8. JoeJ

    Veteran Newcomer

    Joined: Apr 1, 2018
    Messages: 1,053
    Likes Received: 1,239
    That's a problem of software, not hardware. Taking prev-gen games and looking at 4K 180 fps vs. 90 fps doesn't tell us much about real performance, because obviously the GPU is just bored in both cases.
    (For me, being mainly interested in compute perf, games are no benchmark anyway.)
     
    DavidGraham, PSman1700 and Jawed like this.
  9. Rootax

    Veteran Newcomer

    Joined: Jan 2, 2006
    Messages: 1,906
    Likes Received: 1,345
    Location: France
    And TF is not the only thing impacting performance...
     
    DavidGraham and PSman1700 like this.
  10. DegustatoR

    Veteran

    Joined: Mar 12, 2002
    Messages: 1,947
    Likes Received: 1,072
    Location: msk.ru/spb.ru
    That's because Ampere isn't running INTs in parallel when it runs FP32 at full speed, and those INTs gave Turing a performance boost of some 30% on average. So you get less than twice the performance, but you do get double the FP32 - which can be important when FP32 is what you're actually looking for.
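    A toy issue-slot model of that point (a sketch, assuming NVIDIA's oft-quoted figure of roughly 30 INT32 instructions per 100 FP32 ones in game shaders):

        # Turing: dedicated INT32 pipe, so INT work rides along for free.
        # Ampere: two datapaths, but one must absorb the INT work itself.
        fp_ops, int_ops = 100.0, 30.0

        turing_cycles = max(fp_ops, int_ops)     # 100: INT hides behind FP32
        ampere_cycles = (fp_ops + int_ops) / 2   # 65: 130 ops shared by 2 pipes

        print(f"Ampere speedup: {turing_cycles / ampere_cycles:.2f}x")  # ~1.54x per SM, not 2x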
     
  11. Kaotik

    Kaotik Drunk Member
    Legend

    Joined: Apr 16, 2003
    Messages: 9,595
    Likes Received: 3,711
    Location: Finland
    NVIDIA's "TFLOPS" as any sort of measurement unit compared to current HW (including XSX) went out the window with Ampere. Either they can't feed the FP32 units or there are some other major bottlenecks, since the 3080 is getting some 60-90% performance increases with nearly triple the theoretical TFLOPS of the 2080S.
     
  12. troyan

    Regular Newcomer

    Joined: Sep 1, 2015
    Messages: 297
    Likes Received: 591
    No, the number is correct. Except for Volta/Turing, no other GPU runs FP32 and INT32 concurrently. So the TFLOPS numbers from other GPUs aren't "real" either.
     
    PSman1700, xpea, trinibwoy and 2 others like this.
  13. Jawed

    Legend

    Joined: Oct 2, 2004
    Messages: 11,266
    Likes Received: 1,522
    Location: London
    Graphics Core Next is tuned to extract maximum utilisation out of the float ALUs and for the scalar ALUs to rarely be a bottleneck - all without any compiler help. Has anyone measured how successfully these goals are achieved?
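    For intuition, this is roughly what that division of labour looks like (a toy Python model, not GCN ISA; the 64-lane wave size is GCN's, everything else is illustrative):

        WAVE = 64  # GCN wavefront width

        def shade_wave(base_addr, lane_ids):
            # Uniform across the wave: computed once, on the scalar ALU.
            descriptor = base_addr + 0x100           # 1 scalar op serves all 64 lanes
            # Per-lane: one vector instruction executes all lanes in lockstep.
            results = [i * i + descriptor for i in lane_ids]  # 2 vector ops (mul, add)
            return results, 1, 2                     # (results, scalar ops, vector ops)

        _, s_ops, v_ops = shade_wave(0x8000, range(WAVE))
        print(f"scalar: {s_ops} op, vector: {v_ops} ops x {WAVE} lanes")

    With uniform work folded onto the scalar unit like this, it would take a fairly unusual shader for that unit to become the bottleneck, which is what any measurement would need to provoke.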
     
  14. DegustatoR

    Veteran

    Joined: Mar 12, 2002
    Messages: 1,947
    Likes Received: 1,072
    Location: msk.ru/spb.ru
    The 2080S is around 12 TFLOPS at actual boost clocks. Let's say NV's own estimate, that the INT SIMD handles about 30% of the math in typical gaming code, is correct - that would mean you need around 15.5 TFLOPS of pure FP32, without INT running in parallel, to reach the same performance on similar hardware. That's about half of the 3080's FLOPS, so +60-90% of performance from those seems very reasonable without the INTs running in parallel all the time.

    Your typical current-gen game isn't limited only by math either; it remains to be seen how many CPU, b/w and other limitations a 30 TFLOPS GPU hits in this gen's games. A sign of it being limited by something other than FP32 is already out there, in the form of TSE (Time Spy Extreme) results being a lot better than TS (Time Spy) ones.
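    Spelling out that arithmetic (same assumptions as the post: ~12 TFLOPS actual 2080S throughput, ~30% uplift from the parallel INT pipe, ~30 TFLOPS for the 3080):

        tflops_2080s = 12.0
        int_uplift   = 1.3
        fp32_equiv   = tflops_2080s * int_uplift   # ~15.6 TFLOPS of "pure FP32"

        tflops_3080  = 30.0
        print(f"2080S as pure FP32: {fp32_equiv:.1f} TFLOPS")
        print(f"3080 over that:     {tflops_3080 / fp32_equiv:.2f}x")  # ~1.92x, so +60-90% in games fits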
     
    PSman1700 and pharma like this.
  15. chris1515

    Legend Regular

    Joined: Jul 24, 2005
    Messages: 5,968
    Likes Received: 6,084
    Location: Barcelona Spain
    This is why I said I know that for your case compute is more important ;)* but for gaming performance that is not the case, at least for the moment. And like other people said, there are other bottleneck points than the TFLOPS.

    *Maybe they need to do a CDNA gaming version for your needs.
     
    pharma likes this.
  16. trinibwoy

    trinibwoy Meh
    Legend

    Joined: Mar 17, 2004
    Messages: 11,147
    Likes Received: 1,647
    Location: New York
    This simple point seems to be escaping many people. It does mean that the 5700 XT is quite efficient, as it manages to stay in range of the 2070 Super without the benefit of the extra INT pipeline.
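    The same lens, applied as a rough Python check (assuming reference boost clocks: 5700 XT = 2560 ALUs @ 1.905 GHz, 2070 Super = 2560 ALUs @ 1.77 GHz, and the ~30% Turing INT uplift discussed above):

        xt_tflops   = 2560 * 1.905 * 2 / 1000   # ~9.75 TFLOPS
        super_fp32  = 2560 * 1.770 * 2 / 1000   # ~9.06 TFLOPS
        super_equiv = super_fp32 * 1.3          # ~11.8 TFLOPS "pure FP32 equivalent"

        print(f"5700 XT {xt_tflops:.2f} TFLOPS vs 2070S effective ~{super_equiv:.1f} TFLOPS")

    Staying within range of an effective ~11.8 TFLOPS with ~9.75 is what makes the 5700 XT look efficient here.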
     
    DegustatoR likes this.
  17. JoeJ

    Veteran Newcomer

    Joined: Apr 1, 2018
    Messages: 1,053
    Likes Received: 1,239
    Hehe, yeah! Arcturus would be the gaming GPU of my dreams. Bye bye restrictive ROPs and RT cores :p
    But that's just a dream and won't ever happen. Luckily Vega is fast enough for my compute needs even while kept clocked at 150 MHz with the fans off. (Still unsure if I can believe this myself - the current test scene is small, though.)
     
    PSman1700 likes this.
  18. Bondrewd

    Veteran Newcomer

    Joined: Sep 16, 2017
    Messages: 1,042
    Likes Received: 441
    Bro, it has scalar units for that purpose (a whole two, one per SIMD, on Navi that is). Since GCN1, pretty sure.

    It's more of a testament to NV's prowess that they got that much out of relatively simple SMs pre-Volta.
     
  19. Jawed

    Legend

    Joined: Oct 2, 2004
    Messages: 11,266
    Likes Received: 1,522
    Location: London
    GCN introduced the dedicated INT pipeline back in, erm, 2012 and it's still there in Navi.
     
  20. Leoneazzurro5

    Newcomer

    Joined: Aug 18, 2020
    Messages: 217
    Likes Received: 239
    Judging from the technical description of the architecture, Navi 10 has very efficient shader execution. If performance did not match the theoretical output, it may be due to something outside the workgroups/CUs; in that case, the main culprits may be rasterization/primitive culling, shader scheduling (ACEs/load-distribution circuitry), limited datapaths, texture unit capability, and ROP capability. If one or more such bottlenecks indeed existed, and they managed to solve them in this new iteration of the RDNA architecture, then they may show a big performance jump. There are a lot of "ifs", though.
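    One of those suspects can at least be sanity-checked on a napkin (a sketch, assuming reference Navi 10 specs: 64 ROPs at ~1.9 GHz and 448 GB/s of GDDR6 bandwidth, plain RGBA8 colour writes, no blending or DCC):

        rops, clock_ghz, bw_gbs = 64, 1.9, 448.0

        pixel_rate = rops * clock_ghz        # ~121.6 Gpix/s peak fill
        bytes_out  = pixel_rate * 4          # 4 bytes per RGBA8 pixel -> ~486 GB/s

        print(f"fill rate wants ~{bytes_out:.0f} GB/s vs {bw_gbs:.0f} GB/s available")

    Even plain colour writes can saturate memory before the ROPs do, so that pairing at least looks plausibly balanced; the other candidates are harder to check from spec sheets alone.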
     