Speculation: GPU Performance Comparisons of 2020 *Spawn*

Discussion in 'Architecture and Products' started by eastmen, Jul 20, 2020.

Thread Status:
Not open for further replies.
  1. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,714
    Likes Received:
    2,135
    Location:
    London
    [​IMG]

    from:

    https://www.techspot.com/review/2099-geforce-rtx-3080/

    250W versus 262W, 2080Ti is ~40% faster in Doom Eternal at the settings used for this comparison. 5700XT is supposed to be a 225W card.

    When AMD uses 5700XT as the baseline for "performance per watt comparisons" in the slides for Navi 21, I hope everyone's ready with extra salt. Gamers Nexus has very similar power consumption for 5700XT.
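As a rough sanity check on the numbers above, here is a minimal sketch of the perf/watt arithmetic. The post does not say which wattage belongs to which card, so both assignments are shown as assumptions; `perf_per_watt_ratio` is a hypothetical helper, not something from the review.

```python
def perf_per_watt_ratio(perf_a: float, watts_a: float,
                        perf_b: float, watts_b: float) -> float:
    """Return card A's performance per watt divided by card B's."""
    return (perf_a / watts_a) / (perf_b / watts_b)


# Relative performance: 5700 XT = 1.0, 2080 Ti = 1.4 (the "~40% faster" figure).
# Assignment 1 (assumed): 2080 Ti draws 262 W, 5700 XT draws 250 W.
ratio_1 = perf_per_watt_ratio(1.4, 262, 1.0, 250)
# Assignment 2 (assumed): 2080 Ti draws 250 W, 5700 XT draws 262 W.
ratio_2 = perf_per_watt_ratio(1.4, 250, 1.0, 262)

print(f"2080 Ti perf/W advantage: {ratio_1:.2f}x or {ratio_2:.2f}x")
```

Because the two power figures are within 5% of each other, the perf/watt gap at these settings stays close to the raw performance gap under either assignment.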
     
    PSman1700 and T2098 like this.
  2. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
    4K isn't really the resolution for 5700 XT though.

    https://www.techpowerup.com/review/asus-geforce-rtx-3090-strix-oc/33.html
    [​IMG]


    https://www.computerbase.de/2020-01...st/3/#diagramm-performance-pro-watt-2560-1440 newest I could find from computerbase with 5700 XT included, within 5% of 2070S and 2060FE which are around same perf/watt as 2080 Ti
     
  3. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    15,134
    Likes Received:
    7,679
    @Kaotik If 5700xt is drawing more power at 4k, doesn't that suggest it achieves better utilization at 4k (more transistors lit up)?
     
    pharma and PSman1700 like this.
  4. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
    Perhaps, but it lacks the bandwidth for 4K (see how it drops in performance and perf/watt relative to Radeon VII for example when you crank the resolution)
    I don't see much point in comparing cards at a resolution that clearly isn't suitable for all of them; heck, even the 2080 Ti is hit-and-miss at 4K.
     
  5. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,058
    Likes Received:
    3,116
    Location:
    New York
    Why would a 512-bit bus require a crossbar?
     
    PSman1700 likes this.
  6. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
    I suppose it wouldn't if each cache line fed two memory controllers instead of one, but personally I think 512-bit is less likely than some exotic solution, based on all the info leaked so far.

    edit: fixed one > two
     
    #566 Kaotik, Sep 30, 2020
    Last edited: Sep 30, 2020
    Lightman likes this.
  7. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Radeon VII had utilization issues at lower resolutions and was run beyond its sweet spot on the v/f curve (again).
     
    Lightman and PSman1700 like this.
  8. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,058
    Likes Received:
    3,116
    Location:
    New York
    I think it’s the other way around. Each memory controller would serve 2 L2 partitions. Seems perfectly reasonable.

    The 256-bit rumor appears to be based on an assumed ratio of L2 partitions to 64-bit memory controllers. I don’t see why that ratio needs to be the same as Navi.

    Do we know for sure that there isn’t a crossbar between L2 and memory controllers in Navi1x? AMD’s slide has infinity fabric sitting between them.

    [​IMG]
     
    Jawed, Lightman and PSman1700 like this.
  9. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
    Some confusion there, I'm thinking of memory controllers as 16-bit entities (as actually shown in that very slide) rather than 64-bit.

    No, we don't know if it maps directly or not, but I think only a couple of Xbox SoCs have gone any other route so far, so I would consider it quite an unlikely explanation. Of course, if both HBM and GDDR are used, a crossbar needs to be there regardless.
     
  10. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,058
    Likes Received:
    3,116
    Location:
    New York
    Yeah it is confusing. In other slides AMD presents each memory controller as a monolithic 64-bit block.
     
    PSman1700 likes this.
  11. pTmdfx

    Regular

    Joined:
    May 27, 2014
    Messages:
    416
    Likes Received:
    379
    AMD says 16x32B/clk for Navi 10 connections between L2 and Memory Controllers through Infinity Fabric.

    The HotChips Raven Ridge SOC talk made it fairly apparent that the SDF is a configuration based NoC — you can have many transport layer switches scattered around the SoCs, each of which does up to 5 transfers per clock locally (= 5x5 crossbar). EPYC Rome also adds to this story, in that you can reconfigure the memory controller routing for having >1 NUMA domain in the same IOD — this can’t be done if interleaving & routing settings are all hardwired.

    So with these clues, it is fair to guess that there are basically 16 SDF switches linking up 16 pairs of L2 slice and Memory Controller slice/port, each of which is a mini local crossbar. If you assume all switches are connected as one ring (for the multimedia/display hub), each switch would still have one port spared under the stated design max.

    With that, 1:2 ratio support (16 L2 + 32 channels) seems a done matter in today’s SDF already. On the other hand, 2:3 (16 L2 + 24 channels) might require upgrades to the routing logic (depending on how flexible the address + config ->destination logic is), but IMO it isn’t unattainable.
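To make the ratios above concrete, here is a small sketch that turns a channel count into a total bus width and an L2:channel ratio. It assumes 16-bit channels (the granularity Kaotik mentions from the slide) and the 16 L2 slices quoted for Navi 10; the function names are made up for illustration.

```python
from fractions import Fraction

CHANNEL_WIDTH_BITS = 16  # assumption: each memory channel is 16 bits wide
L2_SLICES = 16           # Navi 10 figure quoted in the post above


def bus_width_bits(channels: int) -> int:
    """Total memory bus width for a given number of 16-bit channels."""
    return channels * CHANNEL_WIDTH_BITS


def l2_to_channel_ratio(channels: int, l2_slices: int = L2_SLICES) -> Fraction:
    """Reduced ratio of L2 slices to memory channels."""
    return Fraction(l2_slices, channels)


for channels in (16, 24, 32):
    ratio = l2_to_channel_ratio(channels)
    print(f"{channels} channels -> {bus_width_bits(channels)}-bit bus, "
          f"L2:channel = {ratio.numerator}:{ratio.denominator}")
```

Under these assumptions the 1:1, 2:3 and 1:2 cases correspond to 256-bit, 384-bit and 512-bit buses respectively.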
     
    #571 pTmdfx, Sep 30, 2020
    Last edited: Sep 30, 2020
    Jawed and Lightman like this.
  12. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
    How is it not? The 3080 with 30 TFLOPS isn't even twice as fast in games as the 2080S with ~11 TFLOPS.
     
  13. Rootax

    Veteran

    Joined:
    Jan 2, 2006
    Messages:
    2,401
    Likes Received:
    1,845
    Location:
    France
    Not everything is about TFLOPS...
     
    PSman1700 likes this.
  14. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
    This is the problem with quoting from another thread to avoid going off-topic there.
    The discussion was specifically about game performance vs FLOPS: just as the 3080 is nowhere near as fast as its FLOPS suggest compared to Turing, the Radeons of old had plenty of FLOPS but a hard time utilizing them.
     
    BRiT likes this.
  15. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,058
    Likes Received:
    3,116
    Location:
    New York
    The 3080 only has 50% more bandwidth than the 2080S. Radeon VII with 1TB/s bandwidth and 45% more flops is ~10% faster than the 5700xt with its ~450GB/s.

    The reasons for not scaling with flops are likely different. With Ampere you can blame other bottlenecks on the chip. It’s not that obvious with Vega.
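A quick way to put the Radeon VII numbers above side by side is to ask what fraction of each resource increase shows up as performance. This is only a sketch using the rounded figures from the post; `scaling_efficiency` is a hypothetical helper.

```python
def scaling_efficiency(perf_gain: float, resource_gain: float) -> float:
    """Fraction of a resource increase that shows up as performance.

    Both arguments are ratios, e.g. 1.45 means "45% more".
    """
    return (perf_gain - 1.0) / (resource_gain - 1.0)


# Radeon VII relative to 5700 XT, using the rounded figures quoted above.
perf_gain = 1.10                 # ~10% faster
flops_gain = 1.45                # 45% more flops
bandwidth_gain = 1000.0 / 450.0  # 1 TB/s vs ~450 GB/s

flops_eff = scaling_efficiency(perf_gain, flops_gain)
bandwidth_eff = scaling_efficiency(perf_gain, bandwidth_gain)

print(f"share of extra flops realised:     {flops_eff:.0%}")
print(f"share of extra bandwidth realised: {bandwidth_eff:.0%}")
```

With these inputs only about a fifth of the extra flops, and under a tenth of the extra bandwidth, turns into frame rate, which is the point being made about Vega not being obviously bandwidth-bound.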
     
    Rootax likes this.
  16. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,242
    Likes Received:
    3,405
    If only games made for current gen h/w were solely limited by FP32 math, that would be cool.
    There are more than enough examples of Ampere scaling nearly linearly in games when compared to Turing.
    And you have to add Turing's ints to Turing's flops for such a comparison to be a proper one. So in "Ampere metrics" the 2080S is ~16.5 tflops, which means a 30 tflops Ampere can't be "twice as fast" even in theory.
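The arithmetic behind that claim, as a minimal sketch using only the figures quoted in this thread (the constant names are made up for illustration):

```python
def theoretical_speedup(new_tflops: float, old_tflops: float) -> float:
    """Upper bound on speedup if throughput were the only limit."""
    return new_tflops / old_tflops


AMPERE_3080_TFLOPS = 30.0            # figure quoted in the thread
TURING_2080S_FP32_TFLOPS = 11.0      # "~11 TFLOPS" quoted in the thread
TURING_2080S_FP32_PLUS_INT = 16.5    # "~16.5" once Turing's int rate is added

naive = theoretical_speedup(AMPERE_3080_TFLOPS, TURING_2080S_FP32_TFLOPS)
adjusted = theoretical_speedup(AMPERE_3080_TFLOPS, TURING_2080S_FP32_PLUS_INT)

print(f"FP32-only comparison:  {naive:.2f}x")
print(f"FP32 + INT comparison: {adjusted:.2f}x")
```

The FP32-only ratio is about 2.7x, while the adjusted ratio is about 1.8x, so the ceiling is already below 2x before any other bottleneck is considered.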
     
    PSman1700 likes this.
  17. The quote gives context.

    The same way GCN lost to Kepler/Maxwell/Pascal in theoretical-TFLOPs/gaming-performance, Ampere loses to Turing and RDNA1 in the same metric.

    It's not an important metric, though it is one that nvidia fans used to repeat ad nauseam. I don't think it's right to claim "Ampere has utilization issues" (most probably that throughput is just not designed to ever be reached), but there are those who used to claim that about GCN and now, with Ampere, say the problem is with game engines.
     
    Kej, Lightman, Erinyes and 3 others like this.
  18. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,242
    Likes Received:
    3,405
    They had a hard time utilizing them because of widely known h/w design issues that prevented such utilization in graphics specifically. Ampere doesn't have these, so the comparison isn't valid. As I've already said.
     
    PSman1700 likes this.
  19. PSman1700

    Legend

    Joined:
    Mar 22, 2019
    Messages:
    7,118
    Likes Received:
    3,092
    AMD will most likely provide a close-to-30 TF Navi 2 anyway, so it doesn't matter.
     
  20. troyan

    Regular

    Joined:
    Sep 1, 2015
    Messages:
    605
    Likes Received:
    1,126
    And games are not pure compute workloads. So even when the FP32 workload is processed in half the time, the frame as a whole will still take more than half as long to render.
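This is essentially Amdahl's law applied to a frame. A minimal sketch, where the FP32 share of frame time is a purely hypothetical input (no measured split is given in this thread):

```python
def frame_speedup(fp32_fraction: float, fp32_speedup: float) -> float:
    """Amdahl's law: overall speedup when only the FP32 part gets faster.

    fp32_fraction is the share of frame time spent on FP32 work (0..1),
    fp32_speedup is how much faster that part becomes (2.0 = half the time).
    """
    if not 0.0 <= fp32_fraction <= 1.0:
        raise ValueError("fp32_fraction must be between 0 and 1")
    if fp32_speedup <= 0.0:
        raise ValueError("fp32_speedup must be positive")
    return 1.0 / ((1.0 - fp32_fraction) + fp32_fraction / fp32_speedup)


# Hypothetical FP32 shares of frame time; real games will vary.
for fraction in (0.3, 0.5, 0.7):
    print(f"FP32 share {fraction:.0%}: "
          f"frame speedup {frame_speedup(fraction, 2.0):.2f}x")
```

Only when the whole frame is FP32-bound does halving the FP32 time give a 2x frame rate; any other share gives less.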
     