Speculation: GPU Performance Comparisons of 2020 *Spawn*

Well, I wouldn't call a little over 10% worse perf per watt "shitty".

[Image: Power_PCAT.png (power consumption chart)]


from: https://www.techspot.com/review/2099-geforce-rtx-3080/

250 W versus 262 W, and the 2080 Ti is ~40% faster in Doom Eternal at the settings used for this comparison. The 5700 XT is supposed to be a 225 W card.

When AMD uses the 5700 XT as the baseline for "performance per watt" comparisons in the Navi 21 slides, I hope everyone's ready with extra salt. Gamers Nexus measured very similar power consumption for the 5700 XT.
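For what it's worth, here's a rough sketch of how those particular numbers shake out as perf/watt; the ~40% performance gap and the power figures are the ones quoted above from the TechSpot PCAT measurements, the rest is just arithmetic:

```python
# Back-of-the-envelope perf/watt from the numbers quoted above.
# Relative performance is normalized to the 5700 XT; the power figures are
# the PCAT measurements cited from the TechSpot review.

cards = {
    # name: (relative_performance, board_power_watts)
    "RTX 2080 Ti": (1.40, 250.0),   # ~40% faster in Doom Eternal at these settings
    "RX 5700 XT":  (1.00, 262.0),
}

perf_per_watt = {name: perf / watts for name, (perf, watts) in cards.items()}
baseline = perf_per_watt["RTX 2080 Ti"]

for name, ppw in perf_per_watt.items():
    print(f"{name}: {ppw:.4f} perf/W ({ppw / baseline:.0%} of the 2080 Ti)")

# The 5700 XT lands at roughly 68% of the 2080 Ti's perf/W in this particular
# comparison, i.e. ~32% worse, versus the "little over 10%" gap seen in the
# 1440p perf/watt averages cited elsewhere in the thread.
```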
 
4K isn't really the right resolution for the 5700 XT, though.

https://www.techpowerup.com/review/asus-geforce-rtx-3090-strix-oc/33.html
[Image: performance-per-watt_2560-1440.png (TechPowerUp performance-per-watt chart at 2560x1440)]



https://www.computerbase.de/2020-01...st/3/#diagramm-performance-pro-watt-2560-1440 is the newest I could find from ComputerBase with the 5700 XT included; it's within 5% of the 2070 Super and 2060 FE, which are at around the same perf/watt as the 2080 Ti.
[Image: ComputerBase performance-per-watt chart at 2560x1440]
 
@Kaotik If the 5700 XT is drawing more power at 4K, doesn't that suggest it achieves better utilization at 4K (more transistors lit up)?
Perhaps, but it lacks the bandwidth for 4K (see how it drops in performance and perf/watt relative to the Radeon VII, for example, when you crank up the resolution).
I don't see much point in comparing cards at a resolution which clearly isn't suitable for all of the compared cards; heck, even the 2080 Ti is hit-and-miss at 4K.
 
I don't really know what to expect.
I mean, it's highly unlikely that AMD would list false information in their drivers this close to release, which would mean it has both GDDR and HBM support, something so far unheard of (to the point where they did two practically identical chips with just different memory controllers, Navi 10 & Navi 12).
Then there are the 16 L2 cache slices, which suggest 256-bit GDDR memory controllers (I think only a couple of Xbox SoCs have taken a different route on this, with a crossbar?), which we know can't be enough for an 80 CU RDNA(2) chip (unless they found the holy grail, which I think we can agree is quite unlikely).
So there needs to be another explanation: either it's right in front of us (using both GDDR and HBM), no matter how unconventional that sounds, or it's some as-of-yet-unknown solution (512-bit? 384-bit?) plus a crossbar to deal with the L2 cache slices, or some third explanation I can't come up with.
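To put rough numbers on the "can't be enough" part, here's a quick bandwidth sketch. The 16 Gbps GDDR6 data rate and the assumption that an 80 CU part wants roughly twice Navi 10's bandwidth are mine, not anything AMD has stated:

```python
# Rough bandwidth sanity check for the bus-width options mentioned above.

def gddr6_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s for a GDDR6 bus of the given width."""
    return bus_width_bits / 8 * data_rate_gbps

navi10_bw = gddr6_bandwidth_gb_s(256, 14.0)   # 5700 XT: 256-bit @ 14 Gbps = 448 GB/s
target_bw = 2 * navi10_bw                     # naive "2x Navi 10" target for 80 CUs, ~896 GB/s

for bus_width in (256, 384, 512):
    bw = gddr6_bandwidth_gb_s(bus_width, 16.0)
    print(f"{bus_width}-bit @ 16 Gbps -> {bw:.0f} GB/s "
          f"({bw / target_bw:.0%} of the naive target)")

# 256-bit @ 16 Gbps only reaches ~512 GB/s, which is why a plain 256-bit GDDR6
# bus looks insufficient for an 80 CU part without some other trick.
```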

Why would a 512-bit bus require a crossbar?
 
I suppose it wouldn't, if each cache slice fed two memory controllers instead of one, but personally I think 512-bit is less likely than some exotic solution, based on all the info leaked so far.

edit: fixed one > two
 

I think it’s the other way around. Each memory controller would serve 2 L2 partitions. Seems perfectly reasonable.

The 256-bit rumor appears to be based on an assumed ratio of L2 partitions to 64-bit memory controllers. I don’t see why that ratio needs to be the same as Navi.

Do we know for sure that there isn’t a crossbar between L2 and memory controllers in Navi1x? AMD’s slide has infinity fabric sitting between them.

[Image: AMD slide showing Infinity Fabric between the L2 and the memory controllers]
 
Some confusion there: I'm thinking of memory controllers as 16-bit entities (as actually shown in that very slide) rather than 64-bit.

No, we don't know whether it maps directly or not, but I think only a couple of Xbox SoCs so far have gone any other route, so I would consider it quite an unlikely explanation. Of course, if both HBM and GDDR are used, a crossbar needs to be there regardless.
 

Yeah it is confusing. In other slides AMD presents each memory controller as a monolithic 64-bit block.
 
AMD quotes 16x32B/clk for the Navi 10 connections between the L2 and the memory controllers through Infinity Fabric.

The Hot Chips Raven Ridge SoC talk made it fairly apparent that the SDF is a configuration-based NoC: you can have many transport-layer switches scattered around the SoC, each of which does up to 5 transfers per clock locally (= a 5x5 crossbar). EPYC Rome also adds to this story, in that you can reconfigure the memory controller routing to have more than one NUMA domain in the same IOD; this can't be done if the interleaving and routing settings are all hardwired.

So with these clues, it is fair to guess that there are basically 16 SDF switches linking up 16 pairs of L2 slices and memory controller slices/ports, each of which is a mini local crossbar. If you assume all the switches are connected as one ring (for the multimedia/display hub), each switch would still have one port to spare under the stated design maximum.

With that, 1:2 ratio support (16 L2 slices + 32 channels) seems a done deal in today's SDF already. On the other hand, 2:3 (16 L2 slices + 24 channels) might require upgrades to the routing logic (depending on how flexible the address + config -> destination mapping is), but IMO it isn't unattainable.
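To illustrate why the slice-to-channel ratio matters for the routing logic, here's a toy address-mapping sketch. The 256-byte interleave granularity and the plain modulo hashing are placeholders of my own choosing; AMD's actual address hashing isn't public:

```python
# Toy model: map a physical address to an L2 slice and a memory channel,
# then see how many channels each slice ends up talking to for a 1:2 and a
# 2:3 slice-to-channel ratio. Purely illustrative, not AMD's real scheme.

INTERLEAVE_BYTES = 256   # assumed striping granularity
NUM_L2_SLICES = 16

def route(addr: int, num_channels: int) -> tuple[int, int]:
    """Return (l2_slice, memory_channel) for a physical address."""
    stripe = addr // INTERLEAVE_BYTES
    return stripe % NUM_L2_SLICES, stripe % num_channels

for num_channels in (32, 24):
    targets = {s: set() for s in range(NUM_L2_SLICES)}
    for stripe in range(NUM_L2_SLICES * num_channels):
        l2_slice, channel = route(stripe * INTERLEAVE_BYTES, num_channels)
        targets[l2_slice].add(channel)
    fanout = sorted({len(chans) for chans in targets.values()})
    print(f"{num_channels} channels: each L2 slice fans out to {fanout} channel(s)")

# With 32 channels every slice maps to exactly 2 fixed channels (the 1:2 case a
# switch can essentially hardwire); with 24 channels every slice fans out to 3
# channels, which is where the more flexible address/config -> destination
# routing discussed above would come in.
```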
 
Not everything is about TFLOPS...
This is the problem with quoting from another thread to avoid going off-topic there.
The discussion was about game performance versus FLOPS specifically; in this case, just like the 3080 is nowhere near as fast as its FLOPS suggest compared to Turing, so were the Radeons of old, which had plenty of FLOPS but a hard time utilizing them.
 
How is it not? 3080 with 30 TFLOPS isn't even twice as fast as 2080S with ~11 TFLOPS in games

The 3080 only has 50% more bandwidth than the 2080S. The Radeon VII, with 1 TB/s of bandwidth and 45% more FLOPS, is ~10% faster than the 5700 XT with its ~450 GB/s.

The reasons for not scaling with FLOPS are likely different. With Ampere you can blame other bottlenecks on the chip. It's not that obvious with Vega.
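For reference, the spec ratios behind those two comparisons, using the publicly listed peak figures (approximate; boost clocks and therefore FLOPS vary a bit by source):

```python
# Peak FP32 throughput and memory bandwidth ratios for the two pairs above.

specs = {
    # name: (fp32_tflops, bandwidth_gb_s), approximate public spec-sheet figures
    "RTX 3080":   (29.8, 760),
    "RTX 2080S":  (11.2, 496),
    "Radeon VII": (13.8, 1024),
    "RX 5700 XT": (9.75, 448),
}

def ratio(a: str, b: str) -> None:
    fa, ba = specs[a]
    fb, bb = specs[b]
    print(f"{a} vs {b}: {fa / fb:.2f}x FLOPS, {ba / bb:.2f}x bandwidth")

ratio("RTX 3080", "RTX 2080S")     # ~2.7x the FLOPS but only ~1.5x the bandwidth
ratio("Radeon VII", "RX 5700 XT")  # ~1.4x the FLOPS and ~2.3x the bandwidth, yet only ~10% faster
```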
 
How is it not? 3080 with 30 TFLOPS isn't even twice as fast as 2080S with ~11 TFLOPS in games
If only games made for current-gen h/w were solely limited by FP32 math, that would be cool.
There are more than enough examples of Ampere scaling nearly linearly in games when compared to Turing.
And you have to add Turing's INTs to Turing's FLOPS for such a comparison to be a proper one. So in "Ampere metrics" the 2080S is ~16.5 TFLOPS, meaning that even in theory a 30 TFLOPS Ampere can't be "twice as fast".
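A rough sanity check on that "~16.5 TFLOPS in Ampere metrics" conversion; the INT-to-FP instruction mixes below are assumptions (the 36-per-100 figure is the mix NVIDIA quoted for Turing-era games), so treat the result as a ballpark:

```python
# Turing has dedicated INT32 pipes next to the FP32 ones; on Ampere the second
# FP32 datapath is shared with INT, so Ampere's headline TFLOPS already
# "contain" the integer work. Convert a Turing card accordingly.

turing_2080s_fp32_tflops = 11.2   # RTX 2080 Super, approximate boost-clock FP32

for int_per_100_fp in (36, 50):   # assumed INT instructions per 100 FP instructions
    ampere_equivalent = turing_2080s_fp32_tflops * (1 + int_per_100_fp / 100)
    print(f"INT:FP mix {int_per_100_fp}:100 -> ~{ampere_equivalent:.1f} 'Ampere-metric' TFLOPS")

# ~15.2 to ~16.8 TFLOPS depending on the assumed mix, i.e. right around the
# ~16.5 figure above -- which is why 30 TFLOPS of Ampere can't be expected to
# be twice a 2080S even in theory.
```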
 
Not everything is about TFLOPS...
The quote gives context.

The same way GCN lost to Kepler/Maxwell/Pascal in theoretical TFLOPS versus gaming performance, Ampere loses to Turing and RDNA1 in the same metric.

It's not an important metric, though it is one that NVIDIA fans used to repeat ad nauseam. I don't think it's right to claim "Ampere has utilization issues" (most probably that throughput is just not designed to ever be reached), but there are those who used to claim that about GCN, and now with Ampere they say the problem is with the game engines.
 
so were the Radeons of old, which had plenty of FLOPS but a hard time utilizing them
They had a hard time utilizing them because they had widely known hardware design issues which prevented such utilization in graphics specifically. Ampere doesn't have these, thus the comparison isn't valid, as I've already said.
 