Speculation: GPU Performance Comparisons of 2020 Spawn

Jawed · Sep 30, 2020

Kaotik said:
Well, I wouldn't call little over 10% worse perf per watt "shitty"

from:

https://www.techspot.com/review/2099-geforce-rtx-3080/

250W versus 262W, 2080Ti is ~40% faster in Doom Eternal at the settings used for this comparison. 5700XT is supposed to be a 225W card.

When AMD uses 5700XT as the baseline for "performance per watt comparisons" in the slides for Navi 21, I hope everyone's ready with extra salt. Gamers Nexus has very similar power consumption for 5700XT.
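For anyone who wants the arithmetic behind that spelled out, here is a rough sketch. It assumes the 250W figure belongs to the 2080 Ti and the 262W figure to the 5700XT (the post does not say which is which explicitly), and it normalises 5700XT performance to 1.0. The function name is made up for illustration.

```python
def perf_per_watt_ratio(perf_a, watts_a, perf_b, watts_b):
    """Return how many times better card A's perf/watt is than card B's."""
    return (perf_a / watts_a) / (perf_b / watts_b)

# Assumption: 2080 Ti = 250 W measured, 5700 XT = 262 W measured.
# 2080 Ti is ~40% faster in Doom Eternal at the tested settings.
ratio_measured = perf_per_watt_ratio(1.40, 250.0, 1.00, 262.0)

# Same comparison, but using the 5700 XT's nominal 225 W board power instead.
ratio_rated = perf_per_watt_ratio(1.40, 250.0, 1.00, 225.0)

print(f"2080 Ti perf/W advantage, measured power: {ratio_measured:.2f}x")
print(f"2080 Ti perf/W advantage, rated power:    {ratio_rated:.2f}x")
```

Under those assumptions the gap depends heavily on whether the baseline is the measured or the rated power, which is why the choice of baseline in a vendor slide matters.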

Kaotik · Sep 30, 2020

Jawed said:
from:

https://www.techspot.com/review/2099-geforce-rtx-3080/
250W versus 262W, 2080Ti is ~40% faster in Doom Eternal at the settings used for this comparison. 5700XT is supposed to be a 225W card.

When AMD uses 5700XT as the baseline for "performance per watt comparisons" in the slides for Navi 21, I hope everyone's ready with extra salt. Gamers Nexus has very similar power consumption for 5700XT.

4K isn't really the resolution for 5700 XT though.

https://www.techpowerup.com/review/asus-geforce-rtx-3090-strix-oc/33.html

https://www.computerbase.de/2020-01...st/3/#diagramm-performance-pro-watt-2560-1440 newest I could find from computerbase with 5700 XT included, within 5% of 2070S and 2060FE which are around same perf/watt as 2080 Ti

Scott_Arm · Sep 30, 2020

@Kaotik If 5700xt is drawing more power at 4k, doesn't that suggest it achieves better utilization at 4k (more transistors lit up)?

Kaotik · Sep 30, 2020

Scott_Arm said:
@Kaotik If 5700xt is drawing more power at 4k, doesn't that suggest it achieves better utilization at 4k (more transistors lit up)?

Perhaps, but it lacks the bandwidth for 4K (see how it drops in performance and perf/watt relative to Radeon VII for example when you crank the resolution)
I don't see much point in comparing cards at a resolution which clearly isn't suitable for all of them; heck, even the 2080 Ti is hit'n'miss at 4K.

trinibwoy · Sep 30, 2020

Kaotik said:
I don't really know what to expect.
I mean, it's highly unlikely that AMD would list false information in their drivers this close to release, which would mean it has both GDDR and HBM support, which is so far unheard of (to the point where they did two practically identical chips with just different memory controllers, Navi 10 & 12).
Then there are the 16 L2 cache lines, which suggest 256-bit GDDR memory controllers (I think only a couple of Xbox SoCs have taken a different route on this, with a crossbar?), which we know can't be enough for an 80CU RDNA(2) chip (unless they found the holy grail, which I think we can agree is quite unlikely).
So there needs to be another explanation - either it's right in front of us (using both GDDR & HBM) no matter how unconventional it sounds, or it has to be some as of yet unknown solution (512-bit? 384-bit?) + crossbar to deal with the L2 cache lines, or some third explanation I can't come up with.

Why would a 512-bit bus require a crossbar?

Kaotik · Sep 30, 2020

trinibwoy said:
Why would a 512-bit bus require a crossbar?

I suppose it wouldn't if each cache line fed two memory controllers instead of one, but personally I think 512-bit is less likely than some exotic solution based on all the info leaked so far.

edit: fixed one > two

CarstenS · Sep 30, 2020

Kaotik said:
Perhaps, but it lacks the bandwidth for 4K (see how it drops in performance and perf/watt relative to Radeon VII for example when you crank the resolution)

Radeon VII had utilization issues at lower res and was run beyond its sweet spot on the v/f curve (again).

trinibwoy · Sep 30, 2020

Kaotik said:
I suppose it wouldn't if each cache line fed two memory controllers instead of one, but personally I think 512-bit is less likely than some exotic solution based on all the info leaked so far.

I think it’s the other way around. Each memory controller would serve 2 L2 partitions. Seems perfectly reasonable.

The 256-bit rumor appears to be based on an assumed ratio of L2 partitions to 64-bit memory controllers. I don’t see why that ratio needs to be the same as Navi.

Do we know for sure that there isn’t a crossbar between L2 and memory controllers in Navi1x? AMD’s slide has infinity fabric sitting between them.

Kaotik · Sep 30, 2020

trinibwoy said:
I think it’s the other way around. Each memory controller would serve 2 L2 partitions. Seems perfectly reasonable.

The 256-bit rumor appears to be based on an assumed ratio of L2 partitions to 64-bit memory controllers. I don’t see why that ratio needs to be the same as Navi.

Do we know for sure that there isn’t a crossbar between L2 and memory controllers in Navi1x? AMD’s slide has infinity fabric sitting between them.

Some confusion there, I'm thinking of memory controllers as 16-bit entities (as actually shown in that very slide) rather than 64-bit.

No, we don't know if it maps directly or not, but I think only a couple of Xbox SoCs so far have gone any other route, so I would consider it quite an unlikely explanation. Of course, if both HBM and GDDR are used, a crossbar needs to be there regardless.

trinibwoy · Sep 30, 2020

Kaotik said:
Some confusion there, I'm thinking of memory controllers as 16-bit entities (as actually shown in that very slide) rather than 64-bit.

No, we don't know if it maps directly or not, but I think only a couple of Xbox SoCs so far have gone any other route, so I would consider it quite an unlikely explanation. Of course, if both HBM and GDDR are used, a crossbar needs to be there regardless.

Yeah it is confusing. In other slides AMD presents each memory controller as a monolithic 64-bit block.

pTmdfx · Sep 30, 2020

AMD says 16x32B/clk for Navi 10 connections between L2 and Memory Controllers through Infinity Fabric.

The HotChips Raven Ridge SOC talk made it fairly apparent that the SDF is a configuration based NoC — you can have many transport layer switches scattered around the SoCs, each of which does up to 5 transfers per clock locally (= 5x5 crossbar). EPYC Rome also adds to this story, in that you can reconfigure the memory controller routing for having >1 NUMA domain in the same IOD — this can’t be done if interleaving & routing settings are all hardwired.

So with these clues, it is fair to guess that there are basically 16 SDF switches linking up 16 pairs of L2 slice and Memory Controller slice/port, each of which is a mini local crossbar. If you assume all switches are connected as one ring (for the multimedia/display hub), each switch would still have one port spared under the stated design max.

With that, 1:2 ratio support (16 L2 + 32 channels) seems a done matter in today’s SDF already. On the other hand, 2:3 (16 L2 + 24 channels) might require upgrades to the routing logic (depending on how flexible the address + config ->destination logic is), but IMO it isn’t unattainable.
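To make the ratios concrete, here is a small sketch of the bus-width arithmetic. It assumes 16 L2 slices and 16-bit memory channels (the framing Kaotik uses above, not something confirmed for Navi 21), and the helper name is hypothetical.

```python
from fractions import Fraction

L2_SLICES = 16           # from the driver leak discussed in the thread
CHANNEL_WIDTH_BITS = 16  # assumption: one memory channel = 16 bits

def bus_config(channels):
    """Return (L2:channel ratio, total bus width in bits) for a channel count."""
    ratio = Fraction(L2_SLICES, channels)
    return ratio, channels * CHANNEL_WIDTH_BITS

for channels in (16, 32, 24):
    ratio, width = bus_config(channels)
    print(f"{L2_SLICES} L2 + {channels} channels -> "
          f"{ratio.numerator}:{ratio.denominator} ratio, {width}-bit bus")
```

So, under these assumptions, the 1:1 mapping gives 256-bit, 1:2 gives 512-bit, and 2:3 gives 384-bit, which lines up with the bus widths being debated earlier in the thread.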

Kaotik · Oct 5, 2020

DegustatoR said:
Yeah, in games these Radeons of old had a lot of issues reaching these theoretical flops. None of which are present in Ampere. Not a valid comparison.

How is it not? 3080 with 30 TFLOPS isn't even twice as fast as 2080S with ~11 TFLOPS in games

Rootax · Oct 5, 2020

Kaotik said:
How is it not? 3080 with 30 TFLOPS isn't even twice as fast as 2080S with ~11 TFLOPS in games

Everything is not about tflops...

Kaotik · Oct 5, 2020

Rootax said:
Everything is not about tflops...

This is the problem when quoting from another thread to avoid offtopic there.
The discussion was specifically about game performance vs FLOPS: just as the 3080 is nowhere near as fast as its FLOPS suggest compared to Turing, so were the Radeons of old, which had plenty of FLOPS but a hard time utilizing them.

trinibwoy · Oct 5, 2020

Kaotik said:
How is it not? 3080 with 30 TFLOPS isn't even twice as fast as 2080S with ~11 TFLOPS in games

The 3080 only has 50% more bandwidth than the 2080S. Radeon VII with 1TB/s bandwidth and 45% more flops is ~10% faster than the 5700xt with its ~450GB/s.

The reasons for not scaling with flops are likely different. With Ampere you can blame other bottlenecks on the chip. It’s not that obvious with Vega.
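A quick sketch of the Radeon VII numbers above, for anyone who wants to see how poorly the speedup tracks either resource. All inputs are the approximate figures quoted in the post; "scaling efficiency" is just speedup divided by the resource increase, a made-up convenience metric rather than anything standard.

```python
def scaling_efficiency(perf_ratio, resource_ratio):
    """Fraction of a resource increase that shows up as performance."""
    return perf_ratio / resource_ratio

# Radeon VII relative to 5700 XT, using the figures quoted in the post.
perf_ratio = 1.10               # ~10% faster
flops_ratio = 1.45              # 45% more flops
bandwidth_ratio = 1000 / 450    # ~1 TB/s vs ~450 GB/s

flops_eff = scaling_efficiency(perf_ratio, flops_ratio)
bandwidth_eff = scaling_efficiency(perf_ratio, bandwidth_ratio)

print(f"bandwidth advantage:      {bandwidth_ratio:.2f}x")
print(f"efficiency vs flops:      {flops_eff:.2f}")
print(f"efficiency vs bandwidth:  {bandwidth_eff:.2f}")
```

With more than twice the bandwidth and 45% more flops yielding only ~10%, neither resource looks like the limiter on its own, which fits the point that the bottleneck in Vega is less obvious.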

DegustatoR · Oct 5, 2020

Kaotik said:
How is it not? 3080 with 30 TFLOPS isn't even twice as fast as 2080S with ~11 TFLOPS in games

If only games made for current gen h/w were solely limited by FP32 math, that would be cool.
There are more than enough examples of Ampere scaling nearly linearly in games when compared to Turing.
And you have to add Turing's ints to Turing's flops for such comparison to be a proper one. So in "Ampere metrics" 2080S is ~16.5 tflops so a 30 tflops Ampere even in theory can't be "twice as fast".
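Spelling out that adjustment as a sketch, using only the round figures from the posts above (30, ~11 and ~16.5 TFLOPS). The variable names are illustrative, and the ~16.5 "Ampere metrics" number is DegustatoR's estimate, not a spec-sheet value.

```python
ampere_3080_tflops = 30.0        # 3080 FP32, as quoted
turing_2080s_fp32_tflops = 11.0  # 2080S FP32, as quoted
turing_2080s_adjusted = 16.5     # 2080S FP32 + INT in "Ampere metrics", as quoted

# Naive paper ratio, counting only FP32 on both sides.
naive_ratio = ampere_3080_tflops / turing_2080s_fp32_tflops

# Ratio once Turing's separate INT throughput is counted too.
adjusted_ratio = ampere_3080_tflops / turing_2080s_adjusted

# INT contribution implied by the two quoted Turing figures.
implied_int_tops = turing_2080s_adjusted - turing_2080s_fp32_tflops

print(f"naive FP32 ratio:     {naive_ratio:.2f}x")
print(f"adjusted ratio:       {adjusted_ratio:.2f}x")
print(f"implied Turing INT:   {implied_int_tops:.1f} TOPS")
```

On these numbers the theoretical ceiling drops from about 2.7x to about 1.8x, so "not even twice as fast" is what the adjusted figure would predict.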

Deleted member 13524 · Oct 5, 2020

Rootax said:
Everything is not about tflops...

The quote gives context.

The same way GCN lost to Kepler/Maxwell/Pascal in theoretical-TFLOPs/gaming-performance, Ampere loses to Turing and RDNA1 in the same metric.

It's not an important metric, though it is one that nvidia fans used to repeat ad nauseam. I don't think it's right to claim "Ampere has utilization issues" (most probably that throughput is just not designed to ever be reached), but there are those who used to claim that about GCN and now, with Ampere, say the problem is with game engines.

DegustatoR · Oct 5, 2020

Kaotik said:
so were the Radeons of old, having plenty of FLOPS but having a hard time utilizing them

They had a hard time utilizing them because they had widely known h/w design issues which prevented such utilization in graphics specifically. Ampere doesn't have these, thus this comparison isn't valid. As I've already said.

PSman1700 · Oct 5, 2020

AMD will most likely provide a close to 30TF navi2 anyway, so it doesn't matter.

troyan · Oct 5, 2020

Rootax said:
Everything is not about tflops...

And games are not pure compute workloads. So even when the FP32 workload gets processed in half the time, frame rendering will still need more time to finish.
