Value of Hardware Unboxed benchmarking *spawn

On the topic of whether the 4060 Ti 16 GB benefits from the extra memory, these Ratchet & Clank: Rift Apart benchmarks show the 16 GB model is 24% faster than the 8 GB model at 1080p with max RT.


Another benchmark shows a 41% improvement in minimum framerates on the 2080 Ti, a card the 4060 Ti 16 GB should perform similarly to.

 
Given that it performs kinda like last gen's flagship, in the past it's been quite normal for such a new-gen part to cost no more than $400-500.

We've really lost perspective.
$600 is the upper limit of what I find at all reasonable, not the ideal value price. Given that FSR 2 is a truly bad technology and DLSS is so widespread, $500 would make it an actual value buy.
 
It honestly doesn't seem like the architecture is even capable of utilizing dual issue in real-world scenarios. Complete dud.
 
Or maybe your expectations from doubling some FP32 math throughput are wrong? Ampere doubled it completely, without any catches or need for complex compiler changes, and yet it still wasn't showing much improvement over Turing in typical gaming workloads.

The real question is how much this "dual issue" has cost AMD in transistors and power. If those costs are low, then it's a solid way to improve the architecture.
 
Ampere shows strong signs of 2xFP32 providing some benefit. 3080 vs 2080 Ti, for example.
 
RTX 2080 has 2944 cores. Without the 2xFP32, the 3070 would likewise be counted as having 2944 cores.

Both have basically the exact same stock 1700 MHz clock rating, and both have exactly 448 GB/s of bandwidth as well. So it's as close a comparison as we're gonna get.

Yet the 3070 performs about 25% faster. I'd say that shows some pretty good improvements, thanks in large part to 2xFP32. Whereas RDNA3 is making very little use of its similar-idea solution. Even if the transistor/space cost is low, the point is that it's not really adding much of anything.
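To put rough numbers on it, here's a quick back-of-the-envelope sketch in Python (using the core counts and ~1700 MHz clock quoted above; the ~25% real-world figure is the claim being checked, not something computed here):

```python
# Peak FP32 throughput: cores * 2 FLOPs per clock (an FMA counts as
# a multiply + an add) * clock speed.
def fp32_tflops(cores: int, ghz: float) -> float:
    return cores * 2 * ghz / 1000.0

rtx_2080 = fp32_tflops(2944, 1.7)        # Turing: 1x FP32 per "core" per clock
rtx_3070 = fp32_tflops(2944 * 2, 1.7)    # Ampere: 2xFP32, counted as 5888 cores

print(f"RTX 2080 peak: {rtx_2080:.1f} TFLOPS")  # ~10.0
print(f"RTX 3070 peak: {rtx_3070:.1f} TFLOPS")  # ~20.0, i.e. 2x on paper
# A 2x paper uplift landing at ~1.25x measured means only about a quarter
# of the theoretical extra FP32 throughput shows up in real games.
```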
 
I wouldn’t call RDNA 3’s dual-issue a similar idea to Ampere’s 2xFP32. Both of Ampere’s FP32 pipelines appear to be full, independent pipelines with dedicated operand bandwidth for each. RDNA 3 seems to dual-issue only specific combinations of instructions from the same thread, where operands are shared.
 
"Turing is like two delivery guys where one can deliver blue packages and one can deliver green packages. With Ampere the guy who can deliver green packages is now also capable of delivering blue packages instead because 70% of all packages are blue. RDNA3 dual issue is like one guy who can now carry two packages at once but only if they go to neighbors"

"Turing is one delivery guy who can alternate between delivering blue packages and green packages. Except he behaves as if he has to pick up both at once, and can be restricted by register bandwidth (package size?). With Ampere he can deliver blue packages back to back, but still has the same restriction.


Basically you're only issuing one instruction per cycle per SMSP, but are still subject to register bank conflicts for back to back issuing, as if you were actually dual issuing. It's the worst of both worlds. Fortunately register caches should mitigate most of that. And if register caches aren't enough, Nvidia does have a better register banking scheme than AMD, with two dual ported banks instead of four single ported ones"
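To make that banking point concrete, here's a toy Python model of operand reads hitting register banks (purely illustrative: the reg % num_banks mapping, the register numbers, and the port counts are assumptions lifted from the quote's description, not the actual hardware layout):

```python
from collections import Counter

def has_bank_conflict(operand_regs, num_banks, ports_per_bank):
    # Toy bank assignment: register number modulo bank count.
    reads_per_bank = Counter(reg % num_banks for reg in operand_regs)
    # Conflict if any bank is asked for more reads than it has ports.
    return any(n > ports_per_bank for n in reads_per_bank.values())

fma_operands = [0, 4, 1]  # a hypothetical FMA reading r0, r4, r1 in one cycle

# Four single-ported banks (the quote's description of AMD):
# r0 and r4 both map to bank 0, which can serve one read per cycle -> conflict.
print(has_bank_conflict(fma_operands, num_banks=4, ports_per_bank=1))  # True

# Two dual-ported banks (the quote's description of Nvidia):
# r0 and r4 still share bank 0, but it has two ports -> no conflict.
print(has_bank_conflict(fma_operands, num_banks=2, ports_per_bank=2))  # False
```

Under this toy model, the dual-ported scheme tolerates an operand pattern that conflicts on four single-ported banks, which is the quote's point about back-to-back issue.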

 
Exactly. It's only 25% faster from a full-on doubling of FP32 throughput. RDNA3's dual-issue approach works in far fewer situations than that, so by default it should gain way less than +25%.
25% faster is a huge win, though. I don't know why you're downplaying that. It's quite hard to get that sort of performance leap in a new GPU generation without big clock gains or just packing in more SMs/CUs.

And yes, it was expected RDNA3's solution wouldn't be as effective, but it seems to be nearly useless.
 
Too late to edit, but...

An RDNA3 SIMD can only read 4 vector registers per clock. FMAs of course need 3, which greatly limits the ability to dual-issue outside of pure adds/multiplies. I think there are 3 possible scenarios where it can (rough sketch below):
  1. The same register is used for multiple operands. x += a*x seems like a weird thing to be doing, though, so I doubt this comes up much outside of synthetic scenarios
  2. One of the multiply args must be a scalar. I suspect this is the most common case
  3. The operands are already in the Maxwell-style SIMD operand cache. Probably only useful for dot products/matrix multiplication
This of course must be found at compile time, and is further subject to bank conflicts.

It's fairly clear IMO why it provides marginal benefit in general cases. But throw a matrix multiply at it, and, well, RDNA3's lead over RDNA2 in AI benchmarks speaks for itself.
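As a rough sketch of that constraint in Python (the operand naming, and treating scalar operands and operand-cache hits as free reads, are my assumptions based on the three scenarios above, not actual ISA rules):

```python
# An RDNA3 SIMD can read at most 4 distinct VGPRs per clock, so two
# co-issued ops must fit their combined operands in that budget.
VGPR_READS_PER_CLOCK = 4

def can_dual_issue(op_a, op_b, cached=()):
    """op_a/op_b: operand register names; 'v*' = VGPR, 's*' = scalar."""
    # Deduplicate shared registers; scalars and operand-cache hits are
    # modeled as not consuming VGPR read ports (an assumption).
    vgprs = {r for r in (*op_a, *op_b)
             if r.startswith("v") and r not in cached}
    return len(vgprs) <= VGPR_READS_PER_CLOCK

# Two full FMAs need 6 distinct VGPR reads -> cannot pair
print(can_dual_issue(["v0", "v1", "v2"], ["v3", "v4", "v5"]))        # False
# Scenario 1: a repeated register only counts once -> 4 reads, fits
print(can_dual_issue(["v0", "v1", "v0"], ["v2", "v3", "v2"]))        # True
# Scenario 2: one multiply arg is a scalar -> 4 VGPR reads, fits
print(can_dual_issue(["s0", "v1", "v2"], ["s0", "v3", "v4"]))        # True
# Scenario 3: operands already in the operand cache don't hit the file
print(can_dual_issue(["v0", "v1", "v2"], ["v3", "v4", "v5"],
                     cached={"v0", "v3"}))                           # True
```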
 
Guess this thread is as good a place as any:


Just a brutal takedown. I've never regarded LTT as anything other than entertainment (...of sorts?) at best, and I find Linus generally annoying, so I don't watch it much regardless. Even so, I kind of naively assumed they somewhat viewed themselves in at least the 'edutainment' category as well, which makes it so utterly bizarre that they would choose to actually touch all this off by 'calling out' Gamers Nexus's testing methodology.

And man, are some of the fuckups here outrageous - and this is largely recent stuff! Not to mention some eye-popping ethical concerns - such as the pretty disgusting Billet Labs debacle. Jesus.
 
I have never watched any LTT stuff, so I can't comment on their testing, but both HUB and GN have always delivered trustworthy data and are quick to correct any mistakes.
 