Value of Hardware Unboxed benchmarking *spawn

On the topic of whether the 4060 Ti 16 GB benefits from the extra memory, these Ratchet & Clank: Rift Apart benchmarks show the 16 GB model is 24% faster than the 8 GB model at 1080p with max RT.


Another benchmark shows a 41% improvement in minimum framerates on the 2080 Ti, a card the 4060 Ti 16 GB should perform similarly to.

 
Given that it performs kinda like last gen's flagship, in the past it's been quite normal for such a new-gen part to cost no more than $400-500.

We've really lost perspective.
$600 is the upper limit of what I find at all reasonable, not the ideal value price. Given that FSR 2 is a truly bad technology and DLSS is so widespread, $500 would make it an actual value buy.
 
It honestly doesn't seem like the architecture is even capable of utilizing dual issue in real-world scenarios. Complete dud.
 
Or maybe your expectations from doubling some FP32 math throughput are wrong? Ampere doubled it completely, without any catches or need for complex compiler changes, and yet it still wasn't showing much improvement over Turing in typical gaming workloads.

The real question is how much this "dual issue" has cost AMD in transistors and power. If those costs are low, then it's a solid way to improve the architecture.
 
Ampere shows strong signs of 2xFP32 providing some benefit. 3080 vs 2080 Ti, for example.
 
RTX 2080 has 2944 cores. Without the 2xFP32, the 3070 would likewise be counted as having 2944 cores.

Both have basically the exact same stock 1700 MHz clock rating, and both have exactly 448 GB/s of bandwidth as well. So it's as close a comparison as we're gonna get.

Yet the 3070 performs about 25% faster. I'd say that shows some pretty good improvements, thanks in large part to 2xFP32. Whereas RDNA3 is making very little use of its similar-idea solution. Even if the transistor/space cost is low, the point is that it's not really adding much of anything.
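To put rough numbers on it, here's a quick back-of-the-envelope sketch in Python (using the core counts and ~1700 MHz clock quoted above; the ~25% real-world figure is the claim being checked, not something computed here):

```python
# Peak FP32 throughput: cores * 2 FLOPs per clock (an FMA counts as
# a multiply + an add) * clock speed.
def fp32_tflops(cores: int, ghz: float) -> float:
    return cores * 2 * ghz / 1000.0

rtx_2080 = fp32_tflops(2944, 1.7)        # Turing: 1x FP32 per "core" per clock
rtx_3070 = fp32_tflops(2944 * 2, 1.7)    # Ampere: 2xFP32, counted as 5888 cores

print(f"RTX 2080 peak: {rtx_2080:.1f} TFLOPS")  # ~10.0
print(f"RTX 3070 peak: {rtx_3070:.1f} TFLOPS")  # ~20.0, i.e. 2x on paper
# A 2x paper uplift landing at ~1.25x measured means only about a quarter
# of the theoretical extra FP32 throughput shows up in real games.
```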
 
I wouldn’t call RDNA 3’s dual-issue a similar idea to Ampere’s 2xFP32. Both of Ampere’s FP32 pipelines appear to be full, independent pipelines with dedicated operand bandwidth for each. RDNA 3 seems to dual-issue only specific combinations of instructions from the same thread, where operands are shared.
 
"Turing is like two delivery guys where one can deliver blue packages and one can deliver green packages. With Ampere the guy who can deliver green packages is now also capable of delivering blue packages instead because 70% of all packages are blue. RDNA3 dual issue is like one guy who can now carry two packages at once but only if they go to neighbors"

"Turing is one delivery guy who can alternate between delivering blue packages and green packages. Except he behaves as if he has to pick up both at once, and can be restricted by register bandwidth (package size?). With Ampere he can deliver blue packages back to back, but still has the same restriction.


Basically you're only issuing one instruction per cycle per SMSP, but are still subject to register bank conflicts for back to back issuing, as if you were actually dual issuing. It's the worst of both worlds. Fortunately register caches should mitigate most of that. And if register caches aren't enough, Nvidia does have a better register banking scheme than AMD, with two dual ported banks instead of four single ported ones"
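To make that banking point concrete, here's a toy Python model of operand reads hitting register banks (purely illustrative: the reg % num_banks mapping, the register numbers, and the port counts are assumptions lifted from the quote's description, not the actual hardware layout):

```python
from collections import Counter

def has_bank_conflict(operand_regs, num_banks, ports_per_bank):
    # Toy bank assignment: register number modulo bank count.
    reads_per_bank = Counter(reg % num_banks for reg in operand_regs)
    # Conflict if any bank is asked for more reads than it has ports.
    return any(n > ports_per_bank for n in reads_per_bank.values())

fma_operands = [0, 4, 1]  # a hypothetical FMA reading r0, r4, r1 in one cycle

# Four single-ported banks (the quote's description of AMD):
# r0 and r4 both map to bank 0, which can serve one read per cycle -> conflict.
print(has_bank_conflict(fma_operands, num_banks=4, ports_per_bank=1))  # True

# Two dual-ported banks (the quote's description of Nvidia):
# r0 and r4 still share bank 0, but it has two ports -> no conflict.
print(has_bank_conflict(fma_operands, num_banks=2, ports_per_bank=2))  # False
```

Under this toy model, the dual-ported scheme tolerates an operand pattern that conflicts on four single-ported banks, which is the quote's point about back-to-back issue.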

 
Exactly. It's only 25% faster from a full-on doubling of FP32 throughput. RDNA3's dual-issue approach works in far fewer situations than that, so by default it should gain way less than +25%.
25% faster is a huge win, though. I don't know why you're downplaying that. It's quite hard to get that sort of performance leap in a new GPU generation without big clock gains or just packing in more SMs/CUs.

And yes, it was expected RDNA3's solution wouldn't be as effective, but it seems to be nearly useless.
 
Too late to edit, but...

An RDNA3 SIMD can only read 4 vector registers per clock. FMAs of course need 3, which greatly limits the ability to dual-issue outside of pure adds/multiplies. I think there are 3 possible scenarios where it can (rough sketch below):
  1. The same register is used for multiple operands. x += a*x seems like a weird thing to be doing, though, so I doubt this comes up much outside of synthetic scenarios
  2. One of the multiply args must be a scalar. I suspect this is the most common case
  3. The operands are already in the Maxwell-style SIMD operand cache. Probably only useful for dot products/matrix multiplication
This of course must be found at compile time, and is further subject to bank conflicts.

It's fairly clear IMO why it provides marginal benefit in general cases. But throw a matrix multiply at it, and, well, RDNA3's lead over RDNA2 in AI benchmarks speaks for itself.
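As a rough sketch of that constraint in Python (the operand naming, and treating scalar operands and operand-cache hits as free reads, are my assumptions based on the three scenarios above, not actual ISA rules):

```python
# An RDNA3 SIMD can read at most 4 distinct VGPRs per clock, so two
# co-issued ops must fit their combined operands in that budget.
VGPR_READS_PER_CLOCK = 4

def can_dual_issue(op_a, op_b, cached=()):
    """op_a/op_b: operand register names; 'v*' = VGPR, 's*' = scalar."""
    # Deduplicate shared registers; scalars and operand-cache hits are
    # modeled as not consuming VGPR read ports (an assumption).
    vgprs = {r for r in (*op_a, *op_b)
             if r.startswith("v") and r not in cached}
    return len(vgprs) <= VGPR_READS_PER_CLOCK

# Two full FMAs need 6 distinct VGPR reads -> cannot pair
print(can_dual_issue(["v0", "v1", "v2"], ["v3", "v4", "v5"]))        # False
# Scenario 1: a repeated register only counts once -> 4 reads, fits
print(can_dual_issue(["v0", "v1", "v0"], ["v2", "v3", "v2"]))        # True
# Scenario 2: one multiply arg is a scalar -> 4 VGPR reads, fits
print(can_dual_issue(["s0", "v1", "v2"], ["s0", "v3", "v4"]))        # True
# Scenario 3: operands already in the operand cache don't hit the file
print(can_dual_issue(["v0", "v1", "v2"], ["v3", "v4", "v5"],
                     cached={"v0", "v3"}))                           # True
```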
 
Guess this thread is as good a place as any:


Just a brutal takedown. I've never regarded LTT as anything other than entertainment (...of sorts?) at best, and I find Linus generally annoying, so I don't watch it much regardless. Even so, I kind of naively assumed they somewhat viewed themselves in at least the 'edutainment' category as well, which makes it so utterly bizarre that they would choose to actually touch all this off by 'calling out' Gamers Nexus's testing methodology.

And man, are some of the fuckups here outrageous - and this is largely recent stuff! Not to mention some eye-popping ethical concerns - such as the pretty disgusting Billet Labs debacle. Jesus.
 
I have never watched any LTT stuff, so I can't comment on their testing, but both HUB and GN have always delivered trustworthy data and are quick to correct any mistakes.
 