Bondrewd
Veteran
Co-rrect!
Separately there's noise in Twitterverse that Lovelace will have much higher clocks, similar to RDNA 2/3.
Clocks and physical design in general are a big focus for next-gen NV parts.
Where did I mention L2 cache? I said that in a 3D stack you can easily add not only the connection from the compute die to the cache die but also synchronization signals that can propagate through the entire stack, because the 3D stack is basically equivalent to a very big IC with some latency penalties at the interconnection points. I wonder why I even try to explain trivial concepts to someone who thinks a +30% gain in perf/W on the same process node is bad. It's clear that either 1) you do not understand what a 3D stack is, and so do not understand that every single signal can be passed through the TSVs, not only the memory interface ones, or 2) you are trying to spread FUD. As your post history makes it quite evident that the second is the most probable, let me put you on ignore so I avoid my eyes getting hurt by your FUD spreading.
Please do not selectively quote partial sentences to fit whatever you think it is.
H100 has no display outs or RT gear.
Impossible.
Can't do that.
I explicitly said "If the constellation is such, that only with a heavily overclocked Hopper they can claim perf kingship in the desktop and/or gaming space, …"
Use your eyes better then.
Just saw this now.
NVIDIA Lovelace vs AMD RDNA 3, what has not been told about their GPUs (techunwrapped.com)
But, the article speculates that Lovelace has been confused for Hopper. And then seemingly goes wild: "Hence, we think that the configuration of 144 Shader Units could correspond to Hopper and not Lovelace, since it is said that Lovelace will be a multi-chip GPU."
Separately there's noise in Twitterverse that Lovelace will have much higher clocks, similar to RDNA 2/3. That's one way of using fewer SMs, which is the central bone of contention that the article focuses on.
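To put rough numbers on the clocks-versus-SM-count trade mentioned above, here is a minimal sketch. The 144 figure is the one quoted from the article; the 128 FP32 lanes per SM and both clock speeds are assumptions picked purely for illustration, not leaked or confirmed specs.

```python
def peak_fp32_tflops(sms, fp32_lanes_per_sm, clock_ghz):
    """Paper peak: every lane retires one FMA (2 FLOPs) per clock."""
    return 2 * sms * fp32_lanes_per_sm * clock_ghz / 1000.0


def sms_for_same_peak(ref_sms, ref_clock_ghz, new_clock_ghz):
    """Peak scales with sms * clock, so the lane count cancels out."""
    return ref_sms * ref_clock_ghz / new_clock_ghz


LANES_PER_SM = 128    # assumption: Ampere-style client SM
REF_CLOCK_GHZ = 1.8   # hypothetical clock
HIGH_CLOCK_GHZ = 2.4  # hypothetical "much higher" clock

ref_peak = peak_fp32_tflops(144, LANES_PER_SM, REF_CLOCK_GHZ)
needed = sms_for_same_peak(144, REF_CLOCK_GHZ, HIGH_CLOCK_GHZ)
print(f"144 units @ {REF_CLOCK_GHZ} GHz: {ref_peak:.1f} TFLOPS peak")
print(f"Same peak @ {HIGH_CLOCK_GHZ} GHz needs about {needed:.0f} units")
```

This only compares paper peaks; it says nothing about how well either configuration would actually be fed.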
They doubled FP32 units, not compute power. Thus teraflops becomes an even worse unit to compare different architectures. Same for benchmarks, depending on how much math they do.
nVidia doubled compute throughput with Ampere over Turing
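A toy throughput model may help show why doubled FP32 units do not mean doubled performance. It assumes a Turing-style SM with one 64-lane FP32 datapath plus one 64-lane INT32 datapath, and a client-Ampere-style SM where the second datapath can run either FP32 or INT32. The instruction mixes are illustrative assumptions, and the model ignores issue limits, register file ports, memory and fixed-function work.

```python
def cycles_turing(fp, integer, lanes=64):
    """FP32 and INT32 each have their own datapath and run in parallel."""
    return max(fp, integer) / lanes


def cycles_ampere(fp, integer, lanes=64):
    """Datapath A is FP32 only; datapath B takes FP32 or INT32.

    INT32 work is confined to B, while FP32 work fills whatever
    capacity is left across both datapaths.
    """
    return max(integer / lanes, (fp + integer) / (2 * lanes))


def speedup(fp, integer):
    """Per-clock throughput gain of the Ampere-style SM in this model."""
    return cycles_turing(fp, integer) / cycles_ampere(fp, integer)


# (FP32 ops, INT32 ops) per unit of work; all mixes are assumed.
for fp, integer in [(100, 0), (100, 36), (100, 100)]:
    print(f"FP32={fp:3d} INT32={integer:3d} -> {speedup(fp, integer):.2f}x")
```

In this model the gain is 2x only for pure FP32 code and shrinks toward 1x as the integer share grows, which is one reason a doubled TFLOPS figure overstates the difference between the two designs.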
I don't think AMD has moved in the opposite direction, in the sense of async compute support working less well than on GCN?
That's not a solution. AMD has moved in exactly the opposite direction with RDNA for a reason - it had failed with GCN at finding async workloads on PC. Straightforward ports from consoles simply didn't have enough work to fill PC GPUs.
As far as I remember, common consensus from reviews was that this doubling of thru... didn't work and didn't scale well. The low-throughput Navi21 was on par with the 3090 (of course excluding ray tracing). Probably in special scenarios Ampere rocks, but in regular games it is on par.
nVidia doubled compute throughput with Ampere over Turing and didn't scale every (fixed) function with it. It was a genius move.
No shit.
As far as I remember, common consensus from reviews was that this doubling of thru...
If you're talking about the register file, IIRC it has been providing operands for two instructions per clock since the introduction of GV100.
No shit.
Twice the FMA per the same amount of r/w ports is what client amperage is.
Ampere didn't double the ports and reg file because Turing already had these sufficient for running INT32 in parallel.
No shit.
Twice the FMA per the same amount of r/w ports is what client amperage is.
Which is why I've said double the FMA per the same amount of r/w ports.
Ampere didn't double the ports and reg file because Turing already had these sufficient for running INT32 in parallel.
Needs that ILP juice too.
Ampere hits its FP32 peaks fine when the code is pure FP32
Yea.
then it will scale just as well as RDNA2 in comparison to RDNA1 did.
Not really. It's not like Ampere is VLIW2.
Needs that ILP juice too.
I am not 100% sure, but if the 2 instructions per clock come from different warps, what you’re asking for probably doesn’t exist.
Is there a tool that lists out the exact machine code that is scheduled on Ampere? Like the various AMD tools that list out the ISA code?