DavidGraham
Veteran
I feel this means AMD is skimping on RT hardware yet again.
Eh, I mean this looks pretty much like a copy of what Nv did in Ampere but without the loss of a dedicated integer pipeline.
The problem remains actual compute throughput in this design.
Nvidia didn't add additional ALU's for Ampere, they just repurposed existing ones.
They did. You can't "repurpose" integer ALUs for FP math.
The difference would be in the absence of integer SIMD which would mean that a) actual performance gain in mixed math will likely be higher (since there won't be cases where the previously present integer pipeline will do the same stuff again leading to no performance change) and b) area spent will probably be higher - but this is mostly irrelevant if the SIMDs are redesigned anyway.
Perhaps that's what AMD is also doing for RDNA3, but what it sounds like so far is an actual physical doubling of the ALU's, which would be a very different situation.
Relative counts of TMUs and ROPs (per CU) don't need to change if you're getting >2x scaling in ALU count per mm² by cutting out a load of scheduling hardware at the CU and SIMD level while simultaneously re-wiring the vector register file and pipeline forwarding to reduce the over-provisioning seen in prior RDNA. So compute density gets a massive boost and power consumption per FLOP should also fall substantially.
The problem remains actual compute throughput in this design.
Has been the case since R600.
Both architectures are converging.
I doubt texturing, per se, will be much of a bottleneck - it's not why 6950XT is slower than 3090Ti (744 versus 625 gigatexels/s). There are format-related performance variations there, though, and games are now so complex it's hard to compare across architectures...
The compute limitation of RDNA2 will shift more to a TMU
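As a rough check on the texel-rate figures quoted above: peak bilinear fill rate is usually estimated as TMU count times clock. The sketch below assumes spec-sheet values that are not stated in the thread (336 TMUs at a 1.86 GHz boost clock for the 3090 Ti, 320 TMUs for the 6950 XT), and the function names are made up for illustration.

```python
def peak_texel_rate(tmus: int, clock_ghz: float) -> float:
    """Peak texel rate in gigatexels/s, assuming one bilinear texel per TMU per clock."""
    return tmus * clock_ghz


def implied_clock_ghz(gigatexels: float, tmus: int) -> float:
    """Clock (GHz) that a quoted texel rate implies for a given TMU count."""
    return gigatexels / tmus


# Assumed spec-sheet values, not taken from the thread.
RTX_3090_TI_TMUS, RTX_3090_TI_BOOST_GHZ = 336, 1.86
RX_6950_XT_TMUS = 320

if __name__ == "__main__":
    rate = peak_texel_rate(RTX_3090_TI_TMUS, RTX_3090_TI_BOOST_GHZ)
    clock = implied_clock_ghz(744, RX_6950_XT_TMUS)
    print(f"3090 Ti peak texel rate: {rate:.1f} GT/s")
    print(f"6950 XT clock implied by 744 GT/s: {clock:.3f} GHz")
```

Under those assumptions the 625 figure falls out of the 3090 Ti's boost clock, while 744 on 320 TMUs implies a clock slightly above 2.3 GHz. Neither number says anything about sustained rates with wider texture formats.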
There is never a single bottleneck, it's always a mixture, but the bottleneck will shift with doubling of ALUs. A 3090Ti has much more Gflops, therefore it should be a lot faster than a 6950XT, but it's not, because other bottlenecks are limiting it. The same will happen, when AMD doubles its ALUs.
Not related to your post, just easier to reply here, I apologize. But what are your thoughts on the RDNA chiplets here with respect to the consoles? Eventually there will be another generation; any ideas on what that console architecture may look like?
Yeap.
And their CPU efficiency regressed.
Whatever, Apple sucks now.
Gonna bet on some kind of 2.5D chiplet solution in consoles for the next gen, maybe even for this midgen refresh.
It bugs the hell out of me that the sets of available co-issuable OPs per VOPD-half are not even symmetric.
I like this mass confusion prior to launch, hehe.
Let’s muddle the water with — according to LLVM patches so far — Wave32 VOPD co-issue supporting only a tiny subset (10-ish) of (mostly FP32) ALU opcodes?
Flops aren’t the determining factor in predicting which card should be faster. If anything high flops help with the rare ALU bound pass during a frame. Post processing shaders eat them up.
The vast majority of passes though are memory bandwidth/latency limited on a 3090. I’m guessing it’s mostly latency and that’s why Ampere does a bit better at higher resolutions where there is more work available to hide that latency.
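The point about ALU-bound versus memory-bound passes can be sketched with a roofline-style estimate, where each pass takes as long as its slower resource. Everything here is hypothetical: the pass list, the GFLOP and GB figures, and the 20 TFLOPS / 1000 GB/s card are invented for illustration, and the model ignores latency and overlap between passes.

```python
from dataclasses import dataclass


@dataclass
class RenderPass:
    name: str
    gflop: float  # arithmetic work per frame in GFLOP (made-up number)
    gbyte: float  # memory traffic per frame in GB (made-up number)


def pass_time_ms(p: RenderPass, tflops: float, gb_per_s: float) -> float:
    """Roofline-style estimate: the pass is limited by its slower resource."""
    alu_ms = p.gflop / tflops             # GFLOP / (TFLOP/s) = ms
    mem_ms = p.gbyte / gb_per_s * 1000.0  # GB / (GB/s) = s, converted to ms
    return max(alu_ms, mem_ms)


def frame_time_ms(passes, tflops: float, gb_per_s: float) -> float:
    """Sum of per-pass estimates, assuming passes run back to back."""
    return sum(pass_time_ms(p, tflops, gb_per_s) for p in passes)


# Hypothetical frame: one bandwidth-bound, one balanced, one ALU-bound pass.
FRAME = [
    RenderPass("g-buffer", gflop=5.0, gbyte=1.5),
    RenderPass("lighting", gflop=20.0, gbyte=1.0),
    RenderPass("post-processing", gflop=40.0, gbyte=0.5),
]

if __name__ == "__main__":
    base = frame_time_ms(FRAME, tflops=20.0, gb_per_s=1000.0)
    doubled = frame_time_ms(FRAME, tflops=40.0, gb_per_s=1000.0)
    print(f"base: {base:.2f} ms, doubled ALUs: {doubled:.2f} ms, "
          f"speedup: {base / doubled:.2f}x")
```

With these invented numbers, doubling FLOPS only speeds up the post-processing pass, so the frame goes from about 4.5 ms to about 3.5 ms: well short of 2x, because the other passes stay bandwidth-limited.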
Post-processing stage: would a high-TF card have any more trouble doing a lot of post-processing at lower resolution vs. doing less post-processing at higher resolution?
I recall reading a quick dev take that said that post processing is really the step that starts making the image look good.
RDNA2 is at about 15:1 FP64 rate, which isn't exactly an FP64 monster. How much can they "save" going down from that? I think it's more likely that the ratio of FP32 to FP64 ALUs will go up simply because they'll double the former while leaving the same number per WGP for the latter.
If the article that was previously posted is correct, then AMD will be further reducing FP64 performance with RDNA 3. That should lead to some transistor savings, correct? That would, of course, then be used for other things.
Regards,
SB
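The ratio argument above is simple arithmetic, sketched below. The per-WGP figures are assumptions, not numbers from the thread: 128 FP32 results per clock and a 1:16 FP64 rate for RDNA2 (close to the "about 15:1" quoted), with FP32 throughput doubled and FP64 throughput left unchanged in the hypothetical successor. It models throughput per clock only and takes no position on how FP64 is physically implemented.

```python
from fractions import Fraction


def fp64_rate(fp32_per_clk: int, fp64_per_clk: int) -> Fraction:
    """FP64 throughput expressed as a fraction of FP32 throughput."""
    return Fraction(fp64_per_clk, fp32_per_clk)


# Assumed per-WGP throughput in results per clock (not from the thread).
RDNA2_FP32_PER_CLK = 128
RDNA2_FP64_PER_CLK = 8  # corresponds to a 1:16 rate

# Hypothetical successor: FP32 doubled, FP64 left as it was.
DOUBLED_FP32_PER_CLK = 2 * RDNA2_FP32_PER_CLK

if __name__ == "__main__":
    before = fp64_rate(RDNA2_FP32_PER_CLK, RDNA2_FP64_PER_CLK)
    after = fp64_rate(DOUBLED_FP32_PER_CLK, RDNA2_FP64_PER_CLK)
    print(f"FP64:FP32 before: {before}, after doubling FP32: {after}")
```

Under these assumptions the rate halves from 1/16 to 1/32 without any FP64 hardware being removed, which is the scenario the post describes.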
There are no FP64 ALUs in GCN / RDNA.
And how do they do FP64 math?