> Using the same hardware as for FP32 and all other formats.
Where does it say that it's using "the same hardware"? Using the same SIMD isn't the same as "using the same h/w". FP64 math still runs on dedicated ALUs, which is also why GCN has variable FP64 rates between its versions. Also, I believe that RDNA has a separate 2-lane FP64 unit inside each WGP? Not sure on the details.
> Where does it say that it's using "the same hardware"? Using the same SIMD isn't the same as "using the same h/w". FP64 math still runs on dedicated ALUs, which is also why GCN has variable FP64 rates between its versions. Also, I believe that RDNA has a separate 2-lane FP64 unit inside each WGP? Not sure on the details.
This is a surprisingly difficult thing to find good data on, but the best I could do is here:
So there's no way for a 32-bit ALU to do a 64-bit operation over multiple cycles and using multiple register banks?
I assume in the old VLIW days, the individual ALUs in a SIMD could work together to do it? That was always the impression I got with this.
View attachment 6758
> Where does it say that it's using "the same hardware"? Using the same SIMD isn't the same as "using the same h/w". FP64 math still runs on dedicated ALUs, which is also why GCN has variable FP64 rates between its versions. Also, I believe that RDNA has a separate 2-lane FP64 unit inside each WGP? Not sure on the details.
It's the same hardware, has always been.
> So there's no way for a 32-bit ALU to do a 64-bit operation over multiple cycles and using multiple register banks?
> I assume in the old VLIW days, the individual ALUs in a SIMD could work together to do it? That was always the impression I got with this.
That's exactly what they're doing, to my understanding.
For larger 64-bit (or double precision) FP data, adjacent registers are combined to hold a full wavefront of data.
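To make the register-pairing sentence concrete, here is a minimal Python sketch of one 64-bit value occupying two adjacent 32-bit slots. The function names and the low-word-first ordering are assumptions for illustration only; this says nothing about which ALU then operates on the pair.

```python
import struct

def split_f64(value):
    """Split a double into (low, high) 32-bit words, as if it were stored
    in two adjacent 32-bit registers (ordering is an assumption)."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return bits & 0xFFFFFFFF, bits >> 32

def join_f64(low, high):
    """Recombine the two 32-bit words into the original double."""
    bits = (high << 32) | low
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

low, high = split_f64(1.5)
print(hex(low), hex(high))
print(join_f64(low, high))
```

The point is only that storage is shared: a wavefront of doubles takes twice the register space of a wavefront of floats, independent of how the arithmetic itself is implemented.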
> It's the same hardware, has always been.
As far as I know it's not and has never been "the same hardware". How do you explain the differences in FP64 rates to FP32 on GCN if "it's the same hardware"?
> If the article that was previously posted is correct, then AMD will be further reducing FP64 performance with RDNA 3. That should lead to some transistor savings, correct? That would, of course, then be used for other things.
> Regards,
> SB
They're not removing any FP64 ALUs. The existing FP32:FP64 ratio is 16:1; they're doubling the FP32 ALUs, so the resulting ratio will be 32:1, i.e. the same number of FP64 ALUs will remain per WGP.
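The ratio arithmetic in the reply above can be checked in a few lines of Python. The lane counts here are hypothetical, picked only so that the starting ratio is 16:1; they are not taken from any AMD document.

```python
from fractions import Fraction

def fp32_to_fp64_ratio(fp32_lanes, fp64_lanes):
    """FP32:FP64 throughput ratio, assuming one op per lane per clock."""
    return Fraction(fp32_lanes, fp64_lanes)

# Hypothetical per-WGP lane counts (assumption, for illustration only).
fp64_lanes = 8
fp32_lanes_before = 128
fp32_lanes_after = 2 * fp32_lanes_before  # FP32 ALUs doubled, FP64 untouched

ratio_before = fp32_to_fp64_ratio(fp32_lanes_before, fp64_lanes)
ratio_after = fp32_to_fp64_ratio(fp32_lanes_after, fp64_lanes)
print(f"{ratio_before}:1 -> {ratio_after}:1")
```

So a ratio going from 16:1 to 32:1 is consistent with absolute FP64 throughput per WGP staying the same; only the FP32 side grows.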
> As far as I know it's not and has never been "the same hardware". How do you explain the differences in FP64 rates to FP32 on GCN if "it's the same hardware"?
And why would every GPU not have a 2:1 ratio if it were free?
> Well, even if the capability is there in hardware, there's almost always product segmentation shenanigans going on with FP64.
I'm not talking product segmentation though, I'm talking where it's not available in hardware, even for cards ostensibly destined for use in a data centre. Vega 10, Polaris 10, Fiji, Tonga, Tahiti, Pitcairn all ended up in DC cards, none feature half rate FP64 at the hardware level. For that matter, why waste the engineering time designing GCN to vary the rate from 1/2 to 1/16 if it's free? Likewise on the nvidia side, 102 and 104 cards end up in DCs, none feature half rate FP64.
> As mentioned above, to arbitrarily control it by forcing FP64 operations to go to dedicated units.
Product segmentation is already done by driver/firmware restrictions; they don't somehow laser off hundreds of FP64 SIMDs. You used to be able to flash consumer nvidia GPUs to professional ones with some PCB modding, though they've since fixed that 'loophole'.
Do games (graphics or compute workloads) ever need it? Or is it strictly for non-gaming applications? Do the consoles have dedicated FP64 units?
> They're not removing any FP64 ALUs. The existing ratio is 16:1, they're doubling the FP32 ALUs, and the resulting ratio will be 32:1, i.e. the same number of FP64 ALUs will remain per WGP
> And why would every GPU not have a 2:1 ratio if it were free?
2:1 is not free. While it requires no additional register bandwidth (as it is identical), one has to put in bigger ALUs (capable of handling more bits).
> As far as I know it's not and has never been "the same hardware". How do you explain the differences in FP64 rates to FP32 on GCN if "it's the same hardware"?
They were using the same hardware in the VLIW and GCN architectures (multi-precision ALUs). They only switched over to separate FP64 ALUs with RDNA, as far as I understand (that's probably the reason the RDNA whitepaper mentions it).
> They were using the same hardware in the VLIW and GCN architectures (multi precision ALUs).
GCN has variable FP64 rate to FP32 between versions.
> And how do you get different FP64 rates for multiprecision ALUs in different GPUs? One puts different ALUs in different chips. It can be that simple.
Hence it's not "using the same h/w".
> GCN has variable FP64 rate to FP32 between versions.
> Hence it's not "using the same h/w".
Yes, AMD build different versions of their multi-precision vector ALUs and put them in different GPUs. But if you look at any one of them (prior to RDNA), FP32 and FP64 were handled by the same hardware, in the sense that the same ALUs were responsible for both FP32 and FP64 (no separate FP64 ALUs as in RDNA GPUs).
4:1 costs only a moderate amount extra (one needs 27x27-bit multipliers; that's why the VLIW GPUs often had 4:1).
How did it work with VLIW? Was just one of the ALUs in a set fatter and capable of 64-bit?
I thought they somehow combined multiple ones to make it work, like how an RDNA 32-bit ALU can do 2x16.
(a+b) * (c+d) = a*c+a*d+b*c+b*d
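One way to read that identity together with the 27x27-bit remark: split each 53-bit FP64 mantissa into a high and a low part, and the full product falls out of four narrower multiplies. The Python sketch below is my own illustration of that arithmetic (the function names are made up), not a description of the actual datapath in any AMD GPU.

```python
MASK27 = (1 << 27) - 1

def mul27(x, y):
    """Stand-in for a 27x27-bit hardware multiplier (hypothetical helper)."""
    assert 0 <= x <= MASK27 and 0 <= y <= MASK27
    return x * y

def mul53_via_27(m1, m2):
    """Multiply two 53-bit mantissas using four 27x27-bit partial products."""
    assert 0 <= m1 < (1 << 53) and 0 <= m2 < (1 << 53)
    a, b = m1 >> 27, m1 & MASK27  # m1 = a * 2**27 + b, a fits in 26 bits
    c, d = m2 >> 27, m2 & MASK27  # m2 = c * 2**27 + d
    # (a*2^27 + b) * (c*2^27 + d) = ac*2^54 + (ad + bc)*2^27 + bd
    return (mul27(a, c) << 54) + ((mul27(a, d) + mul27(b, c)) << 27) + mul27(b, d)

m1 = (1 << 52) | 0x123456789ABCD  # example mantissa with the implicit leading 1
m2 = (1 << 53) - 1                # all-ones mantissa
print(mul53_via_27(m1, m2) == m1 * m2)
```

Four partial products for one FP64 multiply would fit a 4:1 rate if each one takes a pass through the same multiplier, but that mapping is an inference from the posts above, not something they state outright.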
> That's a lot of chiplets to glue together.
Well, Ponte Vecchio gets away with gluing 63 chips together (of which 47 are functional and 16 are there to spread the thermal load better), so not too many.