From this video, an interesting table from the (unreleased?) white paper:
View attachment 4615
FP16 rate is the same as FP32. Does that mean each pipeline can no longer do double-rate FP16, or that one pipeline can and the other doesn't do FP16 at all?
AFAIK FP16 ran on the tensor cores on Turing, not on the main FP32 SIMDs, so the change is likely due more to the changes in the TCs than in the FP32 SIMDs.
The white paper says "non-Tensor" for both FP32 and FP16. Not sure if that has anything to do with which ALUs they run on.
It's non-tensor math, but AFAIK all FP16, including non-tensor math, ran on TC hardware on Turing. I dunno how it is now with Ampere.
Not unreleased, it's available to those of us in the media field. Nothing in the whitepaper is NDA'd, but you're not allowed to release the whole whitepaper as is (guess they'll bring it out for everyone later).
If you look at it, the FP16 TF figure is in the same 1/4 ratio to the FP16 Tensor TF figure on both Turing and Ampere.
Yeah, so it's quite possible that execution of FP16 vector math hasn't changed and it's still running on the TCs, but because gaming Ampere now has half as many of them, it runs at the same speed as FP32.
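For anyone who wants to check the 1/4 ratio themselves, here is a rough sketch. The TFLOPS figures are assumptions on my part (the commonly quoted peak numbers for an RTX 2080 Super and an RTX 3080, with the tensor figure being dense FP16 with FP16 accumulate), not values copied from this thread, so compare them against the table in the attachment before trusting the output.

```python
# Peak throughput in TFLOPS. ASSUMED figures (commonly quoted peak specs),
# not taken from this thread; verify against the white paper table.
CARDS = {
    "RTX 2080 Super (Turing)": {"fp32": 11.2, "fp16": 22.3, "fp16_tensor": 89.2},
    "RTX 3080 (Ampere)": {"fp32": 29.8, "fp16": 29.8, "fp16_tensor": 119.0},
}


def ratios(card):
    """Return non-tensor FP16 throughput relative to FP32 and to FP16 tensor."""
    return {
        "fp16_to_fp32": card["fp16"] / card["fp32"],
        "fp16_to_tensor": card["fp16"] / card["fp16_tensor"],
    }


RESULTS = {name: ratios(specs) for name, specs in CARDS.items()}

if __name__ == "__main__":
    for name, r in RESULTS.items():
        print(
            f"{name}: FP16/FP32 = {r['fp16_to_fp32']:.2f}, "
            f"FP16/Tensor = {r['fp16_to_tensor']:.2f}"
        )
```

With those assumed inputs, the FP16-to-tensor ratio stays at about 1/4 on both cards while FP16-to-FP32 drops from about 2 to 1, which is consistent with the "still on the TCs, but half as many of them" reading. It doesn't prove which units the math actually runs on.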
It also looks like the Tensor Cores in RTX 3080 might be only 1/2 (and 1/4) rate compared to the ones in A100 versus 1:1 (and 1/2) for Turing Gaming vs. Pro/V100.
FP16 RPM was hyped to hell back at Vega and PS4Pro launch
FP16 wasn't hyped at all for the PS4 Pro. IIRC the only mention of RPM in the Pro you'll ever find from Sony is Cerny casually mentioning it during a DF interview.
Yeah, and this led to people everywhere saying that the PS4 Pro is actually twice the teraflops and such stupid stuff.
It is, though. At maximum FP16 throughput.
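The "twice the teraflops at maximum FP16 throughput" point is just the usual peak-FLOPS arithmetic with a packing factor of 2. A quick sketch, assuming the commonly cited PS4 Pro configuration of 36 CUs with 64 lanes each at 911 MHz (those specs are my assumption, not something stated in this thread); `peak_tflops` and `packing` are illustrative names:

```python
def peak_tflops(lanes, clock_ghz, packing=1):
    """Theoretical peak TFLOPS: each lane does one FMA (2 FLOPs) per clock.

    packing=2 models rapid packed math, i.e. two FP16 values per 32-bit lane.
    """
    gflops = lanes * 2 * packing * clock_ghz
    return gflops / 1000.0


# ASSUMED PS4 Pro configuration (commonly cited specs, not from this thread).
PS4_PRO_LANES = 36 * 64      # 36 CUs x 64 lanes
PS4_PRO_CLOCK_GHZ = 0.911    # 911 MHz

fp32 = peak_tflops(PS4_PRO_LANES, PS4_PRO_CLOCK_GHZ)
fp16 = peak_tflops(PS4_PRO_LANES, PS4_PRO_CLOCK_GHZ, packing=2)

if __name__ == "__main__":
    print(f"FP32 peak: {fp32:.1f} TFLOPS")
    print(f"FP16 peak (packed): {fp16:.1f} TFLOPS")
```

That doubled number is a ceiling that only applies when every instruction is packed FP16, which is why it says little about real game performance.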
I wonder if this will even affect anything in practice really. FP16 RPM was hyped to hell back at Vega and PS4Pro launch and hasn't really manifested itself much in any performance since then.
If we go with Doom Eternal leaked numbers (which uses FP16 optimizations, I believe)
Does it? I skimmed through this yesterday but can't say that I remember FP16 being mentioned there at all.
There's a lack of software uptake for it, I believe? I'm not sure how many games currently go to that level of optimization.
I remember id Software games (Doom and Prey maybe?) and Far Cry 5.
Seems to me that this GPU really shows that FLOPS aren't everything. I have no proof, of course, but I believe that the more you need FP32, the more this GPU will shine. In other situations, it will be bottlenecked elsewhere (bandwidth? ROPs? Even drivers? A mix of everything, I guess). In a weird way it reminds me of some old ATI GPUs.