Nvidia Ampere Discussion [2020-05-14]

So Nvidia disabled a complete GPC of GA102 for the RTX 3080 and almost doubled the L1$/shared memory compared to Turing

 
FP16 rate is the same as FP32. Does that mean each pipeline can no longer do double-rate FP16, or that one pipeline can and the other doesn't do FP16 at all?
AFAIK FP16 was run on the tensor cores on Turing and not the main FP32 SIMDs, so the change is likely due more to changes in the TCs than in the FP32 SIMDs.
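For reference, the "double rate" idea is just packed math: two FP16 ops issued per 32-bit lane per clock, so vector FP16 throughput is exactly twice FP32. A back-of-envelope sketch; the SM count, lane count, and clock below are hypothetical Turing-like assumptions, not whitepaper values:

```python
# Theoretical throughput sketch; all figures are assumed, not from the whitepaper.
sms = 68                 # assumed SM count
fp32_lanes_per_sm = 64   # assumed FP32 lanes per SM
clock_ghz = 1.7          # assumed boost clock
flops_per_fma = 2        # one fused multiply-add counts as 2 FLOPs

fp32_tflops = sms * fp32_lanes_per_sm * flops_per_fma * clock_ghz / 1000
fp16_tflops = 2 * fp32_tflops  # 2x packed FP16: two half ops per 32-bit lane
print(f"FP32 ~{fp32_tflops:.1f} TFLOPS, packed FP16 ~{fp16_tflops:.1f} TFLOPS")
```

If the FP16:FP32 ratio in the spec table is 1:1 instead of 2:1, either that packing is gone from the FP32 pipes or FP16 is being executed somewhere else, which is the question being debated here.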
 
AFAIK FP16 was run on the tensor cores on Turing and not the main FP32 SIMDs, so the change is likely due more to changes in the TCs than in the FP32 SIMDs.

The white paper says "non-Tensor" for both FP32 and FP16. Not sure if that has anything to do with which ALUs they run on.
 
From this video, an interesting table from the (unreleased?) white paper:
Not unreleased, it's available to us in the media. Nothing in the whitepaper is NDA'd, but you're not allowed to release the whole whitepaper as-is (I guess they'll put it out for everyone later).
 
FP16 rate is the same as FP32. Does that mean each pipeline can no longer do double-rate FP16, or that one pipeline can and the other doesn't do FP16 at all?

AFAIK FP16 was run on the tensor cores on Turing and not the main FP32 SIMDs, so the change is likely due more to changes in the TCs than in the FP32 SIMDs.

If you look at it, FP16 TFLOPS sits at the same 1/4 ratio to FP16 Tensor TFLOPS on both Turing and Ampere.

It also looks like the Tensor Cores in the RTX 3080 might run at only 1/2 (and 1/4) the rate of the ones in A100, versus 1:1 (and 1/2) for Turing gaming vs. Pro/V100.
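The ratio observation above is just arithmetic: if vector FP16 is executed on the tensor cores at a fixed 1/4 of their dense FP16 rate, the vector-to-tensor ratio stays 1/4 no matter how many tensor cores an SKU has. A sketch with made-up throughput figures (the TFLOPS numbers are illustrative, not whitepaper values):

```python
# Made-up TFLOPS figures purely to illustrate the ratio argument.
def vector_fp16_from_tensor(tensor_fp16_tflops, ratio=0.25):
    """Vector FP16 rate if it runs on the tensor cores at 1/4 dense rate."""
    return tensor_fp16_tflops * ratio

for arch, tensor_tflops in [("Turing-like", 114.0), ("Ampere-like", 119.0)]:
    vec = vector_fp16_from_tensor(tensor_tflops)
    print(f"{arch}: tensor {tensor_tflops} -> vector FP16 {vec} TFLOPS "
          f"(ratio {vec / tensor_tflops})")
```

So an unchanged 1/4 ratio across both generations is consistent with the "FP16 still runs on the TCs" theory, though it doesn't prove it.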
 
If you look at it, FP16 TFLOPS sits at the same 1/4 ratio to FP16 Tensor TFLOPS on both Turing and Ampere.

It also looks like the Tensor Cores in the RTX 3080 might run at only 1/2 (and 1/4) the rate of the ones in A100, versus 1:1 (and 1/2) for Turing gaming vs. Pro/V100.
Yeah, so it's quite possible that the execution of FP16 vector math hasn't changed and it's still running on the TCs, but because gaming Ampere now has half as many of them, it ends up at the same speed as FP32.

I wonder if this will even affect anything in practice. FP16 RPM was hyped to hell back at the Vega and PS4 Pro launches and hasn't really manifested in much of a performance gain since then.
 
FP16 RPM was hyped to hell back at the Vega and PS4 Pro launches

FP16 wasn't hyped at all for the PS4 Pro. IIRC the only mention of RPM in the Pro from Sony you'll ever find is Cerny casually bringing it up during a DF interview.

For Vega it was indeed hyped, though that was during Raja's reign, when RTG marketing was... different.
 
Yeah, so it's quite possible that the execution of FP16 vector math hasn't changed and it's still running on the TCs, but because gaming Ampere now has half as many of them, it ends up at the same speed as FP32.

I wonder if this will even affect anything in practice. FP16 RPM was hyped to hell back at the Vega and PS4 Pro launches and hasn't really manifested in much of a performance gain since then.

There's a lack of software uptake for it, I believe? I'm not sure how many games currently go to that level of optimization.

Off the top of my head, I think id Software was one of the early adopters, and the games using it did show relatively higher gains on cards with 2xFP16 (Turing/Vega/etc.) over ones without (Pascal). But I believe they also leverage other techniques that aren't present on the older gens, so it's tricky to isolate.

In terms of Ampere specifically, I'd speculate it wouldn't be an issue. If you really think about it, it's also a matter of perspective in this case: you could look at it as gaining 2xFP32 rather than losing 2xFP16, since it's not like the FP16 rate has actually gone down against Turing at each "tier." Also, in Ampere's case, since I believe the tensor cores can now run simultaneously, that might mean concurrent FP16 ops unlike on Turing, so real throughput might be higher than it seems. But in general I'd think Ampere already sits relatively high in FP resources compared to everything else, to the point of diminishing returns.

If we go by the leaked Doom Eternal numbers (and I believe that game uses FP16 optimizations), it seems to sit on the higher end of gains over Turing anyway.
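On the concurrency point: if FP16 still executes on the tensor cores and those can now run alongside the FP32 SIMDs, then for a mixed FP32/FP16 workload the peak would be additive rather than either rate alone. A toy sketch; both rates below are assumptions for illustration only:

```python
# Assumed peak rates in TFLOPS; purely illustrative, not measured figures.
fp32_rate = 30.0            # assumed FP32 SIMD peak
fp16_on_tensor_rate = 30.0  # assumed vector-FP16-on-tensor-core peak

# If FP16 and FP32 contend for the same issue slots, peak is one or the other.
exclusive_peak = max(fp32_rate, fp16_on_tensor_rate)
# If the tensor cores run concurrently with the FP32 SIMDs,
# mixed work could in principle approach the sum.
concurrent_peak = fp32_rate + fp16_on_tensor_rate
print(exclusive_peak, concurrent_peak)
```

Whether real shaders can keep both pipes fed is another question, so the achievable number would land somewhere between the two.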
 
There's a lack of software uptake for it, I believe? I'm not sure how many games currently go to that level of optimization.
I remember id Software games (Doom and Prey maybe?) and Far Cry 5.

It's extremely hard for AMD to push any type of new technology into the PC market. nVidia doesn't just have over 80% of the discrete GPU market; their reach into dev teams is also something AMD doesn't have and can't replicate.
 
Seems to me that this GPU really shows that flops aren't everything. I've no proof, of course, but I believe the more you need FP32, the more this GPU will shine. In other situations it will be bottlenecked elsewhere (bandwidth? ROPs? Even drivers? A mix of everything, I guess). In a weird way it reminds me of some old ATI GPUs :D

I'd guess the 4K margins are bigger because games at 4K are more ALU-limited.
 