Nvidia Ampere Discussion [2020-05-14]

Voxilla · Sep 16, 2020

From this video, an interesting table from the (unreleased?) white paper:

Man from Atlantis · Sep 16, 2020

So Nvidia disabled a complete GPC of GA102 for the RTX 3080 and almost doubled L1$/shared memory compared to Turing.

CarstenS · Sep 16, 2020

P100: 64+64 KiByte L1/SMEM
V100: 128 KiByte L1/SMEM
TU10x: 96 KiByte L1/SMEM
A100: 192 KiByte L1/SMEM
GA10x: 128 KiByte L1/SMEM

edit: Per SM.
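
For scale, the per-SM figures above can be turned into generation-over-generation ratios. A minimal Python sketch, using only the numbers listed in this post; P100 is entered as the sum of its two 64 KiByte pools, which is an assumption made purely so it can be compared with the unified designs:

```python
# Per-SM L1/shared-memory capacity in KiB, as listed in the post above.
# P100 is entered as the sum of its separate 64 + 64 KiB pools (an assumption
# made here only so it can be compared with the unified designs).
L1_SMEM_KIB = {
    "P100": 64 + 64,
    "V100": 128,
    "TU10x": 96,
    "A100": 192,
    "GA10x": 128,
}


def capacity_ratio(new: str, old: str) -> float:
    """Return how many times larger `new` is than `old`, per SM."""
    return L1_SMEM_KIB[new] / L1_SMEM_KIB[old]


if __name__ == "__main__":
    for new, old in [("GA10x", "TU10x"), ("A100", "V100"), ("A100", "GA10x")]:
        print(f"{new} vs {old}: {capacity_ratio(new, old):.2f}x")
```

On these numbers GA10x has about 1.33x the combined L1/SMEM of TU10x per SM, and A100 has 1.5x that of GA10x.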

trinibwoy · Sep 16, 2020

Voxilla said:
From this video, an interesting table from the (unreleased?) white paper:
View attachment 4615

FP16 rate is the same as FP32. Does that mean each pipeline can no longer do double-rate FP16, or that one pipeline can and the other doesn't do FP16 at all?

DegustatoR · Sep 16, 2020

trinibwoy said:
FP16 rate is the same as FP32. Does that mean each pipeline can no longer do double-rate FP16, or that one pipeline can and the other doesn't do FP16 at all?

AFAIK FP16 was run on tensor cores on Turing and not the main FP32 SIMD so the change is likely more due to the changes in TCs than in FP32 SIMDs.

trinibwoy · Sep 16, 2020

DegustatoR said:
AFAIK FP16 was run on tensor cores on Turing and not the main FP32 SIMD so the change is likely more due to the changes in TCs than in FP32 SIMDs.

The white paper says "non-Tensor" for both FP32 and FP16. Not sure if that has anything to do with which ALUs they run on.

DegustatoR · Sep 16, 2020

trinibwoy said:
The white paper says "non-Tensor" for both FP32 and FP16. Not sure if that has anything to do with which ALUs they run on.

It's non-tensor but AFAIK all FP16 - including non-tensor math - was running on TC hardware on Turing. I dunno how it is now with Ampere.

Kaotik · Sep 16, 2020

Voxilla said:
From this video, an interesting table from the (unreleased?) white paper:

Not unreleased, it's available to those of us in the media. Nothing in the whitepaper is NDA'd, but you're not allowed to release the whole whitepaper as is (I guess they'll publish it for everyone later).

arandomguy · Sep 16, 2020

trinibwoy said:
FP16 rate is the same as FP32. Does that mean each pipeline can no longer do double-rate FP16, or that one pipeline can and the other doesn't do FP16 at all?

DegustatoR said:
AFAIK FP16 was run on tensor cores on Turing and not the main FP32 SIMD so the change is likely more due to the changes in TCs than in FP32 SIMDs.

If you look at it, non-tensor FP16 TFLOPS is at the same 1/4 ratio to FP16 Tensor TFLOPS on both Turing and Ampere.

It also looks like the Tensor Cores in RTX 3080 might be only 1/2 (and 1/4) rate compared to the ones in A100 versus 1:1 (and 1/2) for Turing Gaming vs. Pro/V100.
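
To make the ratio argument concrete, here is a minimal Python sketch. The TFLOPS values are assumed from the published GA102 whitepaper comparison table (RTX 2080 Super and RTX 3080, tensor figures with FP16 accumulate and without sparsity). The attachment is not reproduced in this thread, so verify them against the table before relying on them. `rate_ratios` is a hypothetical helper name.

```python
# Peak throughput in TFLOPS. ASSUMED values, believed to match the GA102
# whitepaper comparison table (dense tensor math, FP16 accumulate).
PEAK_TFLOPS = {
    "RTX 2080 Super (Turing)": {"fp32": 11.2, "fp16": 22.3, "fp16_tensor": 89.2},
    "RTX 3080 (Ampere)": {"fp32": 29.8, "fp16": 29.8, "fp16_tensor": 119.0},
}


def rate_ratios(card: str) -> dict:
    """Return the FP16:FP32 and FP16:FP16-tensor throughput ratios for a card."""
    t = PEAK_TFLOPS[card]
    return {
        "fp16_over_fp32": t["fp16"] / t["fp32"],
        "fp16_over_tensor": t["fp16"] / t["fp16_tensor"],
    }


if __name__ == "__main__":
    for card in PEAK_TFLOPS:
        r = rate_ratios(card)
        print(f"{card}: FP16/FP32 = {r['fp16_over_fp32']:.2f}, "
              f"FP16/Tensor = {r['fp16_over_tensor']:.2f}")
```

With these inputs, non-tensor FP16 is roughly a quarter of the dense FP16 tensor rate on both cards, while the FP16:FP32 ratio drops from about 2:1 to 1:1.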

DegustatoR · Sep 16, 2020

arandomguy said:
If you look at it, non-tensor FP16 TFLOPS is at the same 1/4 ratio to FP16 Tensor TFLOPS on both Turing and Ampere.

It also looks like the Tensor Cores in RTX 3080 might be only 1/2 (and 1/4) rate compared to the ones in A100 versus 1:1 (and 1/2) for Turing Gaming vs. Pro/V100.

Yeah, so it's quite possible that execution of FP16 vector math hasn't changed and it's still running on the TCs, but because gaming Ampere now has half as many of them, it runs at the same speed as FP32.

I wonder if this will even affect anything in practice, really. FP16 RPM was hyped to hell back at the Vega and PS4 Pro launches and hasn't really shown up as much of a performance gain since then.

Deleted member 13524 · Sep 16, 2020

DegustatoR said:
FP16 RPM was hyped to hell back at Vega and PS4Pro launch

FP16 wasn't hyped at all for the PS4 Pro. IIRC the only mention of RPM in the Pro from Sony you'll ever find is Cerny casually mentioning it during a DF interview.

For Vega it was indeed hyped, though that happened during Raja's reign, when RTG marketing was... different.

DegustatoR · Sep 16, 2020

ToTTenTranz said:
FP16 wasn't hyped at all for the PS4 Pro. IIRC the only mention of RPM in the Pro from Sony you'll ever find is Cerny casually mentioning it during a DF interview.

Yeah, and this led to people everywhere saying that the PS4 Pro is actually twice the teraflops and such stupid stuff.

Deleted member 13524 · Sep 16, 2020

DegustatoR said:
Yeah, and this led to people everywhere saying that the PS4 Pro is actually twice the teraflops and such stupid stuff.

It is, though. At maximum FP16 throughput.
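
The arithmetic behind that claim, as a rough Python sketch. The shader count and clock are the commonly quoted PS4 Pro figures (2304 ALUs at 911 MHz), not something stated in this thread, and `peak_tflops` is a hypothetical helper. A fused multiply-add is counted as two operations.

```python
def peak_tflops(alus: int, clock_ghz: float, packed_width: int = 1) -> float:
    """Theoretical peak TFLOPS: ALUs x clock x 2 ops per FMA x packed width.

    packed_width=2 models rapid packed math, where each 32-bit lane
    processes two FP16 values per instruction.
    """
    return alus * clock_ghz * 2 * packed_width / 1000.0


if __name__ == "__main__":
    # ASSUMED PS4 Pro figures: 2304 shader ALUs at 0.911 GHz.
    fp32 = peak_tflops(2304, 0.911)
    fp16 = peak_tflops(2304, 0.911, packed_width=2)
    print(f"FP32: {fp32:.1f} TFLOPS, packed FP16: {fp16:.1f} TFLOPS")
```

The doubled figure is a ceiling: it only applies when every instruction is a packed FP16 operation, which is the "maximum FP16 throughput" caveat.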

arandomguy · Sep 16, 2020

DegustatoR said:
Yeah, so it's quite possible that execution of FP16 vector math hasn't changed and it's still running on the TCs, but because gaming Ampere now has half as many of them, it runs at the same speed as FP32.

I wonder if this will even affect anything in practice, really. FP16 RPM was hyped to hell back at the Vega and PS4 Pro launches and hasn't really shown up as much of a performance gain since then.

There's a lack of software uptake for it I believe? I'm not sure how many games currently go to that level of optimization.

Offhand, I think id Software was one of the early adopters, and the games using it did show relatively higher gains on cards with 2xFP16 (Turing/Vega/etc.) than on ones without (Pascal). But I believe they also leverage other techniques that aren't present on the older gens either, so it's tricky to isolate.

In terms of Ampere specifically, I'd speculate it won't be an issue. If you really think about it, it's also a matter of perspective: you could look at it as gaining 2xFP32 rather than losing 2xFP16, since the FP16 rate hasn't actually gone down against Turing at each "tier." Also, since I believe Ampere's Tensor cores can now run simultaneously, that might mean concurrent FP16 ops, unlike Turing, so real throughput might be higher than it seems. But in general I'd think Ampere already has so many resources for FP operations relative to everything else that it's at the point of diminishing returns.

If we go with the leaked Doom Eternal numbers (the game uses FP16 optimizations, I believe), it seems to sit at the higher end of gains over Turing anyway.

DegustatoR · Sep 16, 2020

arandomguy said:
If we go with the leaked Doom Eternal numbers (the game uses FP16 optimizations, I believe)

Does it? I skimmed through this yesterday but can't say that I remember FP16 being mentioned there at all.

Voxilla · Sep 16, 2020

Reviews popping up...

Deleted member 13524 · Sep 16, 2020

arandomguy said:
There's a lack of software uptake for it I believe? I'm not sure how many games currently go to that level of optimization.

I remember id Software games (Doom and Prey maybe?) and Far Cry 5.

It's extremely hard for AMD to push any type of new technology into the PC market. Not only does Nvidia have over 80% of the discrete GPU market, its presence inside dev teams is also something AMD neither has nor can match.

gamervivek · Sep 16, 2020

All hail the new thermi

Scott_Arm · Sep 16, 2020

Rootax said:
Seems to me that this GPU really shows that flops aren't everything. I have no proof of course, but I believe that the more you need FP32, the more this GPU will shine. In other situations it will be bottlenecked elsewhere (bandwidth? ROPs? Even drivers? A mix of everything, I guess). In a weird way it reminds me of some old ATI GPUs.

I'd guess the 4K margins are bigger because games at 4K are more ALU-limited.

BRiT · Sep 16, 2020

Review Thread: https://forum.beyond3d.com/threads/...n-rtx-gpu-launch-lineup-3070-3080-3090.61978/
