Nvidia Ampere Discussion [2020-05-14]

@trinibwoy I wonder if Nvidia is pushing for 4K and now 8K because it's the easiest way to increase utilization. It does seem weird to essentially double Ampere's FP32 throughput when utilization is so low, especially at common resolutions like 1080p and 1440p.

They're also marketing to the high refresh rate, low latency crowd though so you would think that efficiency at lower resolutions is also a priority.

What about utilization during RT workloads?

Pretty crap based on what I've seen so far, but Nvidia said as much in the Ampere whitepaper. I've only looked at Cold War, Star Wars and Atomic Heart though. Overall SM utilization is usually somewhere around 20% during RT passes, with the INT ALU pipe seeing lots of action.
 
It’s weird, I’m not seeing a whole lot of bandwidth usage on the 3090 in a few games I’ve looked at in the profiler. As in it doesn’t go over 10% at any time during the entire frame. Hopefully I’m interpreting the stats wrong.

That could bode well for the 3080Ti, with less bandwidth but potentially core performance very similar to or even greater than the 3090's.
 

I'm discounting the VRAM bandwidth stats as they don't really make sense. There's another metric "L2 bandwidth to VRAM" that is probably more representative. In some cases it's almost maxed out which would indicate a bandwidth bound situation.

E.g. the HBAO pass in the Android mesh shader demo.

https://ibb.co/stTV6GJ
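If anyone wants a quick sanity check outside the profiler, a throwaway device-to-device copy like the sketch below gives a rough ceiling for achievable VRAM bandwidth to compare a "10% of peak over the whole frame" reading against. To be clear this is just my own sketch, not an Nsight metric; buffer size and launch shape are arbitrary guesses.

```cuda
// Rough sanity-check sketch, not the profiler metric itself: a plain
// device-to-device streaming copy to see what VRAM bandwidth is actually
// achievable. Buffer size and launch shape below are arbitrary.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void copyKernel(const float4* __restrict__ src,
                           float4* __restrict__ dst, size_t n)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (; i < n; i += stride)
        dst[i] = src[i];                        // 16 B read + 16 B write per element
}

int main()
{
    const size_t n = 64ull * 1024 * 1024;       // 64M float4 = 1 GiB per buffer
    float4 *src, *dst;
    cudaMalloc(&src, n * sizeof(float4));
    cudaMalloc(&dst, n * sizeof(float4));

    cudaEvent_t beg, end;
    cudaEventCreate(&beg);
    cudaEventCreate(&end);

    copyKernel<<<2048, 256>>>(src, dst, n);     // warm-up
    cudaEventRecord(beg);
    copyKernel<<<2048, 256>>>(src, dst, n);
    cudaEventRecord(end);
    cudaEventSynchronize(end);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, beg, end);
    double gb = 2.0 * n * sizeof(float4) / 1e9; // bytes read + written
    printf("~%.0f GB/s achieved vs ~936 GB/s theoretical on a 3090\n", gb / (ms / 1e3));
    return 0;
}
```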
 
Shades of FuryX. Bondrewd (I think that was his SN) was on point after all. With how power limited Ampere is, I wonder if higher utilization levels would even net much of a performance win. They may just result in large clock reductions as an offset.
 

I don't even see how the rumor could point towards a consumer card. You'd need 4 stacks of HBM2e for comparable relative bandwidth, and that's just plain expensive. And as pointed out, it's not like anyone's 5nm is some magical node that halves power usage either. Are they going to try and make a 700 watt card? Quad-slot propylene glycol/water mix cooler?
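Rough napkin math, assuming ~410-460 GB/s per HBM2e stack (1024-bit at 3.2-3.6 Gbps): four stacks lands around 1.6-1.8 TB/s, roughly double the 3090's 936 GB/s (384-bit GDDR6X at 19.5 Gbps), which is about what you'd want if the core count really doubles.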
 
I can't find my old Nsight installer and had to install the latest version on a fresh Windows install. It's much more detailed now, but it was a surprise to see there's no FP16+Tensor pipe throughput anymore, and the FMA pipe throughput is divided into FMALite and FMAHeavy.

Can any Turing owner confirm whether it still shows up as an FP16+Tensor pipeline in their Nsight?

[Attached screenshot: nsight_a_8gj0h.png]


https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html
 
It's Nsight Graphics, but the excerpt is only available in the Nsight Compute documentation. They probably haven't updated the Nsight Graphics documentation yet.

Yeah the documentation of the individual metrics in Nsight graphics is really poor. I also just noticed that there’s a “pro” build of Nsight that shows RT core activity that’s not available in the public version. Sucks.
 

It's sad, it used to show DXR dispatch and DXR build sections under DXR marker at the bottom.

In the excerpt it says:
On GA10x, FMA is a logical pipeline that indicates peak FP32 and FP16x2 performance. It is composed of the FMAHeavy and FMALite physical pipelines.
Does that mean FMA heavy and FMA lite have physically different pipelines, like INT, FP64, Tensor ops on Ampere?
 
It's sad, it used to show DXR dispatch and DXR build sections under DXR marker at the bottom.

It still shows the BVH build and ray dispatch in the markers section but there’s no separate DXR row any more and no way to see RT core utilization. I’m not sure if the old DXR row was just a marker or actual throughput stats.

Does that mean FMA heavy and FMA lite have physically different pipelines, like INT, FP64, Tensor ops on Ampere?

FMA heavy seems to count FP instructions running on the FP+INT pipe on Ampere while FMA lite is the FP only pipe. I don’t see anything for Tensors in Nsight any more even when running DLSS.
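If anyone wants to see how those two counters behave, a throwaway pair of kernels like the sketch below, run under Nsight, should do it. This is just my own sketch built on that assumption, not anything from the docs; kernel names, loop counts and launch shape are made up.

```cuda
// Hedged sketch for poking at the FMAHeavy/FMALite counters in Nsight:
// one kernel issues only FP32 FMAs, the other interleaves FP32 FMAs with
// INT32 multiply-adds that should compete for the shared FP+INT datapath.
#include <cuda_runtime.h>

__global__ void fmaOnly(float* out)
{
    float a = (float)threadIdx.x, b = 1.0001f, c = 0.5f;
    for (int i = 0; i < (1 << 16); ++i)
        a = fmaf(a, b, c);                     // FP32 FMA only
    out[blockIdx.x * blockDim.x + threadIdx.x] = a;
}

__global__ void fmaPlusInt(float* out)
{
    float a = (float)threadIdx.x, b = 1.0001f, c = 0.5f;
    int   x = threadIdx.x, y = 3, z = 7;
    for (int i = 0; i < (1 << 16); ++i) {
        a = fmaf(a, b, c);                     // FP32 FMA
        x = x * y + z;                         // INT32 multiply-add
    }
    out[blockIdx.x * blockDim.x + threadIdx.x] = a + (float)x;
}

int main()
{
    float* out;
    cudaMalloc(&out, 1024 * 256 * sizeof(float));
    fmaOnly<<<1024, 256>>>(out);     // profile: FP work free to use both pipes
    fmaPlusInt<<<1024, 256>>>(out);  // profile: INT traffic should show up on one pipe
    cudaDeviceSynchronize();
    cudaFree(out);
    return 0;
}
```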
 
I can't find my old Nsight installer and had to install the latest version on a fresh Windows install. It's much more detailed now, but it was a surprise to see there's no FP16+Tensor pipe throughput anymore, and the FMA pipe throughput is divided into FMALite and FMAHeavy.
They do have prior nsight versions available to download if you recall the version number.

Yeah the documentation of the individual metrics in Nsight graphics is really poor. I also just noticed that there’s a “pro” build of Nsight that shows RT core activity that’s not available in the public version. Sucks.
There's a pro version? I thought the only version available was the one you got once registered.
 
If you substitute the version number in the link, other archived versions are available, e.g.:
https://developer.nvidia.com/nsight-graphics-2020_6

I've tried both 2020-05 and 2020-06 already; those are the ones that support the Ampere lineup. Prior versions don't. I wonder if the latest version gives a different SM Instruction Throughput readout for the Turing lineup, like still showing FP16+Tensor and/or a unified FMA pipe throughput.
 

Rumored to have 18432 CUDA cores and 64 TFLOPS... 2022 is going to be an interesting year.
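If those two rumored numbers are meant to be consistent, the implied clock is easy to back out: 18432 FP32 lanes × 2 FLOPs per FMA ≈ 36,864 FLOPs per clock, so 64 TFLOPS would need roughly 64e12 / 36864 ≈ 1.74 GHz, i.e. in line with current Ampere boost clocks rather than some huge frequency jump.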


BTW the articles also list other Nvidia codenames (supposedly leaked in a 2018 GPU Technology Conference keynote slide) which serve as tributes to renowned physicists and computer scientists:
Turing has already been used, while Lovelace and Hopper are rumored for 2022; the remaining 6 codenames should probably suffice until 2030 (so far they've only used 8 codenames since 2006 - Tesla, Fermi, Kepler, Maxwell, Pascal, Volta, Turing, Ampere).

[Attached image: NVIDIA-GTC-2018-Heros-Ada-Lovelace.jpg (GTC 2018 keynote slide)]
 