Nvidia Ampere Discussion [2020-05-14]

Maybe a stupid question, but is the driver a key component in "fully" utilizing the double FP32 units (when they're not being used for INT32), or is it mostly a hardware thing?
 
Maybe a stupid question, but is the driver a key component in "fully" utilizing the double FP32 units (when they're not being used for INT32), or is it mostly a hardware thing?
The driver contains the shader compiler. So it can make a difference. We could expect that NVidia will improve the shader compiler. On the other hand shader compilation is something you can refine years ahead of the silicon arriving.
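To make the hardware side of this concrete, here is a toy throughput model (my own simplification for illustration, not Nvidia's documented scheduling): treat each Ampere SM partition as having two issue slots per clock, one FP32-only and one shared between FP32 and INT32, and see how the INT32 fraction of the instruction stream caps FP32 throughput.

```python
# Toy model of an Ampere-style datapath (a simplification, not Nvidia's
# documented scheduler): two issue slots per clock, one FP32-only and
# one that handles either FP32 or INT32.
def fp32_throughput(int_fraction: float) -> float:
    """Achievable FP32 ops/clock (peak = 2.0) for an instruction
    stream with the given fraction of INT32 instructions."""
    assert 0.0 <= int_fraction <= 1.0
    if int_fraction == 0.0:
        return 2.0  # pure FP32 fills both datapaths
    # INT32 ops can only use the shared slot (capacity 1 op/clock),
    # so the total issue rate is capped at min(2, 1/int_fraction).
    issue_rate = min(2.0, 1.0 / int_fraction)
    return (1.0 - int_fraction) * issue_rate

print(fp32_throughput(0.0))   # 2.0 -- the advertised doubling
print(fp32_throughput(0.25))  # 1.5 -- a quarter INT32 already costs 25%
print(fp32_throughput(0.5))   # 1.0 -- half INT32 is back to one FP32/clock
```

The point the toy model makes is that the "double FP32" peak only appears for nearly pure FP32 streams, which is why the compiler's instruction selection and scheduling can matter.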

I found an analysis I did 12 years ago on the compilation of Perlin Noise. Interestingly, on AMD's VLIW-5 GPUs, the utilisation was about 89%. G80 was about 5% more efficient. This means that instruction dependency was not that significant.
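To illustrate how dependency chains cap slot utilisation on a VLIW machine, here is a toy greedy list scheduler for a hypothetical 5-wide bundle; the dataflow graph is made up for the example and has nothing to do with actual Perlin Noise code.

```python
# Toy list scheduler for a 5-slot VLIW bundle (illustrative only):
# each op names the ops it depends on; an op may issue once all its
# dependencies issued in an earlier bundle.
def schedule(deps: dict[str, set[str]], width: int = 5) -> list[list[str]]:
    done: set[str] = set()
    bundles: list[list[str]] = []
    remaining = dict(deps)
    while remaining:
        ready = [op for op, d in remaining.items() if d <= done]
        if not ready:
            raise ValueError("dependency cycle")
        bundle = ready[:width]
        bundles.append(bundle)
        done |= set(bundle)
        for op in bundle:
            del remaining[op]
    return bundles

def utilisation(bundles: list[list[str]], width: int = 5) -> float:
    return sum(len(b) for b in bundles) / (width * len(bundles))

# A made-up 10-op graph: five independent ops, then a short chain.
deps = {
    "a": set(), "b": set(), "c": set(), "d": set(), "e": set(),
    "f": {"a"}, "g": {"b"}, "h": {"f", "g"}, "i": {"c"}, "j": {"h"},
}
bundles = schedule(deps)
print(utilisation(bundles))  # 0.5 -- the tail of the chain wastes slots
```

A long serial chain at the end of the graph leaves most slots empty, which is the kind of effect that makes ~89% utilisation on real shader code a fairly strong result.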

So that makes me even more puzzled why Ampere is "slow". The texturing workload is not substantial, so I don't believe that's relevant.
 

Looks like you can get a very minor overclock in the 50-70 MHz range with a slight undervolt, which actually beats trying to overclock with a +100 MHz core offset on this Gigabyte card. I think the best play will be to set the power limit as high as the BIOS allows in MSI Afterburner or EVGA Precision X1, and then find the highest frequency you can maintain under that power limit with undervolting. A 100% stable clock is much better than having the clock jumping around. Interestingly, there are some comments saying ray tracing is more sensitive to undervolting, and you may find undervolts that are stable in raster games but crash in ray-traced games. It'll probably be necessary to use something like Port Royal to validate overclock/undervolt results.
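The tuning procedure described above, i.e. max out the power limit and then walk the clock down until a stress run passes, can be sketched as a simple search loop. The `is_stable` callback here is a stand-in for an actual stress test such as Port Royal, and all the clock numbers are hypothetical.

```python
# Sketch of the undervolt/overclock tuning loop described above.
# is_stable is a stand-in for a real stress run (e.g. Port Royal);
# the clock range and step are hypothetical example values in MHz.
def find_highest_stable_clock(is_stable, lo=1800, hi=2100, step=15):
    """Walk down from an optimistic clock until a run passes.
    Returns the highest clock (MHz) that is_stable() accepts,
    or None if nothing in the range is stable."""
    clock = hi
    while clock >= lo:
        if is_stable(clock):
            return clock
        clock -= step
    return None

# Mock stress test: pretend everything at or below 1995 MHz is stable.
result = find_highest_stable_clock(lambda mhz: mhz <= 1995)
print(result)  # 1995
```

In practice you would repeat the search at each voltage point and keep the best stable frequency/voltage pair, re-testing ray-traced workloads separately given the sensitivity noted above.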
 
Nvidia releases a statement on the disastrous launch and promises to do better.

https://www.nvidia.com/en-us/geforce/news/rtx-3080-qa/

We began shipping GPUs to our partners in August, and have been increasing the supply weekly.

So how long does it take for, first, partners to actually get the chips; second, partners to build the cards; and third, partners to ship those cards to retailers around the world? Shipping anything around the world right now is a nightmare.

I don't see any real supply until '21. Same goes for AMD, probably.
 
RTX 3000 are undoubtedly the most powerful cards on the market. Unlike the Vegas.

However, statements like "wait for the games to catch up" or "this is not the full potential" bring back memories. HD 2900 was like: "This is a DX11 card, wait for games to catch up!". GTX 480 was: "Wait for games to finally utilize all the geometry stuff!". Vega was like: "Wait for the drivers to utilize DSBR, NGG, HBCC and games to use FP16!"...
 
I think the difference is that all of the features of Ampere are standardized in DirectX 12 Ultimate and are available on Xbox Series X. You don't have to optimize for Ampere specifically; it's just the standard feature set for D3D. Example: Mesh Shaders will leverage the compute power of Ampere, Xbox Series X and the upcoming RDNA2 GPUs.
 
I've ordered a 3080 TUF from Overclockers UK within the first 3 hours and I'm probably going to get it in November (if even that). Worst product launch I've witnessed in the past decade :LOL:

At least I can cancel and get an RDNA2 GPU if AMD delivers the goods.
 
However, statements like "wait for the games to catch up" or "this is not the full potential" bring back memories. HD 2900 was like: "This is a DX11 card, wait for games to catch up!". GTX 480 was: "Wait for games to finally utilize all the geometry stuff!". Vega was like: "Wait for the drivers to utilize DSBR, NGG, HBCC and games to use FP16!"...
Yep, except HD2900 was a DX10 card.
 
From the iXBT review (BTW, very welcome to see this kind of low-level feature benchmarking again, brings back memories of hardware.fr), the TMUs have reportedly been upgraded, doubling texel read speed when no filtering is used. These kinds of unfiltered reads are often used in compute shaders. That is pretty cool.
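A quick back-of-the-envelope on what that doubling would mean, using the RTX 3080's published configuration (68 SMs with 4 texture units each, 1710 MHz boost clock); the 2x factor for unfiltered reads is the review's claim, not an official spec.

```python
# Back-of-envelope texel rates for an RTX 3080 (68 SMs x 4 texture
# units, 1710 MHz boost). The 2x for unfiltered (point-sampled) reads
# is the claim from the review above, not an official spec.
SMS, TMUS_PER_SM, BOOST_GHZ = 68, 4, 1.710

filtered_gtexels = SMS * TMUS_PER_SM * BOOST_GHZ  # bilinear-filtered
unfiltered_gtexels = filtered_gtexels * 2         # unfiltered loads

print(round(filtered_gtexels, 1))    # 465.1 GTexel/s
print(round(unfiltered_gtexels, 1))  # 930.2 GTexel/s
```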
 
Why doesn't GA100 have 128x FP32 per SM like GA102? Why does it only have 64? For a compute card that seems like a major omission.
 
Why doesn't GA100 have 128x FP32 per SM like GA102? Why does it only have 64? For a compute card that seems like a major omission.
Probably just die size reasons. NVIDIA couldn't make the die bigger even if they really wanted to, so they'd have had to cut out other parts to do it. As for why they didn't reduce the SM count to fit double ALUs per SM, balance of resources is the logical answer here.
 
Another possible scenario is that GA100 was made considerably earlier than GA10x and the updated FP32/INT h/w wasn't ready for it. We've seen something similar between Volta and Turing previously.
 