Nvidia Ampere Discussion [2020-05-14]

The problem is that to undervolt, you basically run a benchmark at 100% load and then slowly adjust your voltage/frequency curve so it flattens out, moving the peak frequency to lower and lower voltages until you start getting errors. You'll never really know how well it's going to work, or how low you'll be able to go, especially on a new architecture, so you're probably going to be running the GPU at 100% near stock voltage for quite a while as you lower it a bit at a time.
With Nvidia's Power Limit capability, you can go in the opposite direction: you set the maximum board power you want to use, the video card downclocks itself to a stable level, and then you tweak upwards.
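For reference, a minimal sketch of that power-limit workflow, assuming nvidia-smi is available on the PATH (the wattage value is just a placeholder, and setting a limit needs admin rights):
```python
import subprocess

def query(fields, gpu=0):
    """Return the requested nvidia-smi fields for one GPU as strings."""
    out = subprocess.run(
        ["nvidia-smi", "-i", str(gpu),
         "--query-gpu=" + ",".join(fields),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True)
    return [v.strip() for v in out.stdout.split(",")]

# Current draw, current limit, and the min/max limits the board allows.
draw, limit, lo, hi = query(
    ["power.draw", "power.limit", "power.min_limit", "power.max_limit"])
print(f"drawing {draw} W, limit {limit} W (board allows {lo}-{hi} W)")

# Cap the board (needs admin rights); 250 W here is a placeholder, not a
# recommendation. Let the card downclock to a stable level, then nudge it up.
subprocess.run(["nvidia-smi", "-i", "0", "-pl", "250"], check=True)
```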
 

Good point. I haven't seen anyone do it that way, but it might be interesting. I'm assuming power supply recommendations assume people will overclock, so maybe a 15-20% power reduction would work on a 650W.
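Back-of-the-envelope, assuming roughly 320 W of board power (the 3080's advertised TGP) and a placeholder 200 W for the rest of the system:
```python
# Back-of-the-envelope PSU headroom for the 15-20% power-limit idea.
# 320 W is the 3080's advertised board power; 200 W for CPU + rest of the
# system is just a placeholder, not a measurement.
gpu_tgp = 320.0
rest_of_system = 200.0
psu = 650.0

for cut in (0.15, 0.20):
    gpu = gpu_tgp * (1 - cut)
    total = gpu + rest_of_system
    print(f"-{cut:.0%}: GPU ~{gpu:.0f} W, whole system ~{total:.0f} W, "
          f"~{psu - total:.0f} W of headroom on a {psu:.0f} W unit")
```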
 
I wonder if the beta Automatic Tuning feature could accomplish a similar result.
[Image: GeForce Experience performance tuning and monitoring options]

https://www.nvidia.com/en-us/geforc...tform/#automatic-tuning-in-geforce-experience
 
I think AV1 is near, so maybe that isn't really worthwhile now. Actually, why don't they have AV1 encoding at this point? Heh

My understanding is that AV1 encoding is still rather immature at this stage, in the sense that there are still big gains (several factors' worth) in performance and performance/quality coming just from improvements in how the encoding is done. I'd guess real-time AV1 encoding on something like a consumer GPU (which is rather transistor- and power-sensitive) likely doesn't make sense at this point, if it's even possible given practical constraints, due to that immaturity and especially given the lack of usage.

I might be wrong about this with regard to VP9, but the interest is that Twitch (and possibly other streaming platforms?) might be looking to start implementing VP9 relatively soon, while wider AV1 adoption might not come until closer to 2025. Then again, maybe H.264 encode improvements can essentially outrace the benefits of moving to VP9?

What does Nvidia gain from providing reviewers with more accurate power testing equipment?
I assume reviewers will check the variance between the traditional method and the Nvidia-provided tool. It would be interesting to see if there is a significant difference.

Many (if not most?) reviewers, including rather notable ones, still use total system power consumption for power measurements. Also, it's not always clear what they are actually measuring: some kind of average, some peak figure, or whether they even have the capability of capturing data points beyond basically eyeballing a readout.
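For what it's worth, per-GPU sampling isn't hard; a sketch (assuming nvidia-smi on the PATH) that logs board power for a minute and reports both average and peak, which is exactly the distinction that often goes unstated:
```python
import subprocess, time

def sample_power_w(gpu=0):
    """One instantaneous board-power reading, in watts, via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "-i", str(gpu),
         "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True)
    return float(out.stdout.strip())

samples = []
for _ in range(600):              # ~60 s at 10 Hz while the benchmark runs
    samples.append(sample_power_w())
    time.sleep(0.1)

print(f"average: {sum(samples) / len(samples):.1f} W, peak: {max(samples):.1f} W")
```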
 
Re: RTX 3080 Compubench benchmarks

For instance, in the Vertex Connection and Merging test the RTX 3080 spotted by Apisak delivered a result of 39.042 mPixels/s (another tested sample recorded 39.128 mPixels/s – higher figure is better) compared to 24.621 mPixels/s (3080 first result: +59%) for the RTX 2080 Ti and 18.555 mPixels/s (3080 first result: +110%) for the RTX 2080.

In the more straightforward Ocean Surface Simulation test (simulating waves) the new Ampere unit managed 7768.469 Iterations/s, which was over 38% higher than the RTX 2080 Ti and a noteworthy 88% more than the RTX 2080 (see screenshots below).

In fact, in the Catmull-Clark Subdivision Level 5 benchmark, the Ampere card was even 60.78% faster than the RTX 2080 Ti.
https://www.notebookcheck.net/Nvidi...d-Big-Navi-a-hard-target-to-hit.492276.0.html
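The quoted percentage deltas check out against the raw scores:
```python
# Sanity check of the quoted deltas from the raw VCM scores (mPixels/s).
scores = {
    "RTX 2080 Ti": 24.621,
    "RTX 2080": 18.555,
}
rtx3080 = 39.042                      # first RTX 3080 result from the article

for card, score in scores.items():
    print(f"RTX 3080 vs {card}: +{(rtx3080 / score - 1) * 100:.0f}%")
# RTX 3080 vs RTX 2080 Ti: +59%
# RTX 3080 vs RTX 2080: +110%
```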
 
Good point. I haven't seen anyone do it that way, but it might be interesting. I'm assuming power supply recommendations assume people will overclock, so maybe a 15-20% power reduction would work on a 650W.
While doing F@H with a borrowed 2080 Ti, I did this all the time: power limit to 90%, or my 450W PSU would shut off when the card hit high load. With my similarly underrated (and now deceased) old PSU, I had to cap my Vega 56 at 95% as well. edit: Yeah, with my next build, I won't cheap out on PSU wattage any longer.
 

(And I love a single big 12V rail, simpler to manage than multiple rails imo)
 
At this rate of Tensor logic investment, what are the chances that at some point in the future Nvidia will just fold all the arithmetic ALUs into just more Tensor arrays?
The MMA programming model is already compliant with the standard grid/warp ordering of the conventional SIMT scheduling.
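For context, the operation that path exposes is a per-warp D = A·B + C on small tiles (commonly 16×16×16, FP16 inputs with FP32 accumulation); a conceptual numpy sketch of just the arithmetic, not the actual CUDA API:
```python
# Conceptual numpy sketch (not the CUDA API) of the tile-level
# matrix-multiply-accumulate that tensor cores perform: D = A @ B + C on a
# 16x16x16 tile, FP16 inputs accumulated in FP32, as in the common WMMA shape.
import numpy as np

M = N = K = 16
A = np.random.rand(M, K).astype(np.float16)
B = np.random.rand(K, N).astype(np.float16)
C = np.random.rand(M, N).astype(np.float32)

D = A.astype(np.float32) @ B.astype(np.float32) + C   # FP32 accumulation
print(D.shape, D.dtype)                               # (16, 16) float32
```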
 
Or why not just sell separate cards that only have Tensors, so now gamers need to buy a GPU and a TPU in order to game?
 
Not even the AI line of GPUs would do this. Standard compute is still very necessary; not all machine learning uses the same types of computation.
Even the Vega cards are very good at certain types of algorithms.
Throwing out compute in favour of tensor cores is unlikely to ever happen. You need compute flexibility.

Or to put it another way: tensor cores accelerate one type of machine learning, but we are always developing new methods and algorithms. The need for flexible compute is the enabler for that.
 

Guess this cements it as average outside of raytracing titles.

Looking at the numbers, if we take transistor count as a measure, the chip is 50% bigger than a 2080 Ti, performs 40% better, and a bit better than that in raytracing. The die sizes versus the process shrink roughly match up, so despite all the changes to parallelization, performance per relative die size hasn't improved at all; it's just a bigger chip. Performance per watt is better, but nowhere near as huge a jump as claimed, with around a 20% improvement over a Ti, if the TDP rating for the 3080 is accurate. The exception here is raytracing performance, which apparently does better.

So other than raytracing performance, Ampere doesn't seem like a huge jump over Turing in sheer engineering terms. Thankfully there's competition from AMD now to drive a jump in benefits to consumers, and there it does well, even against the Turing Super series.
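Rough numbers behind that, taking the commonly cited transistor counts (TU102 ≈ 18.6B, GA102 ≈ 28.3B, treated as assumptions here) and the ~40% figure above:
```python
# Rough perf-per-transistor math behind the post above. The transistor counts
# are the commonly cited figures and the +40% performance number is the
# post's own estimate, not a measurement.
tu102 = 18.6e9       # RTX 2080 Ti die
ga102 = 28.3e9       # RTX 3080 die
perf_gain = 1.40     # assumed 3080 vs 2080 Ti outside of raytracing

size_ratio = ga102 / tu102
print(f"transistor budget: +{(size_ratio - 1) * 100:.0f}%")   # ~+52%
print(f"perf per transistor: {perf_gain / size_ratio:.2f}x")  # ~0.92x
```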
 
Looking at the numbers, if we take transistor count as a measure, the chip is 50% bigger than a 2080 Ti, performs 40% better, and a bit better than that in raytracing. The die sizes versus the process shrink roughly match up, so despite all the changes to parallelization, performance per relative die size hasn't improved at all; it's just a bigger chip.
Memory bandwidth has only improved by 24%, though.
And the RTX 3080 is more cut down from its full die than the RTX 2080 Ti was: two memory channels disabled vs. one, and roughly 20% of SMs disabled vs. 5%. RTX 3090 vs. Titan RTX is probably a better comparison.
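Spelling those figures out (the bus widths, memory speeds and SM counts below are the publicly listed specs, used here as assumptions):
```python
# The bandwidth and cut-down figures, spelled out. Bus widths, memory speeds
# and SM counts are the publicly listed specs, used as assumptions here.
bw_2080ti = 616.0    # GB/s: 352-bit GDDR6 @ 14 Gbps
bw_3080 = 760.0      # GB/s: 320-bit GDDR6X @ 19 Gbps
print(f"bandwidth: +{(bw_3080 / bw_2080ti - 1) * 100:.0f}%")           # ~+23%

# How cut down each card is relative to its full die:
print(f"2080 Ti: {(1 - 68 / 72) * 100:.0f}% of TU102's SMs disabled")  # ~6%
print(f"3080:    {(1 - 68 / 84) * 100:.0f}% of GA102's SMs disabled")  # ~19%
```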
 
GA104 is 392.5 mm² and has 61% more transistors than TU106. The RTX 3070 will be around 70% faster than a 2060 Super in games while having the same bandwidth. Every transistor spent has resulted in a matching performance increase, which is actually really good after the transition from Pascal to Turing.
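Same exercise for GA104 vs TU106, using the published transistor counts (assumed here) and the ~70% estimate above:
```python
# Same exercise for GA104 vs TU106. Transistor counts are the published
# figures (assumed here); the +70% uplift is the post's estimate.
tu106 = 10.8e9
ga104 = 17.4e9
perf_gain = 1.70     # assumed RTX 3070 vs RTX 2060 Super

ratio = ga104 / tu106
print(f"transistors: +{(ratio - 1) * 100:.0f}%")          # ~+61%
print(f"perf per transistor: {perf_gain / ratio:.2f}x")   # ~1.06x
```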
 
Memory bandwidth has only improved by 24%, though.
And the RTX 3080 is more cut down from its full die than the RTX 2080 Ti was: two memory channels disabled vs. one, and roughly 20% of SMs disabled vs. 5%. RTX 3090 vs. Titan RTX is probably a better comparison.
+50% according to Nvidia, so maybe a bit less when looking at independent review summaries.
 