Nvidia Ampere Discussion [2020-05-14]

Discussion in 'Architecture and Products' started by Man from Atlantis, May 14, 2020.

Tags:
  1. Benetanegia

    Benetanegia Regular

    There's a lot more to it than just FP or INT...

    And for what it's worth it's 55% faster on TPU:

[IMG: TPU benchmark chart]

I don't know if it's a system bottleneck for some strange reason or just scene selection, which actually makes a huge difference; reviewers need to take it into account and can't always get it right.
     
    PSman1700 likes this.
  2. CarstenS

    CarstenS Legend Subscriber

That's assuming you are fully limited by FP32 throughput in The Witcher 3. You can only expect linear scaling with FP32 throughput when you're limited by it all the way. Apparently there are other limitations at play here as well: the RTX 2070 and 1080 perform identically.
     
  3. DegustatoR

    DegustatoR Veteran

    Well, let's see.
The 2080 Ti has ~16 TFLOPS FP32 at 1.8 GHz boost.
Let's say about 17% of TW3's math on it is handled by the INT h/w; this results in ~18.7 TFLOPS in Ampere metrics.
A 3080 @ 1.8 GHz is about 31.3 TFLOPS, which is about 167% of 18.7.
So in the absolute best case of scaling you should be getting +67%, but in practice it's closer to 3/5ths of that.
(And the +55% from above is actually pretty close to the +67% theoretical maximum.
Edit: Actually, scratch that, it's +32% for the 3080 there, not +55%, which is for the 3090 OC card.)
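    The arithmetic above can be sketched like this (all figures are the post's assumed numbers, not official specs):

    ```python
    # Back-of-the-envelope sketch of the best-case scaling estimate.
    # All inputs are the assumptions made in this post, not measured specs.
    turing_fp32_tflops = 16.0  # ~2080 Ti at an assumed 1.8 GHz boost
    int_fraction = 0.17        # share of TW3 math assumed to run on Turing's INT pipes

    # In Ampere's counting, Turing's INT work would also count toward FP32
    # throughput, so scale the Turing figure up for an apples-to-apples metric.
    turing_in_ampere_metric = turing_fp32_tflops * (1 + int_fraction)  # ~18.7 TFLOPS

    ampere_fp32_tflops = 31.3  # ~3080 at the same assumed 1.8 GHz
    theoretical_gain = ampere_fp32_tflops / turing_in_ampere_metric - 1

    print(f"best-case scaling: {theoretical_gain:+.0%}")  # roughly +67%
    ```

    Anything below that +67% ceiling points at a non-FP32 limiter (bandwidth, CPU, etc.), which is the post's conclusion.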

    Why? Who knows. Maybe it's limited by memory bandwidth or the CPU or the data loads aren't fast enough for TW3 or something else.
    This isn't that surprising for a game from 2015 running on DX11 really.

Not in any official capacity, since it's D3D11. But I doubt it would be of much help: if it's not compute limited already, then moving compute to async won't improve anything.
     
    Last edited: Oct 1, 2020
    PSman1700 likes this.
  4. Scott_Arm

    Scott_Arm Legend

    @DegustatoR Yah, I understand why older games get benchmarked if they're popular. People want to know that the game they play will run faster. But they're not a particularly good way to analyze newer gpu architectures in terms of scaling and performance.
     
    nnunn and PSman1700 like this.
  5. Scott_Arm

    Scott_Arm Legend



tldr: There's nothing wrong with the capacitor configurations. The new driver has the same performance, but eliminates the crashes. The clock frequency vs voltage curve is nearly identical and power consumption is nearly identical. There were minor tweaks, probably to the boosting algorithm so that clocks wouldn't change quite as rapidly. The Linux driver was always stable; it was the Windows driver that had crashing problems.
     
    Last edited: Oct 1, 2020
    Pete, Cyan, Lightman and 7 others like this.
  6. trinibwoy

    trinibwoy Meh Legend

    Why would anyone expect games to scale perfectly with FP32? Did something happen recently where bandwidth, fillrate, geometry, texturing etc doesn't matter any more?
     
    Lightman, xpea, nnunn and 8 others like this.
  7. Digidi

    Digidi Regular

@trinibwoy From what you hear from the experts, everybody is saying that we are heavily shader bound. That's why I was surprised that the real-world scaling wasn't as good as the data looked on paper.
     
  8. Scott_Arm

    Scott_Arm Legend

I think people expect that from generation to generation GPUs will increase performance in a particular ratio, i.e. if you double the ALUs you also double the ROPs and texture units. The problem is memory bandwidth. GPUs are going through what CPUs have been going through for a long time: advances in processor performance are significantly outpacing memory performance. At some point GPU advancements are going to get very hard unless there's a memory breakthrough.

The ROPs are high-bandwidth consumers, so it'll get harder to keep adding ROPs without faster memory. Maybe shaders start to get longer and more complex simply because writing short shaders will bottleneck other parts of the GPU. Right now game engines are transitioning away from object-oriented designs that are not cache friendly, purely to get around how slow memory is. I'm not as knowledgeable about how shaders tend to be written, but I imagine they're already largely performance focused in that way.

People's expectations will have to adjust to the reality that future GPUs will probably not scale the way past GPUs have.
     
    pharma and iroboto like this.
  9. DegustatoR

    DegustatoR Veteran

    "Shader bound" isn't the same as "FP32 math bound" though. Shaders can be bandwidth limited and in case of simpler shaders from a 2015 engine this is the most likely scenario.
     
    Picao84, pharma and PSman1700 like this.
  10. Scott_Arm

    Scott_Arm Legend

    Yah, my understanding is instruction cache is small so most games have short shaders as an optimization, which tends to lead to them being bandwidth bound. Ampere doubled L1 cache, but I'm not sure if that's data or both data and instruction.
     
  11. Well in some console forums, these two apparently don't matter anymore.
    j/k
    :)
     
    Lightman likes this.
  12. CarstenS

    CarstenS Legend Subscriber

Even if you go just a nuance above what the electrical design of a card can handle, it crashes. If you dial back that very nuance and keep the card inside its safety margins, that means you were too optimistic with the combination of your v/f curve and the card's electrical properties in the first place.

Good for Nvidia and their customers that it apparently was just a nuance too much and they could fix it without perceptible performance regression. The fact that some cards were more prone to crashing than others suggests that there was an electrical problem in the first place, and hints at what it was related to.
     
    Cyan, Lightman, Ext3h and 2 others like this.
  13. Rootax

    Rootax Veteran

From what I've watched on YouTube, I'm not sure I believe that. Some people had Asus cards crashing a lot, others FE cards, etc. So in the end I'm not sure that some models are more impacted than others. Maybe it was just the cards that sold more / were more widely available...
     
    PSman1700 likes this.
  14. Digidi

    Digidi Regular

    Lightman likes this.
  15. DegustatoR

    DegustatoR Veteran

    I think it's more than just a card model, PSUs and other system components play their role too here. Which is why some models crashed for some people while being rock stable for others.
     
    Rootax and PSman1700 like this.
  16. Scott_Arm

    Scott_Arm Legend

    Watch the hardware unboxed video. A crashing card would not crash in Linux. The windows driver had boosting behaviour issues that could cause power spikes that would crash the card. Changes to the frequency vs voltage curve are negligible. The cards are now stable in windows with corrections to the boosting behaviour with essentially zero performance loss.
     
    Cuthalu and PSman1700 like this.
  17. Digidi

    Digidi Regular

@Scott_Arm Linux drivers, I think, have less performance than Windows drivers. And how many people use Linux with a gaming card? I think we're talking about a low percentage. All benchmarks are done on Windows, and Nvidia wanted to shine; that's why they went to the limit of the silicon.

And in this case I believe Igor much more than Hardware Unboxed. Igor analyzes everything with expensive test equipment, and as an electrical engineer who does specialized work for an electrical company, he knows what he is talking about.
     
    Last edited: Oct 1, 2020
    Lightman likes this.
  18. pharma

    pharma Veteran

Seeing is believing; I suggest you watch the video. At the time people were having crashes, someone using the Quadro driver experienced none.
     
    Cuthalu and PSman1700 like this.
  19. Digidi

    Digidi Regular

@pharma Quadro drivers are not made for high performance. Quadro was always made for stability.

In Igor's Lab's findings you can clearly see that they lowered the peak power consumption and that they also lowered the voltage/clock curve.
     
    Lightman likes this.
  20. Scott_Arm

    Scott_Arm Legend

All cards were affected to some degree, whether they used MLCC capacitors or not. They came up with a zero-cost fix in software without making any noticeable adjustments to the voltage vs frequency curve. The issue was not seen in their Linux drivers. This looks like the Windows driver was pushing the boosting behaviour a little too far. You design the software around the hardware, not the other way around.
     
    Cuthalu, PSman1700 and pharma like this.