Nvidia Ampere Discussion [2020-05-14]

Discussion in 'Architecture and Products' started by Man from Atlantis, May 14, 2020.

  1. w0lfram

    Regular

    Joined:
    Aug 7, 2017
    Messages:
    304
    Likes Received:
    59

How much slower is GA104 compared to GA102?
     
  2. troyan

    Regular

    Joined:
    Sep 1, 2015
    Messages:
    603
    Likes Received:
    1,123
    The same. The 3090 is ~47% faster than the 3070 at 4K.
     
  3. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,111
    Location:
    New York
    In rasterized games that’s probably true but it’s not a very useful metric today. Both Ampere and RDNA2 are forward looking architectures.
     
    PSman1700 likes this.
  4. LiXiangyang

    Newcomer

    Joined:
    Mar 4, 2013
    Messages:
    87
    Likes Received:
    48
    Has anyone tested 3090 yet?

    Some machine-learning programmers in China report disappointing Tensor Core performance on the 3090: basically no gain over Turing in tensor-core workloads, and sometimes even slower than Turing.
     
  5. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,360
    Likes Received:
    1,377
    Looking forward to what future exactly?
    A future where lithographic advances solve all your efficiency issues?
    Or a future where people gravitate away from mobile devices towards being plugged into walls again?
    Besides, devices sold need to be efficient at running existing code. Industry transformations are always slow, but in graphics they have largely been driven by lithographic advances. These two architectures launched with chip sizes close to the reticle limit, and above 200W, for a reason. They had nowhere else to go.
    I don’t see this as a sign of health, or a promising direction for the future, at least as far as consumer oriented products are concerned.
     
    Kyyla likes this.
  6. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,111
    Location:
    New York
    My comment didn’t have anything to do with manufacturing technology. Any evaluation of efficiency has to be qualified with the workload in question. Looking at performance per transistor of Ampere and RDNA2 in current games is sorta pointless when those games aren’t engaging a lot of those transistors.

    Manufacturing tech is a separate concern independent of where transistors are spent.
     
    PSman1700, pharma, xpea and 1 other person like this.
  7. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,360
    Likes Received:
    1,377
    These are real world products. Their construction, the code they run, their efficiency at doing so is interdependent.
    If so, it simply means that those transistors are largely a waste of resources. If you use your Turing cards for gaming, how much use have you gotten out of those general tensor cores? We are all aware that they served a purpose outside gaming for Nvidia, but out of my 200-game library, none use them in any capacity, nor does any upcoming game I'm interested in. Still, they have to be paid for, in die area, yield, power, and cost. They will arguably never pay for themselves over the lifetime of the product. Features that are underutilized are the very definition of waste and inefficiency.

    I don’t agree. As I remarked above, though intellectually separable, they are in reality interdependent. While in the past the industry has been able to overspend transistors and rely on lithographic advances to bring new tech to mainstream markets in relatively short time, that is not the case today, both because the benefits of lithographic advances have slowed down tremendously, and because the frontline of graphics features resides in 200+W products while consumers move to ever more mobile platforms.

    And that’s where my reaction comes in - driving graphics technology in directions that are unsuited to low power, low cost applications is OK, but it also implies a lack of future market penetration that was not the case a decade or two ago. Fundamental premises have changed, and I find that disconnect to be a general problem in this forum. (But then I’m not a graphics professional, and this is the architecture forum, so this is where tech for its own sake belongs. Then again, if we are talking about consumer product testing...)
     
    #2427 Entropy, Nov 10, 2020
    Last edited: Nov 10, 2020
    w0lfram, Lightman and techuse like this.
  8. Erinyes

    Regular

    Joined:
    Mar 25, 2010
    Messages:
    808
    Likes Received:
    276
    I'd agree that AMD wouldn't be as competitive if not for 7nm. However, the fact that they are this competitive is still a tremendous achievement and something I'm sure none of us really expected given where they were with GCN barely 2 years ago. RDNA seems to be somewhat following the Zen progression: RDNA brought them somewhat into contention, and then RDNA2 very much into contention. If they can pull off a Zen3-type update, RDNA3 should again be a good step forward (with the benefit of either a minor or full node update). Nvidia is executing better than Intel though, so I expect the next gen from them to be significantly better, especially with a more competitive node. It's definitely good to see competition in both the CPU and GPU segments again.
    I'd expect the 3070 to be a bit more than 30% slower, but either way the transistor count comparison is a bit skewed due to the Infinity Cache, so it's best not to compare.
     
    Lightman likes this.
  9. xpea

    Regular

    Joined:
    Jun 4, 2013
    Messages:
    551
    Likes Received:
    783
    Location:
    EU-China
    IN YOUR WORKLOADS
    Modern GPUs are not only graphics cards, they are now primarily compute monsters. Millions of people find Tensor Cores useful and take advantage of them. Today Nvidia sells more to the datacenter than to gamers. Just to remind you...
     
  10. OlegSH

    Regular

    Joined:
    Jan 10, 2010
    Messages:
    797
    Likes Received:
    1,622
    They already paid quite well for themselves with DLSS.
    Moreover, modern GPU usage is not limited to games; these cores accelerate many pro-app features. The question to ask is why anyone should buy a GPU without these cores and the acceleration they bring in pro apps.
     
    pharma likes this.
  11. xpea

    Regular

    Joined:
    Jun 4, 2013
    Messages:
    551
    Likes Received:
    783
    Location:
    EU-China
    Unfortunately, I heard that Nvidia has restricted Tensor Core performance on the GeForce range. You need a Quadro/Tesla, or whatever they call the RTX 30 pro line, to get the full unlocked Tensor performance. I'm not 100% sure, don't quote me on that, but knowing Nvidia, it's highly possible...
     
  12. OlegSH

    Regular

    Joined:
    Jan 10, 2010
    Messages:
    797
    Likes Received:
    1,622
    GeForce GPUs have 1/2 throughput for mixed precision with FP32 accumulation, for all other regimes they have full throughput.
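    The effect of that halved rate is easy to put in numbers. Below is a back-of-envelope sketch of peak dense Tensor Core throughput; the core count, boost clock, and per-core FMA rate are assumed 3090-like figures for illustration, not official specs.

    ```python
    # Back-of-envelope Tensor Core throughput, illustrating the GeForce
    # half-rate penalty for FP16 matmul with FP32 accumulation.
    # All hardware figures below are assumptions, not official specs.

    def tensor_tflops(tensor_cores, boost_ghz, fma_per_core_per_clk,
                      fp32_accumulate=False, geforce=True):
        """Peak dense Tensor TFLOPS. GeForce parts run FP32-accumulate
        mixed precision at half rate; pro parts run it at full rate."""
        ops = tensor_cores * boost_ghz * 1e9 * fma_per_core_per_clk * 2  # FMA = 2 ops
        if fp32_accumulate and geforce:
            ops /= 2
        return ops / 1e12

    # Assumed RTX 3090-like figures: 328 Tensor Cores, 1.70 GHz boost,
    # 128 dense FP16 FMAs per Tensor Core per clock.
    full = tensor_tflops(328, 1.70, 128)                        # FP16 accumulate
    half = tensor_tflops(328, 1.70, 128, fp32_accumulate=True)  # FP32 accumulate
    print(f"FP16 accumulate: {full:.0f} TFLOPS")
    print(f"FP32 accumulate (GeForce): {half:.0f} TFLOPS")
    ```

    With these assumed figures, the FP32-accumulate path lands at exactly half the FP16-accumulate peak, which is the restriction being discussed above.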
     
    pharma likes this.
  13. OlegSH

    Regular

    Joined:
    Jan 10, 2010
    Messages:
    797
    Likes Received:
    1,622
    Ampere iterates on the structured sparse matrix feature. In order to leverage the benefits of structured sparsity, the network has to be trained with this feature in mind, otherwise gains will be proportional to throughput gains without this feature.
    These gains are quite small (proportional to SM count * frequency gains/losses) and can be eclipsed by Titan RTX on Turing, which has higher throughput in mixed precision calculations (by design).
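    The constraint behind that feature is simple to illustrate: in every group of four consecutive weights, only two may be non-zero (2:4 sparsity), and the hardware skips the zeros. A minimal pure-Python sketch of magnitude-based 2:4 pruning, assuming a flat weight list whose length is a multiple of four:

    ```python
    # Sketch of Ampere-style 2:4 structured sparsity: in each group of
    # four consecutive weights, keep only the two largest-magnitude
    # values and zero the rest. A network must be trained or fine-tuned
    # under this constraint for the sparse Tensor Core path to pay off.

    def prune_2_of_4(weights):
        """Zero the two smallest-magnitude weights in each group of four."""
        assert len(weights) % 4 == 0
        pruned = []
        for i in range(0, len(weights), 4):
            group = weights[i:i + 4]
            # Indices of the two largest-magnitude entries in this group.
            keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
            pruned.extend(v if j in keep else 0.0 for j, v in enumerate(group))
        return pruned

    w = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.01, 0.4]
    print(prune_2_of_4(w))  # → [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, 0.0, 0.4]
    ```

    Pruning an already-trained dense network this way usually costs accuracy, which is why training has to account for the constraint from the start.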
     
  14. troyan

    Regular

    Joined:
    Sep 1, 2015
    Messages:
    603
    Likes Received:
    1,123
    AMD needs the cache for the performance and efficiency.
     
  15. SimBy

    Regular

    Joined:
    Jun 21, 2008
    Messages:
    700
    Likes Received:
    391
    Consoles don't have IC. And they are both very efficient and fast.
     
  16. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    Do you think NVidia will use a large cache on consumer GPUs?
     
    Lightman likes this.
  17. HLJ

    HLJ
    Regular

    Joined:
    Aug 26, 2020
    Messages:
    529
    Likes Received:
    869
    Fast is not a word I would use about consoles.
    Cheap, yes.
    Practical, yes.
    Balanced, yes.

    But fast... no.
     
    Cuthalu likes this.
  18. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,235
    Likes Received:
    4,259
    Location:
    Guess...
    Compared to what? Last gen? Sure. But compared to a similarly scaled RDNA2 GPU that also has IC? That's a question we don't have an answer to yet. Hopefully someone will do some like for like game comparisons vs an underclocked 6800 to give us some idea of exactly how much extra performance IC is bringing to the table.
     
    PSman1700 and pharma like this.
  19. SimBy

    Regular

    Joined:
    Jun 21, 2008
    Messages:
    700
    Likes Received:
    391
    I should have said perf/W seems to be just fine without IC. Fast is obviously relative. I'm getting off topic here so I'll stop.
     
  20. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,111
    Location:
    New York
    I think you misunderstand my point. I'm not talking about tensors. I'm talking about forward looking industry standard features, VRS, RT, mesh shaders etc that are not in widespread usage today. I assume that dumping even more transistors into the classic rasterization pipeline is a dead end.

    Yes, but at the end of the day lithography gives you transistors and power efficiency. It doesn't directly dictate how you spend those transistors. To be honest I'm not really following what you're saying. Is there a specific feature set that you think should be prioritized with mobile devices in mind?

    Fair enough, though for me personally I couldn’t care less about low-power applications, since I don’t game on low-power devices. I would be perfectly fine with leaving those devices behind if it means we can make real progress in graphics fidelity at the high end.
     
    #2440 trinibwoy, Nov 10, 2020
    Last edited: Nov 10, 2020