Nvidia Ampere Discussion [2020-05-14]

Deleted member 90741 · Sep 2, 2020

Glad a lot of memes were put to rest
- Gaming Ampere 7nm
- NVCache
- RT Coprocessor on the back
- DLSS 3.0
- Tensor memory compression

Erinyes · Sep 2, 2020

pjbliverpool said:
We can claim exactly the same as what is claimed for the console IO systems. Until we have independent benchmarks, both are simply advertised capabilities from the manufacturers.

There's no technical reason at this stage to doubt Nvidia's claims any more than there is to doubt Sony's or Microsoft's.

Call me skeptical, but I doubt tensor cores can offer the same decompression performance as dedicated fixed-function units, not to mention the varying performance across the GPU lineup (how will GA106/107 do, especially?).

But yes, independent benchmarks are required of course.

PSman1700 · Sep 2, 2020

Malo said:
Wow Kool aid much?

No idea why there's a need to get personal.
There's a rather huge uplift going from a 2080 to a 3080: an 80% increase in pure rasterization performance as per DF testing. Which means the doubling of TF from the 2080 to the 3080 fits. That's aside from all the other improvements, and they also didn't go haywire with the prices (even though I still think they're too high).
With a range of 20 to 36 TF of performance, I think that's quite the leap; even a 2080 Ti looks kinda old by now.

pjbliverpool · Sep 2, 2020

ethernity said:
Glad a lot of memes were put to rest
- Gaming Ampere 7nm
- NVCache
- RT Coprocessor on the back
- DLSS 3.0
- Tensor memory compression

Yeah, Moore's Law Is Dead really got shown up on this one. Virtually everything he predicted was wrong. I still hold out hope for a DLSS 3.0 at some point, though, that will be more game agnostic. There's no real reason why it would need to be tied to the hardware launch, and it might actually be more useful to unveil it at a later date. Say, a couple of days before the RDNA2 launch. I do realise, of course, that this is likely just wishful thinking.

xpea · Sep 2, 2020

techuse said:
I think devs on this very forum have stated that "low level" PC APIs are not even close to what is available on consoles, particularly PS4. There are several facets of console design that allow a level of efficiency the PC can never match. I don't think we can claim anything about RTX IO at this point in time.

And for what result?
When this gen of consoles hits the market, they will be equivalent to an entry/mid-level gaming PC (RTX3060 + Zen3). In 2 years, they will barely match an entry-level PC (RTX40 + Zen4). And in 4 years, at the mid-life of this gen, everyone will complain about how slow these consoles are and how they cripple gaming PCs with low-tech ports and outdated tech... Rinse and repeat until Sony/MS announce next-gen consoles...
The truth is that consoles can't create a sufficient tech leap to stay technically relevant for their entire lifetime. PC gaming is always ahead, no matter how you look at it (except when consoles are announced and compared to last-gen hardware).

xpea · Sep 2, 2020

pjbliverpool said:
Yeah, Moore's Law Is Dead really got shown up on this one. Virtually everything he predicted was wrong. I still hold out hope for a DLSS 3.0 at some point, though, that will be more game agnostic. There's no real reason why it would need to be tied to the hardware launch, and it might actually be more useful to unveil it at a later date. Say, a couple of days before the RDNA2 launch. I do realise, of course, that this is likely just wishful thinking.

For the record, DLSS 3.0 is on the way; I saw it. NV just launched the DLSS 2.1 SDK because some 3.0 features are not ready yet. But DLSS 3.0 is alive and well, and coming.
8/7nm is a crazy story. Until last week I was still seeing NV launch slides from their sales team with 7nm on them!

PSman1700 · Sep 2, 2020

xpea said:
When this gen of consoles hits the market, they will be equivalent to an entry/mid-level gaming PC (RTX3060 + Zen3).

The 3070 sits at 20TF. No idea where the 3060 will be, but most likely well above the consoles, aside from the DLSS/RT advantages.
Zen3 seems interesting; AMD does a great job with their CPUs now.

pjbliverpool · Sep 2, 2020

Erinyes said:
Call me skeptical, but I doubt tensor cores can offer the same decompression performance as dedicated fixed-function units, not to mention the varying performance across the GPU lineup (how will GA106/107 do, especially?).

But yes, independent benchmarks are required of course.

Assuming it is done on the tensor cores (which makes a lot of sense) you're looking at over 50 TFLOPS of FP16 and over 100 TOPS of INT8 on an RTX 2060 alone. I don't see it as unrealistic that that would enable it to match or exceed what is likely a cheap hardware decompression block.

Also, since Jensen specifically said they could exceed the output of a 7GB/s NVMe drive, falling short would mean he was outright lying, as opposed to just presenting some slightly misleading material.

Janne Kylliö · Sep 2, 2020

I don't know... It seems that the RTX 3080 FE has roughly 3 times the theoretical performance of the RTX 2080 FE (30TF vs. 10.6TF), but only manages to outperform it by 80% in real gaming benchmarks. Maybe it's a bandwidth issue, maybe a scheduling issue, who knows?

But it also means that if the XSX GPU has a performance advantage over the 2080, the performance difference between the XSX and the 3080 won't be close to 2X either. And the rumoured 80 CU Big Navi might actually match or exceed the 3080's performance (non-DLSS, non-RTX).

However, it remains to be seen whether AMD will provide similar functionality to DLSS 2.0 or have RTX performance as good. If not, then NV is a clear winner here.
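
The gap between the paper gain and the measured gain in the post above can be put into numbers. The following is a minimal Python sketch that uses only the figures quoted in the thread (10.6TF vs. 30TF, and an 80% measured uplift); the function name is hypothetical and chosen for illustration.

```python
def scaling_efficiency(tf_old, tf_new, measured_uplift):
    """Compare a paper TFLOPS ratio with a measured performance uplift.

    measured_uplift is a fraction, e.g. 0.80 for "80% faster".
    Returns (theoretical_ratio, measured_ratio, fraction_of_paper_gain_realised).
    """
    theoretical_ratio = tf_new / tf_old
    measured_ratio = 1.0 + measured_uplift
    return theoretical_ratio, measured_ratio, measured_ratio / theoretical_ratio


# Figures as quoted in the thread: RTX 2080 FE at 10.6TF, RTX 3080 FE at 30TF,
# and an 80% uplift in gaming benchmarks.
theoretical, measured, efficiency = scaling_efficiency(10.6, 30.0, 0.80)
print(f"{theoretical:.2f}x on paper, {measured:.2f}x measured, "
      f"{efficiency:.0%} of the paper ratio realised")
```

On these inputs the paper ratio is about 2.8x against a measured 1.8x, so a bit under two thirds of the theoretical ratio shows up in games. The sketch says nothing about the cause (bandwidth, scheduling, or something else).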

troyan · Sep 2, 2020

Qesa said:
It's probably premature before seeing an SM diagram, but assuming the "slapped another fp32 SIMD in there" rumour is true, we're seeing IPC go down as a direct result of changes between gaming and HPC ampere. One math instruction issued per clock, 3 SIMDs (int32, fp32, fp32) that each take 2 clocks - there is an obvious bottleneck there.

IPC doesn't go down. Ampere can process more instructions than Turing...

Leoneazzurro5 · Sep 2, 2020

The right term here may be "utilization"

Qesa · Sep 2, 2020

troyan said:
IPC doesn't go down. Ampere can process more instructions than Turing...

I'm assuming "IPC" was intended to mean "realised performance per peak fp32 throughput"

Geeforcer · Sep 2, 2020

It seems that finally, after all these years, Nvidia may have the price/performance cards capable of taking on their greatest adversary that has haunted and thwarted them all this time...
...
...
...
2018 $350 cryptomine liquidation 1080TI.

troyan · Sep 2, 2020

Qesa said:
I'm assuming "IPC" was intended to mean "realised performance per peak fp32 throughput"

Yes, but does it matter? Transistors are cheap, power consumption isn't. Why not fill the whole die with fp32 units?

Qesa · Sep 2, 2020

troyan said:
Yes, but does it matter? Transistors are cheap, power consumption isn't. Why not fill the whole die with fp32 units?

I didn't mean to put any value judgement on whether decreasing "IPC" (or utilisation, which as Leoneazzurro5 suggests is a far better term) is good or bad. I was simply trying to point out why the added SIMD would make it go down.
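
The issue-rate argument quoted earlier (one math instruction issued per clock, SIMDs that are each busy for 2 clocks per instruction) can be sketched as a toy model. This is only an illustration of that reasoning under the rumoured layout; it is not a description of the actual Ampere SM, and the function name and parameters are hypothetical.

```python
def peak_simd_utilisation(num_simds, clocks_per_instruction, issues_per_clock=1.0):
    """Upper bound on average SIMD utilisation when issue rate is the limit.

    Each issued instruction keeps one SIMD busy for clocks_per_instruction
    clocks, so on average issues_per_clock * clocks_per_instruction SIMDs
    can be busy at once. Utilisation is that figure divided by the number
    of SIMDs, capped at 1.0.
    """
    busy_simds = issues_per_clock * clocks_per_instruction
    return min(1.0, busy_simds / num_simds)


# Assumed layout with two SIMDs (int32 + fp32): issue rate can keep both fed.
two_simds = peak_simd_utilisation(num_simds=2, clocks_per_instruction=2)

# Rumoured layout with three SIMDs (int32 + fp32 + fp32): at most 2 of 3 busy.
three_simds = peak_simd_utilisation(num_simds=3, clocks_per_instruction=2)

print(f"2 SIMDs: {two_simds:.0%} peak utilisation")
print(f"3 SIMDs: {three_simds:.0%} peak utilisation")
```

In this model the third SIMD raises peak throughput on paper, but the single issue slot caps average utilisation at two thirds, which is the sense in which "utilisation" goes down even though total work per clock does not.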

DegustatoR · Sep 2, 2020

Love_In_Rio said:
This gen, Nvidia flops became AMD flops and vice versa.

That's doubtful as well so far.
"AMD flops" had inherent utilization issues in how GCN scheduling worked: no matter how much math you were pushing at them, there were issues with wavefront widths and context-switching bubbles.
Ampere flops may look kinda similar from a utilization point of view on older s/w, but the actual reason for that can be that the s/w is limited by some other part of the pipeline (rasterization, bandwidth, etc.), not that the Ampere multiprocessors have issues keeping the FP32 units utilized. This means that Ampere's FP32 utilization may be the same as on Turing and RDNA1/2 on code which is predominantly FP32 limited, and this will likely be exactly the type of code where performance will matter the most.
We have to see the details of the reorganized Ampere SMs and how they reach the 2x FP32.

xpea said:
For the record, DLSS 3.0 is on the way; I saw it. NV just launched the DLSS 2.1 SDK because some 3.0 features are not ready yet. But DLSS 3.0 is alive and well, and coming.
8/7nm is a crazy story. Until last week I was still seeing NV launch slides from their sales team with 7nm on them!

I'm sure that DLSS 3.0 is coming. I'm also sure that it won't be anything like what MLID was talking about.

DavidGraham · Sep 2, 2020

Seems NVIDIA caught wind of the Xbox Series X methodology for calculating RT performance. Microsoft said that the RT acceleration of Series X is equivalent to 13TF of compute, for a total of 25TF of compute across both the shaders and RT cores.

Jensen took the hint and declared that according to that Xbox Series X methodology, the 2080 RT cores have the equivalent of 34TF of compute, in addition to another 11TF of compute, for a total of 45TF while ray tracing, which is 80% faster than Series X.

For Ampere, the 3080 alone delivers the equivalent of 58TF from the RT cores, not taking into account the other 30TF of regular compute, which amounts to a crazy 88TF while ray tracing.
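
The arithmetic behind these "equivalent TF while ray tracing" totals is simple enough to check. Here is a minimal Python sketch using only the figures quoted in this post; the Series X shader figure of 12TF is inferred from the quoted 25TF total minus the 13TF RT equivalent, and the dictionary layout is just for illustration.

```python
# (shader TF, RT-equivalent TF) as quoted in the post above.
# The Series X shader value is inferred as 25 - 13 = 12.
gpus = {
    "Xbox Series X": (12.0, 13.0),
    "RTX 2080": (11.0, 34.0),
    "RTX 3080": (30.0, 58.0),
}

# Total "equivalent TF while ray tracing" = shader TF + RT-equivalent TF.
totals = {name: shader + rt for name, (shader, rt) in gpus.items()}

baseline = totals["Xbox Series X"]
for name, total in totals.items():
    print(f"{name}: {total:.0f}TF total, {total / baseline:.2f}x Series X")
```

The 2080 total comes out at 45TF, or 1.8x the Series X figure, which matches the "80% faster" claim; the 3080 total is 88TF. These are marketing-style equivalences, not measured ray tracing performance.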

Love_In_Rio · Sep 2, 2020

DegustatoR said:
That's doubtful as well so far.
"AMD flops" had inherent utilization issues in how GCN scheduling worked: no matter how much math you were pushing at them, there were issues with wavefront widths and context-switching bubbles.
Ampere flops may look kinda similar from a utilization point of view on older s/w, but the actual reason for that can be that the s/w is limited by some other part of the pipeline (rasterization, bandwidth, etc.), not that the Ampere multiprocessors have issues keeping the FP32 units utilized. This means that Ampere's FP32 utilization may be the same as on Turing and RDNA1/2 on code which is predominantly FP32 limited, and this will likely be exactly the type of code where performance will matter the most.
We have to see the details of the reorganized Ampere SMs and how they reach the 2x FP32.

I'm sure that DLSS 3.0 is coming. I'm also sure that it won't be anything like what MLID was talking about.

Well, now we have 13.45 Tflops (2080 Ti) behaving like 20 Tflops (3070). So flops are not equal, and a possible 20 Tflops RDNA2 will now trounce a 20 Tflops 3070.

chris1515 · Sep 2, 2020

techuse said:
Are people really denying that consoles are more efficient? It's pretty much a fact.

A dev, CorralX, on another forum estimates the consoles' efficiency advantage at only around 10% over DX12 and Vulkan on the PC side.

https://twitter.com/Corralx

DegustatoR · Sep 2, 2020

Love_In_Rio said:
Well, now we have 13.45 Tflops (2080 Ti) behaving like 20 Tflops (3070). So flops are not equal, and a possible 20 Tflops RDNA2 will now trounce a 20 Tflops 3070.

2080Ti actual boost flops are closer to 16.5 TFs.
3070 is said to be "faster" than 2080Ti, not "like" it.
So yeah flops may well end up being equal and the seemingly lower utilization may be a result of bandwidth or some other limitations coming into play on older s/w.
Will 20 tflops RDNA2 card "trounce" the 20 tflops 3070? Possibly, sometimes. Universally? Doubtful. And I'm not even accounting for DLSS here.
Note that I expect Navi 21 to be higher than 20 tflops in actual shipping products. This one will likely be universally faster than 3070, of course.
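
The difference between the 13.45 and 16.5 TF figures for the 2080 Ti comes down to which clock is plugged into the usual peak-FP32 formula (2 FLOPs per core per clock, counting a fused multiply-add as two operations). A minimal Python sketch follows; the core count (4352) and the reference boost clock (1545 MHz) are assumptions brought in from the card's published specs rather than from this thread, and the function names are hypothetical.

```python
def fp32_tflops(cuda_cores, clock_ghz):
    """Peak FP32 TFLOPS, assuming one FMA (2 FLOPs) per core per clock."""
    return 2 * cuda_cores * clock_ghz / 1000.0


def clock_for_tflops(cuda_cores, tflops):
    """Clock in GHz needed to reach a given peak FP32 TFLOPS figure."""
    return tflops * 1000.0 / (2 * cuda_cores)


# Assumed specs (not stated in the thread): 4352 CUDA cores,
# 1.545 GHz reference boost clock.
CORES_2080_TI = 4352

paper = fp32_tflops(CORES_2080_TI, 1.545)
needed_clock = clock_for_tflops(CORES_2080_TI, 16.5)

print(f"Paper figure at reference boost: {paper:.2f} TF")
print(f"Clock implied by 16.5 TF: {needed_clock:.2f} GHz")
```

Under these assumptions the reference boost clock gives roughly the 13.45 TF quoted earlier, while 16.5 TF implies a sustained clock of about 1.9 GHz.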
