Nvidia Ampere Discussion [2020-05-14]

Basing perf/W on TDP instead of actual measurements seems pointless too, especially since they're apparently reporting TGP instead of TDP. The 2080 S, 2080 Ti and Titan RTX all had a 250 W TDP; actual power consumption was not the same.

I didn't notice that. What was the TGP of a 2080 Ti? The TDP is listed at 250 W.
 
Basing perf/W on TDP instead of actual measurements seems pointless too, especially since they're apparently reporting TGP instead of TDP. The 2080 S, 2080 Ti and Titan RTX all had a 250 W TDP; actual power consumption was not the same.

Not that it matters anyway; I'm seeing 20-36 TF GPUs with an 80 to 100% increase over Turing, plus huge improvements in RT, DLSS, etc., at good prices.
Indeed, the biggest leap ever. Even going from a 2080 Ti to a 3070 is a huge upgrade, and 2080 to 3080 even more so. I'm sure we will see 3080 Ti/S cards later on.
 
Not that it matters anyway; I'm seeing 20-36 TF GPUs with an 80 to 100% increase over Turing, plus huge improvements in RT, DLSS, etc., at good prices.
Indeed, the biggest leap ever. Even going from a 2080 Ti to a 3070 is a huge upgrade, and 2080 to 3080 even more so. I'm sure we will see 3080 Ti/S cards later on.
Wow, Kool-Aid much?
 
Wow, Kool-Aid much?

Just ignore them; I keep forgetting to.

Anyway, as for the architecture itself: this really seems to show the weakness of having a single arch serve machine learning/compute and gaming at the same time. The IPC has dropped dramatically, probably bottlenecked by a lack of SRAM and/or memory bandwidth, which suggests the fundamental arch isn't designed around graphics; instead it looks more like AMD's own CDNA split, concentrated on compute without the need for a giant amount of cache to go with it. Really, it kind of looks like Volta 2.0 adapted to compete with RDNA2, though so far it seems to be doing a decent if not perfect job of it. And Nvidia surely has a lot of resources; I wouldn't be surprised if they split their architectures soon enough as well.
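
To put rough numbers on the bandwidth point, here's a quick back-of-envelope comparison of DRAM bytes available per FP32 FLOP, using the announced boost-clock TFLOPS and memory bandwidth figures (treat the exact values as approximate):

```python
# Back-of-envelope: DRAM bytes available per FP32 FLOP, from announced specs (approximate).
cards = {
    #              (FP32 TFLOPS, bandwidth GB/s)
    "RTX 2080 Ti": (13.4, 616),
    "RTX 3080":    (29.8, 760),
}

for name, (tflops, bw_gbs) in cards.items():
    bytes_per_flop = (bw_gbs * 1e9) / (tflops * 1e12)
    print(f"{name}: {bytes_per_flop:.3f} bytes/FLOP")

# The 3080 ends up with roughly half the DRAM bytes per FLOP of the 2080 Ti,
# which is one plausible reason the extra FP32 doesn't translate 1:1 into performance.
```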
 
Basing perf/W on TDP instead of actual measurements seems pointless too, especially since they're apparently reporting TGP instead of TDP. The 2080 S, 2080 Ti and Titan RTX all had a 250 W TDP; actual power consumption was not the same.
Nvidia has always used TGP. The 2080 Ti has a TGP (per Nvidia) of 250 watts and the 3070 one of 220.
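
For what it's worth, this is the kind of spec-sheet perf/W math being argued about: a minimal sketch that divides rated TFLOPS by rated TGP, then shows how the ratio shifts if measured power deviates from the label. The TFLOPS figures are rough boost-clock numbers and the "measured" wattages are made-up placeholders, not real readings:

```python
# Spec-sheet perf/W vs "measured" perf/W. The measured-power values below are
# hypothetical placeholders used only to illustrate the point.
def perf_per_watt(tflops, watts):
    return tflops / watts

rated = {
    "RTX 2080 Ti": (13.4, 250),   # (FP32 TFLOPS, rated TGP in W)
    "RTX 3070":    (20.3, 220),
}
measured_watts = {"RTX 2080 Ti": 270, "RTX 3070": 225}  # hypothetical readings

for name, (tf, tgp) in rated.items():
    on_paper = perf_per_watt(tf, tgp)
    at_wall = perf_per_watt(tf, measured_watts[name])
    print(f"{name}: {on_paper:.3f} TF/W rated, {at_wall:.3f} TF/W measured")
```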
 
Not that it matters anyway; I'm seeing 20-36 TF GPUs with an 80 to 100% increase over Turing, plus huge improvements in RT, DLSS, etc., at good prices.
Indeed, the biggest leap ever. Even going from a 2080 Ti to a 3070 is a huge upgrade, and 2080 to 3080 even more so. I'm sure we will see 3080 Ti/S cards later on.

I dunno, bro. You think it's really bigger than the STG-2000 to the Riva 128?
 
Anyway, as for the architecture itself: this really seems to show the weakness of having a single arch serve machine learning/compute and gaming at the same time. The IPC has dropped dramatically, probably bottlenecked by a lack of SRAM and/or memory bandwidth, which suggests the fundamental arch isn't designed around graphics; instead it looks more like AMD's own CDNA split, concentrated on compute without the need for a giant amount of cache to go with it.

It's probably premature to say before seeing an SM diagram, but assuming the "slapped another FP32 SIMD in there" rumour is true, we're seeing IPC go down as a direct result of the changes between gaming and HPC Ampere. One math instruction issued per clock, three SIMDs (INT32, FP32, FP32) that each take two clocks; there is an obvious bottleneck there.
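
To put that in numbers, here's a minimal sketch assuming the rumoured layout (one instruction issued per clock per scheduler, three SIMDs each occupied for two clocks per warp instruction); all of this is speculation until the SM diagram is published:

```python
# If a scheduler issues 1 instruction per clock and each SIMD is occupied for
# 2 clocks per warp instruction, the most pipes it can keep fed at once is
# issue_rate * clocks_per_instruction. With 3 SIMDs that caps utilization.
issue_per_clock = 1
clocks_per_warp_instruction = 2
num_simds = 3                      # rumoured: INT32, FP32, FP32

max_busy_pipes = issue_per_clock * clocks_per_warp_instruction   # = 2
utilization_cap = min(1.0, max_busy_pipes / num_simds)           # ~= 0.67

print(f"At most {max_busy_pipes} of {num_simds} SIMDs busy "
      f"-> {utilization_cap:.0%} peak utilization per partition")
```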
 
Consoles increasingly resemble PC configs, and PC hardware/APIs are coming closer to console efficiency (DX12_2, RTX IO). So it's less of an issue than in the past.

I think devs on this very forum have stated that "low-level" PC APIs are not even close to what is available on consoles, particularly the PS4. There are several facets of console design that allow a level of efficiency the PC can never match. I don't think we can claim anything about RTX IO at this point in time.
 
I think devs on this very forum have stated that "low-level" PC APIs are not even close to what is available on consoles, particularly the PS4. There are several facets of console design that allow a level of efficiency the PC can never match. I don't think we can claim anything about RTX IO at this point in time.

Of course consoles are still more efficient, but the gap has narrowed to the point where Carmack's old tweet about needing "2x raw power to close the gap" is no longer valid.
 
I think this is where the cheaper Samsung 8nm process comes into play. I believe that if they had gone with TSMC 7nm, the perf/W would be better, but at $799 instead of $699. Which would you choose?

The difference in die cost wouldn't be nearly that high, and remember that it's partially offset by the increased PCB, VRM and cooling solution costs.

One overlooked reason I think Nvidia went with Samsung is simply wafer capacity. Until Huawei's recent issues, TSMC 7nm was at full capacity (and demand was even higher, especially with this year's consoles). Given how much the gaming market has grown in recent years, and Nvidia's market share, TSMC may not have had the volume Nvidia required (this is a similar problem faced by AMD with Epyc, if you wonder why its market share is so low).

Now, would price have played a part too? Sure, no doubt; NV likes their margins. And did Samsung really fuck up "7N" so badly that they had to fall back to 8nm and derail the plan a bit? Maybe. I find it hard to believe they actually planned on using a 10nm derivative in late 2020.
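
On the die-cost point a few posts up, a rough dies-per-wafer calculation shows why the gap per chip stays modest even with very different wafer prices. The wafer prices below are made-up round numbers and yield is ignored, so treat this purely as a shape-of-the-argument sketch:

```python
import math

# Rough dies-per-wafer for a ~628 mm^2 GA102-class die on a 300 mm wafer,
# then cost per die under two *hypothetical* wafer prices. Yield is ignored.
def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300):
    r = wafer_diameter_mm / 2
    return math.floor(math.pi * r**2 / die_area_mm2
                      - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))

die_area = 628.0                      # GA102 is reportedly ~628 mm^2
wafer_prices = {"Samsung 8nm (guess)": 6000, "TSMC 7nm (guess)": 9000}

n = dies_per_wafer(die_area)
for process, price in wafer_prices.items():
    print(f"{process}: {n} candidate dies, ~${price / n:.0f} per die before yield")

# Even with a large gap in assumed wafer price, the per-die difference is a few
# tens of dollars, well short of the $100 retail gap being discussed.
```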
 
It's probably premature to say before seeing an SM diagram, but assuming the "slapped another FP32 SIMD in there" rumour is true, we're seeing IPC go down as a direct result of the changes between gaming and HPC Ampere. One math instruction issued per clock, three SIMDs (INT32, FP32, FP32) that each take two clocks; there is an obvious bottleneck there.
This gen, Nvidia flops became AMD flops and vice versa.
 
At the moment it is not yet completely clear, but the doubling of FP32 has been done by simply adding a SIMD to each SM, so two SIMDs share the same scheduling, and so on. Apart from the obvious pressure on the register files, this may lead to utilization issues in cases where the scheduler, for various reasons, cannot co-issue independent instructions to both SIMDs at the same time.
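
A toy model of that co-issue concern: if only a fraction of issue slots can be filled with an independent FP32 op for the second SIMD, effective FP32 throughput lands somewhere between 1x and 2x the old rate. The pairing fractions below are arbitrary example values, not measurements:

```python
# Toy model: the second FP32 SIMD only helps when the scheduler can find an
# independent FP32 instruction to send its way. 'pairable' is the fraction of
# issue slots where that succeeds -- arbitrary example values, not measurements.
def effective_fp32_speedup(pairable_fraction):
    return 1.0 + pairable_fraction    # 1x with no pairing, 2x with perfect pairing

for pairable in (0.0, 0.5, 0.8, 1.0):
    print(f"pairable={pairable:.0%} -> {effective_fp32_speedup(pairable):.2f}x "
          f"the FP32 throughput of a single SIMD")
```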
 
I don't think we can claim anything about RTX IO at this point in time.

We can claim exactly the same as what is claimed for the console IO systems. Until we have independent benchmarks, both are simply advertised capabilities from the manufacturers.

There's no technical reason at this stage to doubt Nvidia's claims any more than there is to doubt Sony's or Microsoft's.
 