Nvidia Ampere Discussion [2020-05-14]

With 60% higher bandwidth? I think that’s really unlikely.

That's a pretty big bump in bandwidth, no? Bandwidth also increased more than shader cores. I'm curious to see how the cache changed, if it changed much at all.

2080 Super 496 GB/s -> 3080 760 GB/s (53% increase)
2080 Super 3072 cuda cores -> 3080 4352 cuda cores (41.6% increase)
 
That's a pretty big bump in bandwidth, no? Bandwidth also increased more than shader cores. I'm curious to see how the cache changed, if it changed much at all.

2080 Super 496 GB/s -> 3080 760 GB/s (53% increase)
2080 Super 3072 cuda cores -> 3080 4352 cuda cores (41.6% increase)
19 FLOPS/byte is nothing out of this world. TU106 had similar characteristics, and TU104/Navi 10 were not far away at 22. Compared to TU102 at 24 the gap is a bit larger, I'll give you that.
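Spelling out the arithmetic behind those figures (all rumoured specs from this thread, nothing official; the TFLOPS value is an assumption picked to land near the ~19 FLOPS/byte quoted above), a quick Python sketch:

```python
# Rough arithmetic behind the rumoured 2080 Super -> 3080 comparison.

def pct_increase(old, new):
    """Percentage increase from old to new."""
    return (new / old - 1.0) * 100.0

bw_2080s, bw_3080 = 496, 760          # GB/s
cores_2080s, cores_3080 = 3072, 4352  # CUDA cores (rumoured for the 3080)

print(f"bandwidth: +{pct_increase(bw_2080s, bw_3080):.1f}%")       # ~53.2%
print(f"cores:     +{pct_increase(cores_2080s, cores_3080):.1f}%")  # ~41.7%

# Arithmetic intensity in FLOPS per byte of DRAM bandwidth.
# The TFLOPS figure is an assumption (4352 cores * 2 FLOP/clk * ~1.66 GHz ≈ 14.4 TFLOPS),
# chosen only to illustrate where a ~19 FLOPS/byte ratio would come from.
assumed_tflops_3080 = 14.4
flops_per_byte = assumed_tflops_3080 * 1e12 / (bw_3080 * 1e9)
print(f"FLOPS/byte: {flops_per_byte:.1f}")  # ~18.9, i.e. roughly the 19 quoted above
```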
 
Looking good. And only 2 slots too.
What is the use case of a 2-slot card versus say 3-slots? Are you thinking specifically of a small form factor case that literally constrains a graphics card to 2-slots?

If not, what else?

That's a pretty big bump in bandwidth, no? Bandwidth also increased more than shader cores. I'm curious to see how the cache changed, if it changed much at all.

2080 Super 496 GB/s -> 3080 760 GB/s (53% increase)
2080 Super 3072 cuda cores -> 3080 4352 cuda cores (41.6% increase)
If the tensor ALUs can be used for graphics shading, then these numbers are deceptive. At the same time, I expect tensor ALUs have very heavy constraints on which instructions can be issued (ADD, MUL and MAD) and how they're sequenced (dependencies).

Beyond that, I think we should expect "second generation" real time ray tracing to get a massive boost in performance. For example, there might be large benefits in moving ray queries around the GPU, so that they follow the data, rather than trying to get all the data to all the rays. This is purely my speculation, but I'd like to compare this with how NVidia fully parallelised geometry processing, which was a revolution for tessellation. And again with tile-based rasterisation. And again with render target compression.

Hardware algorithms to speed up well-defined bandwidth-eating monsters are the entire reason we have such nice graphics.

Bandwidth is always the enemy, if you're building graphics hardware you know this decades in advance. Plain accelerated BVH traversal only gets us to 1987. There's more than 30 years of good ideas since then to put into hardware :)
 
What is the use case of a 2-slot card versus say 3-slots? Are you thinking specifically of a small form factor case that literally constrains a graphics card to 2-slots?

If not, what else?

No practical use, just engineering curiosity. My last 2 cards have been 3-slot AIB behemoths.

Official looking Gainward slides claim 7nm for GA102. Still no word whether that's TSMC or Samsung but I would be extremely surprised if Nvidia gambled on an unproven EUV process.

My bet is TSMC.

[Attached image: Gainward GeForce RTX 3090 / GeForce RTX 3080 Phoenix Golden Sample custom graphics cards]
 
What is the use case of a 2-slot card versus say 3-slots? Are you thinking specifically of a small form factor case that literally constrains a graphics card to 2-slots?

If not, what else?
Many Mini-ITX cases allow for a 2-wide card, but not much more. These SFF thingies have been enjoying rising popularity for a couple of years now.
 
Beyond that, I think we should expect "second generation" real time ray tracing to get a massive boost in performance. For example, there might be large benefits in moving ray queries around the GPU, so that they follow the data, rather than trying to get all the data to all the rays. This is purely my speculation, but I'd like to compare this with how NVidia fully parallelised geometry processing, which was a revolution for tessellation. And again with tile-based rasterisation. And again with render target compression.

Hardware algorithms to speed up well-defined bandwidth-eating monsters are the entire reason we have such nice graphics.

Bandwidth is always the enemy, if you're building graphics hardware you know this decades in advance. Plain accelerated BVH traversal only gets us to 1987. There's more than 30 years of good ideas since then to put into hardware :)

GA102 has 15.8B more transistors than TU102. Even after accounting for the transistors needed for 17% more compute units and the 2nd-gen RT / 3rd-gen Tensor cores, there would be a whole TU106 (10.8B transistors) worth of budget left unused. So maybe they are really going all in on ray tracing?!
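To make that back-of-the-envelope budget explicit (the GA102 figure here is only the rumoured "+15.8B over TU102", not a confirmed spec), a rough sketch:

```python
# Back-of-the-envelope transistor budget. GA102's count is only the rumoured
# "+15.8B over TU102"; the TU102/TU106 counts are Nvidia's published figures.
tu102 = 18.6e9                    # transistors, TU102
tu106 = 10.8e9                    # transistors, TU106
ga102_rumoured = tu102 + 15.8e9   # ~34.4B, rumour only

# Naively scale TU102 up by the rumoured ~17% increase in compute units.
scaled_turing = tu102 * 1.17      # ~21.8B

# Even before budgeting anything for beefier RT/Tensor cores, the leftover
# is bigger than an entire TU106.
leftover = ga102_rumoured - scaled_turing
print(f"leftover: {leftover / 1e9:.1f}B (TU106 is {tu106 / 1e9:.1f}B)")  # ~12.6B vs 10.8B
```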
 
No practical use, just engineering curiosity. My last 2 cards have been 3-slot AIB behemoths.

Official looking Gainward slides claim 7nm for GA102. Still no word whether that's TSMC or Samsung but I would be extremely surprised if Nvidia gambled on an unproven EUV process.

My bet is TSMC.

[Attached image: Gainward GeForce RTX 3090 / GeForce RTX 3080 Phoenix Golden Sample custom graphics cards]
It also says HDMI 2.1 - finally.

Additionally, I wonder what became of the VirtualLink thingie with a USB-C outlet. That one added 30 watts of TBP last gen. Maybe that's why the messaging emphasises TGP.
 
GA102 has 15.8B more transistors than TU102. Even after accounting for the transistors needed for 17% more compute units and the 2nd-gen RT / 3rd-gen Tensor cores, there would be a whole TU106 (10.8B transistors) worth of budget left unused. So maybe they are really going all in on ray tracing?!

Yeah clearly a GA102 SM is much more powerful than a TU102 SM. I would be surprised if tensors are to blame though. 4K DLSS 2.0 only takes ~1.5ms on a 2080 Ti. There's no need for gaming Ampere to go nuts with tensor performance.

Of course it's going to be a combination of things. RT most definitely got an upgrade. Maybe they've doubled ROPs again to help use all that bandwidth.
 
I'm wondering whether there are "macros" that could run on the tensor ALUs and greatly contribute to BVH or other acceleration-structure traversal algorithms, e.g. sorting based on a prefix sum running on the tensor cores. It might then be worthwhile to add some functionality/memory/data-networks to the tensor ALUs for these macros, whilst not making them fully general INT/FLOAT ALUs.
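Pure speculation made slightly more concrete: a prefix sum over a tile can be written as a matmul with a triangular matrix of ones, which is exactly the shape of work an MMA unit does, and that prefix sum is the heart of a counting/bucket sort. A minimal numpy sketch of the idea (sizes and names purely illustrative, nothing Nvidia has described):

```python
import numpy as np

TILE = 16  # pretend this is the MMA tile size

def prefix_sum_via_matmul(counts):
    """Exclusive prefix sum of a length-TILE vector, expressed as one matmul."""
    L = np.tril(np.ones((TILE, TILE)), k=-1)   # strictly lower-triangular ones
    return L @ counts                          # offsets[i] = sum(counts[:i])

def bucket_sort(keys, num_buckets=TILE):
    """Counting sort driven by the matmul-based prefix sum."""
    counts = np.bincount(keys, minlength=num_buckets).astype(float)
    offsets = prefix_sum_via_matmul(counts).astype(int)
    out = np.empty_like(keys)
    cursor = offsets.copy()
    for k in keys:                             # scatter pass
        out[cursor[k]] = k
        cursor[k] += 1
    return out

keys = np.random.randint(0, TILE, size=64)
assert np.array_equal(bucket_sort(keys), np.sort(keys))
```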
 
Alright so, 20 teraflops for the 3090, about 16 for the 3080. That 3080 has a rather worryingly low amount of RAM for such an expensive card. Then again, they're Nvidia; we could easily see it ramp up to 20 GB for an FE model as rumored, or after AMD launches their cards in a few months.
 
GA102 has 15.8B more transistors than TU102. Even after accounting for the transistors needed for 17% more compute units and the 2nd-gen RT / 3rd-gen Tensor cores, there would be a whole TU106 (10.8B transistors) worth of budget left unused. So maybe they are really going all in on ray tracing?!
Where does that transistor number come from? Double confirmed? ;)
Alright so, 20 teraflops for the 3090, about 16 for the 3080. That 3080 has a rather worryingly low amount of RAM for such an expensive card. Then again, they're Nvidia; we could easily see it ramp up to 20 GB for an FE model as rumored, or after AMD launches their cards in a few months.
3090 with 5248 ALUs @1725 MHz (for the Gainward thingie) is more like 18.1 TFLOPS.
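For reference, that's just the usual CUDA cores × 2 FLOP/clock (an FMA counts as two) × clock; with the rumoured Gainward figures:

```python
# FP32 throughput = CUDA cores * 2 FLOP/clock (FMA) * boost clock.
# Core count and clock are the rumoured Gainward figures, not confirmed specs.
alus = 5248
boost_ghz = 1.725
print(f"{alus * 2 * boost_ghz / 1000:.1f} TFLOPS")   # ~18.1
```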
 
Where does that transistor number come from? Double confirmed? ;)

3090 with 5248 ALUs @1725 MHz (for the Gainward thingie) is more like 18.1 TFLOPS.

Just assuming they're using all the bandwidth for their highest-end card, especially since the memory is clocked just that much higher than on the lower one. End result is just under twenty-two teraflops max (edit: off due to dropping a small number from the 2080, derp), though with the huge 3-slot cooler and giant TDP maybe max clock speed can't be maintained for that long.
 
I wonder how high the 3090/80 will ultimately boost to?

The 2080 Ti has an advertised boost clock of 1545 MHz, but we all know they typically run at ~1800-1900 MHz. Pretty easy to hit 2000 MHz on my FE anyway.

I wonder if the 30 series will boost higher? Not much longer to wait I guess anyway lol.

Based on TFLOPS there's around 40% between the 2080 Ti and the 3090, but who's to say that architectural improvements won't push actual performance even further still?
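How big that gap looks depends a lot on which figures you plug in; a quick sketch using the advertised 2080 Ti boost clock against the two 3090 estimates floated earlier in the thread:

```python
# 2080 Ti at its advertised 1545 MHz boost vs. the rumoured 3090 figures.
# Actual sustained clocks run higher on both sides, as noted above.
tflops_2080ti = 4352 * 2 * 1.545 / 1000                # ~13.4 TFLOPS
for tflops_3090 in (18.1, 20.0):                       # Gainward-derived vs. the ~20 TFLOPS rumour
    gap = (tflops_3090 / tflops_2080ti - 1) * 100
    print(f"{tflops_3090:.1f} TFLOPS -> +{gap:.0f}%")  # ~+35% and ~+49%
```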
 