NVidia Ada Speculation, Rumours and Discussion

Nice analysis. They estimated a small 10% increase in SM area. That doesn’t seem good enough for a significant boost in RT performance. I hope it’s higher. Compute capability of 8.9 does imply overall SM architecture hasn’t changed much vs Ampere.
 

I would argue otherwise: a 10% SM size increase would be massive if most of it goes to RT improvements, since the RT units are only a small part of the SM.
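A purely illustrative back-of-the-envelope sketch of that point; the ~8% RT-core share of SM area below is an assumption, not a measured figure:

```python
# Illustrative only: assume the RT core is ~8% of current SM area (made-up figure).
rt_share = 0.08       # assumed fraction of SM area taken by the RT core
sm_growth = 0.10      # rumoured overall SM area increase
# If most of that extra area went to the RT core, its budget would grow a lot:
new_rt_share = rt_share + sm_growth
print(f"RT core area budget grows ~{new_rt_share / rt_share:.1f}x")   # ~2.3x
```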
 
If all this is correct, I hope the generation after Ada brings more efficiency, since power seems to scale roughly linearly with performance. Having AD104 perform like a 3090 while drawing the same power as the latter sounds terrible. I don't plan to change my power supply just to get 2x the performance. Horrific.
 
Raster performance has much bigger implications further down the product stack. Unless there's some major departure in approach, performance scaling in general should be fairly uniform across workload types. On paper, the gap between AD102 and AD104 is significantly larger than that between GA102 and GA104. Based on the numbers, AD102 should be on the order of 2x the performance of AD104 across a broad set of workloads, whether that be raster or ray-traced gaming. The issue is that if AD102 is not in the 2x range (and the real numbers at this point should be taken as rough) and only in, say, the 1.5x range, then that has implications for how fast AD104 is relative to GA102 and GA104. So while AD102 doesn't "need" that 2x gain over GA102 in raster, AD104 does "need" that 1.5x gain in raster over GA104.

Based on the current information, the configuration of each GPC is the same between GA102 and AD102. That means roughly a 1.7x increase in GPC count and about a 1.2x clock-speed increase (ball-parking current numbers), which together works out to about 2.05x from a high-level perspective, hence that 2x number. At a simplistic level the question is then whether the memory subsystem can feed that adequately. Bandwidth based on the 24Gbps GDDR6X announcements would be about 1.25x or 1.15x depending on what you compare against. It's worth noting that if we compare Navi 21 against Navi 10, there's a ~2.35x increase in theoretical compute fed by only a ~1.15x increase in bandwidth plus a large cache, for real raster gains in the 2x range.
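A rough sketch of that arithmetic with the rumoured figures (GPC counts, clocks and memory speeds here are ballpark assumptions, not confirmed specs):

```python
# Ballpark scaling estimate from the rumoured configurations (all figures assumed).
ga102_gpcs, ad102_gpcs = 7, 12            # same per-GPC layout assumed
clock_scale = 1.2                         # rough clock-speed increase
compute_scale = (ad102_gpcs / ga102_gpcs) * clock_scale
print(f"theoretical compute scale: ~{compute_scale:.2f}x")      # ~2.06x

# Memory side: 24 Gbps GDDR6X vs 19.5 or 21 Gbps on current GA102 boards.
print(f"bandwidth scale vs 19.5 Gbps: ~{24 / 19.5:.2f}x")       # ~1.23x
print(f"bandwidth scale vs 21 Gbps:   ~{24 / 21:.2f}x")         # ~1.14x
```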
 
So while AD102 doesn't "need" that 2x gain over GA102 in raster, AD104 does "need" that 1.5x gain in raster over GA104.

By raster do you literally mean rasterizing triangles? What games today are even close to raster limited on a 3070 Ti? Based on the leaked specs AD104 actually has one less rasterizer than GA104. It’s probably a wash given higher clocks on Lovelace.

Shading performance benefits RT just as much as raster or maybe even more so given lower efficiency due to divergence. That’s the thing to keep an eye on.
 

In the context of this conversation I'm using it (and I think everyone else is?) to simply divide gaming performance into non-ray-traced, i.e. "raster", and ray-traced.

At least that's what I thought the context of the discussion was: whether or not AD102 would have 100% improvements over GA102 in gaming tests without ray tracing and >100% in tests with ray tracing.
 

Got it. If we assume that Ampere is severely memory starved then there could be a lot of upside to the expanded L2 cache even if the raw flops and bandwidth numbers don’t increase as much. The 6900 XT is ~30% faster than the 3070 Ti in rasterization at 4K and it has 20% less bandwidth. Infinity cache is helping a lot.
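A toy model of how a large on-die cache stretches DRAM bandwidth; the 50% hit rate below is an assumed number purely for illustration:

```python
# Toy effective-bandwidth model (hit rate is an assumption, not AMD data).
dram_bw_gbs = 512        # 6900 XT DRAM bandwidth
hit_rate = 0.5           # assumed Infinity Cache hit rate at 4K
# If only misses reach DRAM, the bandwidth the shader array effectively sees is:
effective_bw = dram_bw_gbs / (1 - hit_rate)
print(f"effective bandwidth: ~{effective_bw:.0f} GB/s (vs 608 GB/s of raw DRAM bandwidth on the 3070 Ti)")
```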
 
If everything else scales accordingly it should be doubled: 18432 vs 10752 cores and 2.25GHz vs 1.90GHz works out to about 2x. I doubt it will, but hopefully RT performance does.
Ampere certainly didn't scale accordingly. The 3090 is only ~13% faster than the 3080 despite 20% more cores and 23% more bandwidth. Scaling is only likely to get worse at even higher core counts.
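Quick check on those numbers, plus the rumoured AD102 figures from the quote (the observed uplift is a rough average):

```python
# Scaling-efficiency check on the figures quoted above (rough averages).
core_ratio = 10496 / 8704      # 3090 vs 3080 CUDA cores, ~1.21x
bw_ratio   = 936 / 760         # GB/s, ~1.23x
perf_ratio = 1.13              # observed average uplift
print(f"perf gained per unit of extra cores: ~{(perf_ratio - 1) / (core_ratio - 1):.0%}")  # ~63%

# Same exercise for the rumoured AD102 vs GA102 numbers from the quote.
theoretical = (18432 / 10752) * (2.25 / 1.90)
print(f"AD102 theoretical scale over GA102: ~{theoretical:.2f}x")                          # ~2.03x
```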
 
Is that with or without RT?

RT scaling is more important, in the end.

Additionally, if the rumours about larger L2 are real then it could mean that the architecture is re-balanced somewhat to take advantage of the L2 boost.
 

It’s a similar level of scaling with or without RT if you sample a large number of games. Maybe 1 or 2% higher with RT.
 
Ampere certainly didn't scale accordingly. The 3090 is only ~13% faster than the 3080 despite 20% more cores and 23% more bandwidth. Scaling is only likely to get worse at even higher core counts.

Are you comparing the 3090 FE against the 3080 FE? We need to be mindful here, because that isn't strictly the same as comparing GA102-300 (RTX 3090) against GA102-200 (RTX 3080) and how well those extra 20% cores etc. scale.

The RTX 3090 FE only has a 10% higher power limit than the 3080 FE (350W vs 320W). In practice this means some of the gain is clawed back, as the 3090 FE ends up clocked lower, which results in it being closer to 10% faster rather than the ~20% you'd expect in theory given the hardware differences.

TPU unfortunately didn't review the 3090 FE, but if we look at the Zotac model they reviewed, which uses the same 350W TDP and stock boost table - https://www.techpowerup.com/review/zotac-geforce-rtx-3090-trinity/30.html

You can see it has noticeably lower clocks than the RTX 3080 FE - https://www.techpowerup.com/review/nvidia-geforce-rtx-3080-founders-edition/32.html

Whereas with this Asus model - https://www.techpowerup.com/review/asus-geforce-rtx-3090-strix-oc/30.html - whose clock speeds end up roughly in line with the 3080 FE (so the comparison is more clock-normalized), performance is closer to the 20% range you'd expect given the hardware differences - https://tpucdn.com/review/asus-geforce-rtx-3090-strix-oc/images/relative-performance_3840-2160.png

I think in the context of Ada versus Ampere we also need to be mindful of the above. AD102 potentially being 2x faster than GA102 is not the same as the RTX 4090 being 2x faster than the RTX 3090, and AD104 being 1.5x faster than GA104 isn't the same as the RTX 4070 being 1.5x faster than the RTX 3070.
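To illustrate the clock-normalization point, a minimal sketch; the sustained clock figures below are assumed averages, not TPU's exact numbers:

```python
# Rough illustration of power-limited scaling (clock averages are assumptions).
cores_3090, cores_3080 = 10496, 8704
clk_3090, clk_3080 = 1.70, 1.83      # GHz, assumed sustained clocks under each power limit
paper_ratio = cores_3090 / cores_3080                      # ~1.21x on paper
power_limited = paper_ratio * (clk_3090 / clk_3080)        # ~1.12x once clocks drop
print(f"paper: ~{paper_ratio:.2f}x, power-limited: ~{power_limited:.2f}x")
```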
 
Nvidia Uses GPU-Powered AI to Design Its Newest GPUs | Tom's Hardware (tomshardware.com)
Nvidia's chief scientist recently talked about how his R&D teams are using GPUs to accelerate and improve the design of new GPUs. Four complex and traditionally slow processes have already been tuned by leveraging machine learning (ML) and artificial intelligence (AI) techniques. In one example, using AI/ML accelerated inference can speed a common iterative GPU design task from three hours to three seconds.
...
In his talk, Dally outlined four significant areas of GPU design where AI/ML can be leveraged to great effect: mapping voltage drop, predicting parasitics, place and routing challenges, and automating standard cell migration. Let's have a look at each process, and how AI tools are helping Nvidia R&D get on with the brain work instead of waiting around for computers to do their thing.
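As a hedged sketch of the general idea (not Nvidia's actual tooling), here is how a learned surrogate can stand in for a slow iterative analysis step; the solver, feature names and coefficients below are invented for illustration:

```python
# Hypothetical sketch: replace a slow physics-style solver with a learned surrogate.
# The solver, features and coefficients are synthetic stand-ins, not Nvidia's tools.
import numpy as np

rng = np.random.default_rng(0)

def slow_ir_drop_solver(features):
    # Stand-in for an hours-long voltage-drop analysis (purely synthetic here).
    cell_density, switching_activity, rail_width = features.T
    return 0.8 * cell_density + 0.5 * switching_activity - 0.3 * rail_width

# Build a training set offline by running the "slow" tool on sampled layouts.
X_train = rng.random((512, 3))
y_train = slow_ir_drop_solver(X_train)

# Fit a simple least-squares surrogate; inference is then effectively instant.
A = np.hstack([X_train, np.ones((len(X_train), 1))])
coeffs, *_ = np.linalg.lstsq(A, y_train, rcond=None)

def fast_ir_drop_estimate(features):
    A_new = np.hstack([features, np.ones((len(features), 1))])
    return A_new @ coeffs

X_test = rng.random((4, 3))
print(fast_ir_drop_estimate(X_test))   # near-instant estimate
print(slow_ir_drop_solver(X_test))     # "ground truth" from the slow path
```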
 