NVidia Ada Speculation, Rumours and Discussion

It's not "Ada", it's Linda ;)
PS: do you think this card gives any hint about the efficiency of Lovelace?
tl;dr: it's on TSMC 7nm as opposed to the 3080 on Samsung 8nm, and it's a hell of a lot more efficient; Lovelace is rumoured to be on TSMC 5nm.
 
Some more interesting tidbits... these things are fast.

Leak Suggests 'RTX 4090' Could Have 75% More Cores Than RTX 3090 | Tom's Hardware (tomshardware.com)

Ada pushes the envelope, with TSMC 5nm and a big jump in core counts

A leaker by the name of @davideneco25320 on Twitter has shared some very specific details about Nvidia's next-generation Ada (aka Lovelace) GPUs including SM counts and names of each new die. If his data is accurate (and given the recent Nvidia hack, it very well could be), Ada will be a massive upgrade over Ampere, the RTX 30-series, especially for the flagship GPU. As this is leaked data and cannot be completely trusted, take these results with a grain of salt.


The leak shows that Nvidia will not be changing its nomenclature for the Ada generation, keeping the same two-letter prefix and three-digit number system as the Ampere generation. AD102 denotes the flagship GPU, likely for an RTX 4090 or Titan-class card, with AD103 following as the next most powerful die (perhaps for a potential RTX 4080). AD104 and AD106 will follow suit as the midrange dies (i.e. RTX 4070 and RTX 4060), and AD107 will fill out the entry-level market for Nvidia's Ada GPUs (i.e. something like an RTX 4050).

Note also that the codenames suggest Nvidia will be using the Ada codename and not the previously rumored Lovelace codename, so that's how we'll refer to the future GPUs for now.

One thing that has changed significantly is the number of SMs in Ada. The flagship AD102 die will supposedly tip the scales with a whopping 144 SMs in a single die. By way of comparison, Ampere's GA102 only has 84 SMs, so this is a 71% increase in SM count, which should likewise apply to GPU cores, RT cores, TMUs, and other elements. This will be one of the largest jumps we've ever seen in a single generation.

If Nvidia keeps the number of CUDA cores per SM the same on Ada, this means we could be looking at 18,432 CUDA cores for the flagship card. Nvidia's upcoming RTX 3090 Ti 'only' has 10,752 CUDA cores, using the full GA102 chip. Of course, we'll also see lesser variants that use partially harvested AD102 chips, and while 144 SMs may be the maximum, we wouldn't be surprised to see 10–20% of the SMs disabled for some graphics card models.

The number of SMs in the other chips isn't nearly as high, though the numbers are still very respectable. AD103 will supposedly have 84 SMs, the same count as the full GA102, a 40% jump over GA103's 60 SMs. AD104 will follow suit with 60 SMs, matching GA103, or 25% more SMs than GA104's 48. AD106 is a bit closer to GA106, with 36 SMs versus 30, a 20% uplift. Finally, AD107 will supposedly feature just 24 SMs, again a respectable 20% jump in SM count compared to GA107's 20.
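
For anyone wanting to check the math, here's a minimal Python sketch that recomputes the uplifts from the leaked SM counts; the 128 CUDA cores per SM figure is carried over from gaming Ampere, as the article assumes:

```python
# Sanity check of the leaked SM counts and the uplift percentages above.
# 128 FP32 CUDA cores per SM is assumed from gaming Ampere's SM layout.
pairs = {  # Ada die vs. Ampere die: (leaked Ada SMs, full Ampere SMs)
    "AD102 vs GA102": (144, 84),
    "AD103 vs GA103": (84, 60),
    "AD104 vs GA104": (60, 48),
    "AD106 vs GA106": (36, 30),
    "AD107 vs GA107": (24, 20),
}
for name, (ada_sms, ampere_sms) in pairs.items():
    uplift = (ada_sms / ampere_sms - 1) * 100
    print(f"{name}: {ada_sms} SMs ({ada_sms * 128} cores), +{uplift:.0f}%")
# AD102 vs GA102: 144 SMs (18432 cores), +71%  ... and so on down the stack
```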

If these leaks and rumors prove accurate, we can expect flagship cards like a future RTX 4090 and RTX 4080 to pack some incredible performance improvements over the current RTX 30-series. It's certainly a larger jump than Ampere made over Turing, at least in some respects: the RTX 3080, for example, had the same 68 SMs as the RTX 2080 Ti, though there were plenty of other changes.

Power consumption could also increase for Ada GPUs with the addition of the new 16-pin power connectors that are being developed and produced right now for future PCIe 5.0 graphics cards. With a maximum power delivery of 600W from a single plug, that would give Nvidia a ton of headroom to boost performance on Ada GPUs.

Ada may also be the first PCIe 5.0 compliant graphics solution, and while the increase in PCIe bandwidth might not matter too much, it certainly won't hurt performance. What we don't know is how much Nvidia plans to change the fundamental building blocks in Ada. For example, Turing had 64 FP32 cores and 64 INT32 cores per SM, which were able to run concurrently on different data. Ampere altered things so that the INT32 cores became INT32 or FP32 cores, potentially doubling the FP32 performance.
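
To illustrate where that "potentially doubling" comes from, here's a toy peak-throughput calculation under the two SM layouts; the clock is an arbitrary assumption used only for the comparison:

```python
# Illustrative sketch (not official specs): peak FP32 throughput for a
# 68-SM GPU under Turing-style vs. Ampere-style SM datapaths.
def peak_fp32_tflops(num_sms, fp32_lanes_per_sm, clock_ghz):
    # 2 ops per lane per clock: a fused multiply-add counts as two FLOPs.
    return num_sms * fp32_lanes_per_sm * 2 * clock_ghz / 1000

clock = 1.7  # GHz, assumed here purely for comparison
# Turing: 64 dedicated FP32 lanes per SM (the 64 INT32 lanes can't do FP32)
print(peak_fp32_tflops(68, 64, clock))   # ~14.8 TFLOPS
# Ampere: the second datapath can also issue FP32, so up to 128 FP32 lanes
print(peak_fp32_tflops(68, 128, clock))  # ~29.6 TFLOPS, the "doubling"
```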

Ampere also features 3rd generation Tensor cores and 2nd generation RT cores for ray tracing. Ada will likely use 4th generation Tensor cores and 3rd generation RT cores. What will that mean? We don't have exact details, but Ada will almost certainly deliver far more performance than the current Ampere GPUs. There might be more CUDA, Tensor, and/or RT cores per SM, or the internal pipelines may simply be revamped to improve throughput.

Memory is also another big player when it comes to GPU performance, and could play an even bigger role in improving frame rates considering how many SMs Ada may have. GDDR6+ and GDDR7 are already on Samsung's roadmap, featuring substantial bandwidth improvements over GDDR6X, and Nvidia will likely use one or both of these new standards if they're ready in time for Ada production. After all, the more cores you have, the more memory bandwidth you need to feed them all.
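
For a sense of the arithmetic involved, a small sketch of the standard bandwidth formula; the GDDR6X figures match the RTX 3090 Ti, while the GDDR7 data rate is purely a placeholder assumption, not a confirmed spec:

```python
# Memory bandwidth = bus width (bits) x data rate (Gbps per pin) / 8.
def bandwidth_gbs(bus_width_bits, gbps_per_pin):
    return bus_width_bits * gbps_per_pin / 8

print(bandwidth_gbs(384, 21))  # GDDR6X at 21 Gbps, 384-bit bus: 1008 GB/s
print(bandwidth_gbs(384, 32))  # hypothetical 32 Gbps GDDR7: 1536 GB/s
```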

Generally speaking, Nvidia has improved performance on its fastest GPUs by around 30% with previous architectures, but with the change in process node and massively increased core counts, plus a potentially higher power limit, it's not unrealistic to expect even bigger improvements from Ada.

Will the RTX 4090 (or whatever it ends up being called) end up delivering twice the performance of the RTX 3090? That's ambitious but certainly not out of reach. 75% more cores with higher clockspeeds and/or a more efficient architecture would do the trick. We'll find out more later this year, as Ada is expected to launch in the September timeframe.
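
A quick back-of-the-envelope version of that claim, assuming perfect scaling with cores and clocks (which real games never achieve, so read it as an upper bound; the Ampere clock is an assumed typical boost figure, not an official spec):

```python
# Back-of-the-envelope performance scaling, cores x clocks.
core_ratio  = 18432 / 10752   # leaked full AD102 vs. full GA102, ~1.71x
clock_ratio = 2.45 / 1.86     # rumoured 2.4-2.5 GHz vs. an assumed typical
                              # Ampere boost clock of ~1.86 GHz
print(core_ratio * clock_ratio)  # ~2.26x theoretical upper bound
```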
 
More news from the leaks, Ada features a massive L2 cache (16x Ampere)
https://videocardz.com/newz/geforce...ry-large-l2-caches-nvidias-own-infinity-cache

[Image: Ampere vs Ada configuration comparison]

This large L2 cache is one factor in Ada's big RT performance boost (but not the only one). Current-gen games' BVH trees can fit in L2, where they're directly accessible at ultra-low latency to the RT cores (a big advantage vs the L3 cache on RDNA3).
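
As a rough illustration of why BVH residency matters, a toy footprint estimate; the 64-byte node size and triangle counts are illustrative assumptions, and real BVH layouts are driver- and hardware-specific:

```python
# Very rough BVH footprint estimate under illustrative assumptions.
def bvh_size_mb(triangles, bytes_per_node=64):
    # A binary BVH over N primitives has roughly 2N - 1 nodes.
    nodes = 2 * triangles - 1
    return nodes * bytes_per_node / 2**20

print(bvh_size_mb(1_000_000))  # ~122 MB: spills out of even a 96 MB L2
print(bvh_size_mb(300_000))    # ~37 MB: could stay resident in a big L2
```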

Finally, early AD102 samples are reportedly boosting at 2.4-2.5 GHz, another source of Ada's performance advantage over Ampere.
 
This large L2 cache is one factor in Ada's big RT performance boost (but not the only one). Current-gen games' BVH trees can fit in L2, where they're directly accessible at ultra-low latency to the RT cores (a big advantage vs the L3 cache on RDNA3).
I love the way you can tell that something in an unreleased architecture is an advantage over something else in another unreleased architecture.
 
This large L2 cache is one factor in Ada's big RT performance boost (but not the only one). Current-gen games' BVH trees can fit in L2, where they're directly accessible at ultra-low latency to the RT cores (a big advantage vs the L3 cache on RDNA3).
RDNA 2's L3 cache has lower latency than Ampere's L2 cache:

[Chart: GPU memory latency vs. working set size, from the link below]

https://chipsandcheese.com/2021/04/16/measuring-gpu-memory-latency/
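
For context, latency curves like the one above are usually produced with pointer-chasing microbenchmarks; here's a minimal CPU-side Python sketch of the idea (interpreter overhead dominates in Python, so this only demonstrates the method; the real measurements run as GPU kernels):

```python
import random
import time

def pointer_chase_latency_ns(size_bytes, iters=500_000):
    # Build a random cyclic permutation: each element stores the index of
    # the next element to visit, so every access depends on the previous
    # one and the chain cannot be prefetched.
    n = size_bytes // 8
    order = list(range(n))
    random.shuffle(order)
    chain = [0] * n
    for i in range(n):
        chain[order[i]] = order[(i + 1) % n]
    idx = 0
    start = time.perf_counter()
    for _ in range(iters):
        idx = chain[idx]
    return (time.perf_counter() - start) / iters * 1e9

# Latency steps up as the working set spills out of each cache level.
for kb in (64, 512, 8192, 65536):
    print(f"{kb:>6} KB: {pointer_chase_latency_ns(kb * 1024):.0f} ns/access")
```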
 
Different architectures require different caching. RDNA2 clocks significantly higher than Ampere (~+25%), and its vector SIMDs are fed every clock instead of every other clock. There are effects from smaller caches' hit rates and their ability to saturate consumers of data. And then there's the return on investment and the power implications of faster caches.
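
To make the hit-rate point concrete, a toy effective-latency model; all numbers are illustrative, not measurements:

```python
# Toy model: effective latency = hit_rate * cache_latency
#            + (1 - hit_rate) * memory_latency.
def effective_latency(hit_rate, cache_ns, mem_ns):
    return hit_rate * cache_ns + (1 - hit_rate) * mem_ns

# A slightly slower but bigger cache can still win on average:
print(effective_latency(0.60, 100, 400))  # fast cache, 60% hits: 220 ns
print(effective_latency(0.85, 130, 400))  # slower, 85% hits: ~170 ns
```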

I thought I'd just mention the obvious here again for people casually scrolling through all of this.

Yes, it's an interesting armchair exercise, but IMHO nothing that would allow conclusions reaching as far as the next generation of chips with architectural differences.
 
Even the revisited results show that RDNA 2's L3 cache has lower latency than Ampere's L2 cache:
[Chart: revisited GPU memory latency results]

In this last test, Infinity Cache is 15 ns slower, not faster, than Ampere's L2. But all that is pointless for comparing future architectures, as we have no details on RDNA3's L1 and L2 (which are likely to be enlarged), nor on the latency of such a big L2 on Ada (latency tends to increase with cache size). On the other hand, we have seen with Milan-X that a large stacked cache done "AMD's way" does not add many cycles of latency. So... let's wait and see.
 
More news from the leaks, Ada features a massive L2 cache (16x Ampere)

This large L2 cache is one factor in Ada's big RT performance boost (but not the only one). Current-gen games' BVH trees can fit in L2, where they're directly accessible at ultra-low latency to the RT cores (a big advantage vs the L3 cache on RDNA3).

That’s great. Should be a win for RT.

I wonder how useful these big GPU caches will be in general, though. Consoles don't have them, and game engines seem to be moving toward streaming of unique assets versus the tiling and instancing approaches of old.
 
I'd expect very little about this processor to cross over with Lovelace. They've even gone back to naming them completely differently this time (unlike Ampere, where the compute chips also had little in common with gaming Ampere).
It's not like Volta and Turing were worlds apart because of the different names.
 
They were, in fact, a calendar year apart.

Pascal P100/ GP10x 2016
Volta 2017
Turing 2018
Ampere A100/GA10x 2020
Hopper 2022
Ada Lovelace [...]

So either
a) Nvidia is deviating from this pattern.
b) There will be gaming Hoppers later this year
c) Lovelace will be a 2023 product
d) This is coincidence and not a pattern after all.
 
AFAIK there was supposed to be a gaming Volta lineup, but it was eventually scrapped in favor of Turing.
Lovelace seems to be more different from Hopper than either of the previous HPC-only architectures were from their gaming counterparts, so this may be the reason for a different name.
 
Lovelace seems to be more different from Hopper than either of the previous HPC-only architectures were, so
Why do you conclude that the difference is greater here? Is there a specific feature of the tensor cores or cache that you're referring to, or is there something else?
 