NVidia Ada Speculation, Rumours and Discussion

It's not "Ada", it's Linda ;)
PS: do you think this card gives any hint about the efficiency of Lovelace?
tl;dr: it's on TSMC 7nm as opposed to the 3080 on Samsung 8nm, and it's a hell of a lot more efficient; Lovelace is rumoured to be on TSMC 5nm.
 
Some more interesting tidbits... these things are fast.

Leak Suggests 'RTX 4090' Could Have 75% More Cores Than RTX 3090 | Tom's Hardware (tomshardware.com)

Ada pushes the envelope, with TSMC 5nm and a big jump in core counts

A leaker by the name of @davideneco25320 on Twitter has shared some very specific details about Nvidia's next-generation Ada (aka Lovelace) GPUs including SM counts and names of each new die. If his data is accurate (and given the recent Nvidia hack, it very well could be), Ada will be a massive upgrade over Ampere, the RTX 30-series, especially for the flagship GPU. As this is leaked data and cannot be completely trusted, take these results with a grain of salt.


The leak shows that Nvidia will not be changing its nomenclature for the Ada generation, keeping the same two-letter prefix and three-digit number system as the Ampere generation. AD102 denotes the flagship GPU, likely for an RTX 4090 or Titan-class card, with AD103 following as the next most powerful die (perhaps for a potential RTX 4080). AD104 and AD106 will follow suit as the midrange dies (i.e. RTX 4070 and RTX 4060), and AD107 will fill out the entry-level market for Nvidia's Ada GPUs (i.e. something like an RTX 4050).

Note also that the codenames suggest Nvidia will be using the Ada codename and not the previously rumored Lovelace codename, so that's how we'll refer to the future GPUs for now.

One thing that has changed significantly is the number of SMs in Ada. The flagship AD102 die will supposedly tip the scales with a whopping 144 SMs in a single die. By way of comparison, Ampere's GA102 only has 84 SMs, so this is a 71% increase in SM count, which should likewise apply to GPU cores, RT cores, TMUs, and other elements. This will be one of the largest jumps we've ever seen in a single generation.

If Nvidia keeps the number of CUDA cores per SM the same on Ada, this means we could be looking at 18,432 CUDA cores for the flagship card. Nvidia's upcoming RTX 3090 Ti 'only' has 10,752 CUDA cores, using the full GA102 chip. Of course, we'll also see lesser variants that use partially harvested AD102 chips, and while 144 SMs may be the maximum, we wouldn't be surprised to see 10–20% of the SMs disabled for some graphics card models.

The number of SMs in the other chips isn't nearly as high, though the numbers are still very respectable. AD103 will supposedly have 84 SMs, the same count as the full GA102, a 40% jump over GA103's 60 SMs. AD104 will follow suit with 60 SMs, matching GA103, or 25% more SMs than GA104's 48. AD106 is a bit closer to GA106, with 36 SMs versus 30, a 20% uplift. Finally, AD107 will supposedly feature just 24 SMs, again a respectable 20% jump in SM count compared to GA107's 20.
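
For anyone wanting to check the math, here's a minimal Python sketch that recomputes the uplifts from the leaked SM counts; the 128 CUDA cores per SM figure is carried over from gaming Ampere, as the article assumes:

```python
# Sanity check of the leaked SM counts and the uplift percentages above.
# 128 FP32 CUDA cores per SM is assumed from gaming Ampere's SM layout.
pairs = {  # Ada die vs. Ampere die: (leaked Ada SMs, full Ampere SMs)
    "AD102 vs GA102": (144, 84),
    "AD103 vs GA103": (84, 60),
    "AD104 vs GA104": (60, 48),
    "AD106 vs GA106": (36, 30),
    "AD107 vs GA107": (24, 20),
}
for name, (ada_sms, ampere_sms) in pairs.items():
    uplift = (ada_sms / ampere_sms - 1) * 100
    print(f"{name}: {ada_sms} SMs ({ada_sms * 128} cores), +{uplift:.0f}%")
# AD102 vs GA102: 144 SMs (18432 cores), +71%  ... and so on down the stack
```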

If these leaks and rumors prove accurate, we can expect flagship cards like a future RTX 4090 and RTX 4080 to pack some incredible performance improvements over the current RTX 30-series. It's certainly a larger jump than Ampere made over Turing, at least in some respects: the RTX 3080, for example, had the same 68 SMs as the RTX 2080 Ti, though there were plenty of other changes.

Power consumption could also increase for Ada GPUs with the addition of the new 16-pin power connectors that are being developed and produced right now for future PCIe 5.0 graphics cards. With a maximum power delivery of 600W from a single plug, that would give Nvidia a ton of headroom to boost performance on Ada GPUs.

Ada may also be the first PCIe 5.0 compliant graphics solution, and while the increase in PCIe bandwidth might not matter too much, it certainly won't hurt performance. What we don't know is how much Nvidia plans to change the fundamental building blocks in Ada. For example, Turing had 64 FP32 cores and 64 INT32 cores per SM, which were able to run concurrently on different data. Ampere altered things so that the INT32 cores became INT32 or FP32 cores, potentially doubling the FP32 performance.
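
To illustrate where that "potentially doubling" comes from, here's a toy peak-throughput calculation under the two SM layouts; the clock is an arbitrary assumption used only for the comparison:

```python
# Illustrative sketch (not official specs): peak FP32 throughput for a
# 68-SM GPU under Turing-style vs. Ampere-style SM datapaths.
def peak_fp32_tflops(num_sms, fp32_lanes_per_sm, clock_ghz):
    # 2 ops per lane per clock: a fused multiply-add counts as two FLOPs.
    return num_sms * fp32_lanes_per_sm * 2 * clock_ghz / 1000

clock = 1.7  # GHz, assumed here purely for comparison
# Turing: 64 dedicated FP32 lanes per SM (the 64 INT32 lanes can't do FP32)
print(peak_fp32_tflops(68, 64, clock))   # ~14.8 TFLOPS
# Ampere: the second datapath can also issue FP32, so up to 128 FP32 lanes
print(peak_fp32_tflops(68, 128, clock))  # ~29.6 TFLOPS, the "doubling"
```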

Ampere also features 3rd generation Tensor cores and 2nd generation RT cores for ray tracing. Ada will likely use 4th generation Tensor cores and 3rd generation RT cores. What will that mean? We don't have exact details, but Ada will almost certainly deliver far more performance than the current Ampere GPUs. There might be more CUDA, Tensor, and/or RT cores per SM, or the internal pipelines may simply be revamped to improve throughput.

Memory is also another big player when it comes to GPU performance, and could play an even bigger role in improving frame rates considering how many SMs Ada may have. GDDR6+ and GDDR7 are already on Samsung's roadmap, featuring substantial bandwidth improvements over GDDR6X, and Nvidia will likely use one or both of these new standards if they're ready in time for Ada production. After all, the more cores you have, the more memory bandwidth you need to feed them all.
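
For a sense of the arithmetic involved, a small sketch of the standard bandwidth formula; the GDDR6X figures match the RTX 3090 Ti, while the GDDR7 data rate is purely a placeholder assumption, not a confirmed spec:

```python
# Memory bandwidth = bus width (bits) x data rate (Gbps per pin) / 8.
def bandwidth_gbs(bus_width_bits, gbps_per_pin):
    return bus_width_bits * gbps_per_pin / 8

print(bandwidth_gbs(384, 21))  # GDDR6X at 21 Gbps, 384-bit bus: 1008 GB/s
print(bandwidth_gbs(384, 32))  # hypothetical 32 Gbps GDDR7: 1536 GB/s
```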

Generally speaking, Nvidia has improved performance on its fastest GPUs by around 30% with previous architectures, but with the change in process node and massively increased core counts, plus a potentially higher power limit, it's not unrealistic to expect even bigger improvements from Ada.

Will the RTX 4090 (or whatever it ends up being called) end up delivering twice the performance of the RTX 3090? That's ambitious but certainly not out of reach. 75% more cores with higher clockspeeds and/or a more efficient architecture would do the trick. We'll find out more later this year, as Ada is expected to launch in the September timeframe.
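
A quick back-of-the-envelope version of that claim, assuming perfect scaling with cores and clocks (which real games never achieve, so read it as an upper bound; the Ampere clock is an assumed typical boost figure, not an official spec):

```python
# Back-of-the-envelope performance scaling, cores x clocks.
core_ratio  = 18432 / 10752   # leaked full AD102 vs. full GA102, ~1.71x
clock_ratio = 2.45 / 1.86     # rumoured 2.4-2.5 GHz vs. an assumed typical
                              # Ampere boost clock of ~1.86 GHz
print(core_ratio * clock_ratio)  # ~2.26x theoretical upper bound
```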
 
More news from the leaks, Ada features a massive L2 cache (16x Ampere)
https://videocardz.com/newz/geforce...ry-large-l2-caches-nvidias-own-infinity-cache

[Image: Ampere vs Ada configuration comparison]

This large L2 cache is one factor in Ada's big RT performance boost (but not the only one). Current-gen games' BVH trees can fit in L2, where they're directly accessible at ultra-low latency to the RT cores (a big advantage vs the L3 cache on RDNA3).
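
As a rough illustration of why BVH residency matters, a toy footprint estimate; the 64-byte node size and triangle counts are illustrative assumptions, and real BVH layouts are driver- and hardware-specific:

```python
# Very rough BVH footprint estimate under illustrative assumptions.
def bvh_size_mb(triangles, bytes_per_node=64):
    # A binary BVH over N primitives has roughly 2N - 1 nodes.
    nodes = 2 * triangles - 1
    return nodes * bytes_per_node / 2**20

print(bvh_size_mb(1_000_000))  # ~122 MB: spills out of even a 96 MB L2
print(bvh_size_mb(300_000))    # ~37 MB: could stay resident in a big L2
```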

Finally, early AD102 samples are reportedly boosting at 2.4-2.5 GHz, another source of Ada's performance advantage over Ampere.
 
This large L2 cache is one factor in Ada's big RT performance boost (but not the only one). Current-gen games' BVH trees can fit in L2, where they're directly accessible at ultra-low latency to the RT cores (a big advantage vs the L3 cache on RDNA3).
I love the way you can tell that something in an unreleased architecture is an advantage over something else in another unreleased architecture.
 
This large L2 cache is one factor in Ada's big RT performance boost (but not the only one). Current-gen games' BVH trees can fit in L2, where they're directly accessible at ultra-low latency to the RT cores (a big advantage vs the L3 cache on RDNA3).
RDNA 2's L3 cache has lower latency than Ampere's L2 cache:

[Chart: GPU memory latency vs. working set size, from the link below]

https://chipsandcheese.com/2021/04/16/measuring-gpu-memory-latency/
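
For context, latency curves like the one above are usually produced with pointer-chasing microbenchmarks; here's a minimal CPU-side Python sketch of the idea (interpreter overhead dominates in Python, so this only demonstrates the method; the real measurements run as GPU kernels):

```python
import random
import time

def pointer_chase_latency_ns(size_bytes, iters=500_000):
    # Build a random cyclic permutation: each element stores the index of
    # the next element to visit, so every access depends on the previous
    # one and the chain cannot be prefetched.
    n = size_bytes // 8
    order = list(range(n))
    random.shuffle(order)
    chain = [0] * n
    for i in range(n):
        chain[order[i]] = order[(i + 1) % n]
    idx = 0
    start = time.perf_counter()
    for _ in range(iters):
        idx = chain[idx]
    return (time.perf_counter() - start) / iters * 1e9

# Latency steps up as the working set spills out of each cache level.
for kb in (64, 512, 8192, 65536):
    print(f"{kb:>6} KB: {pointer_chase_latency_ns(kb * 1024):.0f} ns/access")
```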
 
Different architectures require different caching. RDNA2 clocks significantly higher than Ampere (~+25%), and its vector SIMDs are fed every clock instead of every other clock. There are effects from smaller caches' hit rates and their ability to saturate consumers of data. And then there's the return on investment and the power implications of faster caches.
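
To make the hit-rate point concrete, a toy effective-latency model; all numbers are illustrative, not measurements:

```python
# Toy model: effective latency = hit_rate * cache_latency
#            + (1 - hit_rate) * memory_latency.
def effective_latency(hit_rate, cache_ns, mem_ns):
    return hit_rate * cache_ns + (1 - hit_rate) * mem_ns

# A slightly slower but bigger cache can still win on average:
print(effective_latency(0.60, 100, 400))  # fast cache, 60% hits: 220 ns
print(effective_latency(0.85, 130, 400))  # slower, 85% hits: ~170 ns
```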

I thought I'd just mention the obvious here again for people casually scrolling through all of this.

Yes, it's an interesting armchair exercise, but IMHO nothing that would allow conclusions reaching as far as the next generation of chips with architectural differences.
 
Even the revisited results show that RDNA 2's L3 cache has lower latency than Ampere's L2 cache:
[Chart: revisited GPU memory latency results]

In this last test, Infinity Cache is 15 ns slower, not faster, than Ampere's L2. But all that is pointless for comparing future architectures, as we have no details on RDNA3's L1 and L2 (which are likely to be enlarged), nor on the latency of such a big L2 on Ada (latency tends to increase with cache size). On the other hand, we have seen with Milan-X that a large stacked cache done "AMD's way" does not add many cycles of latency. So... let's wait and see.
 
More news from the leaks, Ada features a massive L2 cache (16x Ampere)

This large L2 cache is one factor in Ada's big RT performance boost (but not the only one). Current-gen games' BVH trees can fit in L2, where they're directly accessible at ultra-low latency to the RT cores (a big advantage vs the L3 cache on RDNA3).

That’s great. Should be a win for RT.

I wonder how useful these big GPU caches will be in general, though. Consoles don't have them, and game engines seem to be moving toward streaming of unique assets versus the tiling and instancing approaches of old.
 
I'd expect very little about this processor to cross over with Lovelace. They've even gone back to naming them completely differently this time (unlike Ampere, where the compute chips also had little in common with gaming Ampere).
It's not like Volta and Turing were worlds apart because of the different names.
 
They were, in fact, a calendar year apart.

Pascal P100/ GP10x 2016
Volta 2017
Turing 2018
Ampere A100/GA10x 2020
Hopper 2022
Ada Lovelace [...]

So either
a) Nvidia is deviating from this pattern.
b) There will be gaming Hoppers later this year
c) Lovelace will be a 2023 product
d) This is coincidence and not a pattern after all.
 
AFAIK there was supposed to be a gaming Volta lineup, but it was eventually scrapped in favor of Turing.
Lovelace seems to be more different from Hopper than either of the previous HPC-only architectures were from their gaming counterparts, so this may be the reason for a different name.
 
Lovelace seems to be more different from Hopper than either of the previous HPC-only architectures were, so
Why do you conclude that the difference is greater here? Is there a specific feature of the tensor cores or cache that you're referring to, or is there something else?
 