NVidia Ada Speculation, Rumours and Discussion

These memory interface numbers could, of course, simply reflect that it's difficult to get more memory bandwidth for now, but the regression on the 4080 and 4070 compared to the 3080 and 3070 seems to suggest that either the 3080 and 3070 aren't using their memory bandwidth efficiently, or the 4080 and 4070 are better at memory bandwidth efficiency (for example, with a larger cache).
A significant increase in L2 size is all but confirmed for Lovelace at this point.
Interestingly, though, it looks like it will be smaller than RDNA2's Infinity Cache, while Lovelace is expected to be considerably faster. So I'd expect some additional bandwidth magic here, as otherwise it seems insufficient at such bus widths.
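As a rough illustration of how a big L2 can stand in for bus width, here's a minimal back-of-envelope sketch; the bus width, data rate, and hit rates below are invented for illustration, not confirmed Ada specs.

```python
# Back-of-envelope: a larger L2 cuts DRAM traffic, so the same bus behaves
# like a wider one. All figures are illustrative assumptions, not Ada specs.

def effective_dram_bandwidth(bus_bits: int, gbps_per_pin: float,
                             l2_hit_rate: float) -> float:
    """DRAM-equivalent bandwidth when l2_hit_rate of requests never reach DRAM."""
    raw = bus_bits * gbps_per_pin / 8       # GB/s on the physical bus
    return raw / (1.0 - l2_hit_rate)        # amplification from cache hits

# Hypothetical 256-bit card at 21 Gbps:
for hit in (0.0, 0.25, 0.5):
    bw = effective_dram_bandwidth(256, 21, hit)
    print(f"L2 hit rate {hit:.0%}: ~{bw:.0f} GB/s effective")
# 0% -> 672 GB/s (raw), 25% -> ~896 GB/s, 50% -> ~1344 GB/s
```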

I think "2 points" implies the 4090 would have multiple power limits depending on the power cable connected.
It's just the usual Nvidia voltage/frequency curve most likely.
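For context, the shape of that curve follows from the usual dynamic-power approximation, roughly P ∝ C·f·V². A toy sketch with made-up numbers shows why chasing the last few percent of clocks is so expensive:

```python
# Toy version of the dynamic-power relation behind a V/F curve: P ~ C*f*V^2.
# Near the top of the curve voltage must rise with frequency, so power
# grows much faster than performance. Numbers are made up for illustration.

def relative_power(f_rel: float, v_rel: float) -> float:
    """Power vs. baseline for a relative frequency and voltage."""
    return f_rel * v_rel ** 2

print(f"+10% clock, +10% voltage: {relative_power(1.10, 1.10):.2f}x power")  # ~1.33x
print(f"stock clock, -8% voltage: {relative_power(1.00, 0.92):.2f}x power")  # ~0.85x
```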
 
Correct.
Another small bit of information: it looks like Nvidia is struggling with AD103's marketing positioning. The performance delta with AD102 on the same voltage/power curve is way too large. AD102 is basically ~50% faster, and it seems that creates a big hole in the Ada product range vs AMD. So AD103 clocks and TGP are currently pushed to the moon (>3.1GHz and 400W). I don't know anything about RDNA3 so I can't say more, but it will be interesting to keep an eye on this category battle.
Oh, and it's funny to see OEMs changing their minds about RDNA3 vs Ada. AMD started to give out information first and the consensus was that AMD would get an easy win. Then in recent weeks Nvidia started to open up a bit to prepare for launch, and now the situation is very different. And surprise... leakers are following the trend of a much closer battle than anticipated.
 
It's a bit interesting to me that Nvidia would have chosen such a large hardware separation between AD102 and AD103 in the first place, given the above. Were they expecting much worse defect and parametric yields for AD102, requiring a much larger cut? Otherwise, on paper AD102 has >70% more hardware resources than AD103, and that large gap is something they would have anticipated very early on. Or was AD103 designed to the limit of what they felt would fit in laptops? On paper, an 8 GPC design for AD103 seems like it would have fit the stack better.

Personally, the rumored upcoming product stack is a bit problematic to me, as I'm not strictly interested in raw performance. I was hoping for 12GB of VRAM (or more) to be available below RTX 3080 Ti price and power categories (closer to 200W if not lower) and at higher perf than the RTX 3060 (at least 3060 Ti or higher). Even throwing out price, it doesn't seem like this will be available from the power standpoint, even with reasonable undervolting. So I'm still in the same boat as this gen: the RTX 3060 is too slow, the bulk of the next segment has too little VRAM, and the 12GB+ cards sit way too high up the stack in terms of power (and price).

For those who just care about raw performance, the latest rumors on clock speeds do paint the performance gains in a much clearer light (the previously rumored 20% clock gains over Ampere would have required significant perf-per-SM gains as well), as well as explaining those high power figures relative to the SM gains vs Ampere.
 
The rumored 4070 is a 160-bit bus, which means it's almost assuredly a deactivated memory controller on the chip. So a 4070 Ti or whatever with 12GB is almost guaranteed at some point. As for power, since you don't care about performance as much, just turn the clocks down, turn the power limit down and, if you really want to minimize power draw, undervolt it.

Heck, I'm pretty sure you could get a 4080 to sub-300W if you cared to, while still having a pretty sizeable performance leap over a 3080.
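The VRAM arithmetic behind those bus widths is straightforward; a small sketch, assuming the common 2GB-per-device GDDR6/6X density (the SKU mappings in the comments are rumor-based guesses, not confirmed configs):

```python
# Bus-width arithmetic behind the 10GB/12GB configs: one GDDR6/6X device
# per 32-bit channel, commonly 2GB per device today.

def vram_gb(bus_bits: int, gb_per_chip: int = 2) -> int:
    channels = bus_bits // 32      # one memory device per 32-bit channel
    return channels * gb_per_chip  # clamshell mounting would double this

print(vram_gb(160))  # 10 GB -> rumored 4070 (one channel disabled)
print(vram_gb(192))  # 12 GB -> full AD104 bus, e.g. a later 4070 Ti
print(vram_gb(256))  # 16 GB -> full AD103 bus
```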
 
I mean, isn't the AD102/AD103 gap pretty much the usual product segmentation? The next-to-top chip is always ~67% of the biggest one.
The only potentially new scenario here would be if the top-end chip manages to clock higher than expected, which would make the gap bigger than usual.
But generally speaking, yeah, a 420W 4080 will look weird next to a 450W 4090 which will likely end up beating it by 50%.
Buuuuut this leaves the door open for a proper 4080 Ti this time around, based on AD102, likely with 20GB of VRAM or so.
 

Yes, but the 4070 is rumored with a 300W TDP on that cut-down AD104, which implies a further-enabled AD104, possibly a 4070 Ti, would have an even higher TDP. Since I'm looking for around 200W if not lower, that's a pretty hefty cut-down with likely a significant performance loss. Much less trying to pare a hypothetical AD103 part (RTX 4080), or even a cut-down AD103, down to a 200W level.

It's not that I don't care about performance at all, as I do say the RTX 3060 is too slow this gen, just that performance gains aren't my only concern, at least not at the levels being speculated right now. An RTX 4070 possibly at 3090-equivalent perf is very good already; I'd actually be willing to sacrifice some of that.

But I kind of suspected this, with where things were going in terms of VRAM stagnation, if not regression, at each "tier". From a business standpoint this also provides an interesting way for Nvidia (and AMD) to essentially segment higher VRAM off to higher tiers. Even if you buy a future-gen card of a lower tier, you'd still end up VRAM limited even if the perf improves.


Ampere had a slightly larger gap between GA102 and GA104, with 75% more SMs. But there were always some mumblings, I believe, that Nvidia was uncomfortable with how much they had to cut down GA102 to make the RTX 3080 configuration, and/or that it was specifically related to yield issues.

GA103 also does exist in between and would only result in a 40% peak SM count advantage for GA102.

Turing, Pascal, and Maxwell had only a 50% differential between x02 and x04 in terms of peak SM count.

You'd need to go back to Kepler, with GK110 being 87.5% over GK104. Interestingly, you could argue that GK110 was essentially intended to be moved up the stack (like AD102), which eventually led to an x02 part being inserted, and that it wasn't strictly meant for PC gaming.

Fermi was 50%.

In that sense the gap between AD102 and the next chip down, AD103, is the largest going back at least as far as Maxwell.
 
I'm not sure how you're getting these %s but if it's x02+50% then it's pretty much the same as far back as Fermi with the only exception being Kepler.
Lovelace isn't that much different with 4080 being 10240 and 4090 being 16384.
At least when looking at actual products, which is what you should do, since there's little point in looking at chips' maximum configs if they aren't available as products.
 

Because isn't Xpea referring to the large gap between AD102 and AD103 causing problems with how to configure and position an AD103 product in the market? At least my reading is that AD102's "optimal" configurations don't require cutting it down or clocking it the way Nvidia did with GA102 and the RTX 3080.

RTX 3080 and RTX 3070 have a smaller gap (40%) simply because they cut much more off GA102. A 116 SM AD102 (equivalent percentage) would also result in a smaller gap.

In an extreme example, we'd otherwise be able to say that GA104 and GA106 have no difference, since some RTX 3060s actually use cut-down GA104 chips.

At least my reading is that there's a complication in how they want to set the product stack due to the large differential in the actual designs. Either they cut AD102 down more than they want (à la RTX 3080), or they need to clock AD103 up beyond its "optimal" point. Or there ends up being a very large gap between the cut-down AD102 config and the highest AD103 config.
 
Well, yeah, the point here is likely that AD102 turned out better than expected, which created a bigger gap between it and the chip below it in the stack, forcing Nvidia to clock that chip out of its best efficiency range.
Unit-wise, though, the differences between the top-end chip and the next one down have been fairly consistent across Nvidia's lineups; the only somewhat recent exception was GK104 vs GK110.
I also kinda wonder if N5 pricing makes it impossible for Nvidia to use AD102 for a 4080 product like they did with GA102 on 8N.
 

A 50% delta is much larger than usual, whether looking at enabled hardware or at full chips. AD102 starts out with a massive 70% hardware advantage over AD103, so it's bizarre that Nvidia is struggling with the second-tier SKU. How were they expecting to close that gap?
 

Another potential reason would be that competition forced Nvidia to clock AD102 higher than desired, and AD103 then had to stretch even further. The tiny TDP gap still doesn't make sense, though.


I don't think the unit differences have been that similar, though. The gap has been 20-40%, not 70%.


The N5 pricing angle is a good point, especially if they're trying to do $700-$800 for the 4080.
 
Just going to list the numbers here since there might be some confusion/misunderstanding.

Number of SMs in each full chip, and how many more SMs the larger chip (left) has over the smaller (right):

AD102 - 144. AD103 - 84. = 71% more.
GA102 - 84. GA104 - 48. = 75%.
GA102 - 84. GA103 - 60. = 40%.
TU102 - 72. TU104 - 48. = 50%.
GP102 - 30. GP104 - 20. = 50%.
GM200 - 24. GM204 - 16. = 50%.
GK110 - 15. GK104 - 8. = 87.5%.
GF100/110 - 512. GF104 - 384. = ~33%. (FP32 shaders used here, as FP32 per SM is not uniform for Fermi.)
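As a sanity check on the percentages above, a quick script recomputing them from the listed full-chip counts:

```python
# Recomputing the deltas above from the full-chip unit counts
# (SMs everywhere except Fermi, which uses FP32 shader counts).

full_chips = [
    ("AD102", 144, "AD103", 84),
    ("GA102", 84, "GA104", 48),
    ("GA102", 84, "GA103", 60),
    ("TU102", 72, "TU104", 48),
    ("GP102", 30, "GP104", 20),
    ("GM200", 24, "GM204", 16),
    ("GK110", 15, "GK104", 8),
    ("GF100/110", 512, "GF104", 384),
]

for big, n_big, small, n_small in full_chips:
    print(f"{big} vs {small}: +{(n_big / n_small - 1) * 100:.1f}%")
# AD102 leads AD103 by +71.4%; most other gens cluster around +50%,
# with GK110 (+87.5%) and GA102-over-GA104 (+75.0%) as the outliers.
```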


There has been different messaging on whether they could do that again, from what I understand: partially that it was viable but not ideal for Nvidia to cut GA102 down as much as they did for the RTX 3080, and partially that 8N yields were such that it made sense to offer a GA102 config cut down that much to preserve overall yields.

With Ada and TSMC it could similarly be both. Cutting AD102 down any further effectively means throwing away too many transistors that are otherwise viable, as actual yields are relatively good. Combined with the cost profile, that means you need to sell it above a certain price point to make sense.

A 116 SM AD102 would, mathematically at least, be the same cut ratio as the RTX 3080's GA102. But unless the yield rate at 116 SMs is actually that much better than at the rumored 128, you aren't gaining much on the cost side by using it as a significantly cheaper config. It could be that 128 SMs out of 144 is already in that fairly optimal yield zone.
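To see why harvesting below 128 SMs might gain little, here's a toy Poisson defect model; the die size, defect density, and one-defect-kills-one-SM simplification are all my assumptions, not known figures:

```python
# Toy Poisson defect model for the "116 vs 128 SM yield zone" argument.
# Assumptions are mine, not leaked: a ~600mm^2 (6 cm^2) die, an N5-class
# defect density of ~0.1/cm^2, and each defect killing exactly one SM.
import math

def p_at_most_k_defects(die_cm2: float, d0_per_cm2: float, k: int) -> float:
    lam = die_cm2 * d0_per_cm2   # expected defects per die (Poisson mean)
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

die_cm2, d0 = 6.0, 0.1
print(f"perfect 144-SM die: {p_at_most_k_defects(die_cm2, d0, 0):.1%}")   # ~54.9%
print(f"makes a 128-SM bin: {p_at_most_k_defects(die_cm2, d0, 16):.1%}")  # ~100%
# If nearly every die already qualifies for a 128-SM config, a deeper
# 116-SM harvest bin recovers almost nothing extra.
```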

The other factor here is that they were able to save costs by offering the RTX 3080 with only 10GB of VRAM to fit that $700 price point, whereas now in 2022, with where every other product sits, even 12GB would look out of place. So they have to at least double up memory this time. Not to mention, if they are going GDDR6 18Gbps, are there even 1GB chips at that speed?

There would also be a market segmentation issue beyond just actual costs. The RTX 3080 10GB at $700 and RTX 3090 24GB at $1500 at least had VRAM as a big difference to segment them. But a 128 SM RTX 4080 20GB at $700 next to, say, even a 144 SM RTX 4090 at $1500 would make the latter look even more frivolous, with almost no practical VRAM difference and an even smaller perf difference. Which means you'd essentially be losing out (on the business side) from both ends (less margin and/or fewer sales for the 4090).
 

Thanks. So the gap is larger than usual but not unprecedented.

Volumes and pricing are key factors. The 3080 (80% enabled) is a much higher volume part than the 3090 (98% enabled). It’s unlikely that Nvidia will ask less than $1500 for the 4090 and it will likely remain a low volume SKU. If AD102 defect yields are better than expected why is the 4090 only 90% enabled? If AD102 clocks higher than expected why push power consumption so high for no reason?

It all points to some external factor messing with Nvidia’s plans. The theory that a company is struggling with product tiering because yields are “too good” sounds like fantasy.
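For reference, the enablement percentages quoted above fall out of the SM counts directly (the 4090 figure assumes the rumored 128-of-144 config):

```python
# Where those enablement percentages come from (enabled SMs / full-die SMs);
# the 4090 entry is based on the rumored 128-of-144 configuration.

skus = {
    "RTX 3080 (GA102)": (68, 84),
    "RTX 3090 (GA102)": (82, 84),
    "RTX 4090 (AD102, rumored)": (128, 144),
}
for name, (enabled, full) in skus.items():
    print(f"{name}: {enabled}/{full} = {enabled / full:.0%} enabled")
# ~81%, ~98%, ~89% -- matching the rough figures quoted above
```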
 
Another interesting question is that of die sizes. With AD103 being roughly equal to GA102 in units (fewer MCs but a bigger L2 cache), and the drop from a "10nm-class" to a "5nm-class" process, should we expect the 4080 to run on a sub-300mm^2 die?
 
I'm guessing AD102 is a big chip, larger than 600mm^2. That would make AD103 larger than 375mm^2 assuming AD102 is 1.6X larger than AD103.

The 1.6X size difference is based on GA102 die being 1.6X larger than GA104 (628mm^2 vs 392mm^2). The relative SM & MC count differences are very similar between the two pairs. 1.71X & 1.5X more on AD102 compared to AD103, 1.75X & 1.5X more on GA102 compared to GA104.

Using the same multiplier, AD103 being 300mm^2 would make AD102 only 480mm^2. Sounds rather small after Turing and Ampere.
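A quick sketch reproducing that multiplier math from the Ampere die sizes:

```python
# Reproducing the die-size guesses above: apply Ampere's GA102/GA104
# area ratio to the two assumed AD102 sizes.

ga102_mm2, ga104_mm2 = 628, 392
ratio = ga102_mm2 / ga104_mm2            # ~1.60x

for ad102_mm2 in (600, 480):             # assumed AD102 die sizes
    print(f"AD102 at {ad102_mm2} mm^2 -> AD103 ~{ad102_mm2 / ratio:.0f} mm^2")
# 600 mm^2 -> ~375 mm^2;  480 mm^2 -> ~300 mm^2
```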
 
Yes, that's correct. AD102 is around 600mm^2 and AD103 around 400mm^2.

And the earlier post on yields, costs, and segmentation was exactly my meaning. On point (y)
 
With the currently rumored specs, Nvidia is doing proper scaling this time around, reminiscent of the 980 Ti/1080 Ti vs. 980/1080 comparison.

With Turing and Ampere, the flagship chips just had one more GPC, and the shader increases didn't line up with the overall performance increase across different games.

I think it's better to have distinct chips for the naming; many people used to think the Ti card was just a normal x80 card with more bells and whistles, not realizing that it often had a bigger performance increase over the x80 than the x80 had over the x70, and was often a different chip altogether.
 
Would be interesting to see AIB revenue numbers for the past year. They probably made a lot of profit per GPU but how many were they selling?
 