Nvidia Blackwell Architecture Speculation

  • Thread starter Deleted member 2197
I’d like to throw a question out there regarding GB202 and GB203 L2 cache. Despite our proximity to launch, I haven’t seen any leaks about the cache size. The 128 MB rumor has been out for well over a year and never confirmed by anyone to my knowledge. It could easily have been an estimation based on the assumption that the core/cache ratio would stay the same as Ada. Sounds like that could be tough with only 22% more die space (on GB202) and added MCs, though AMD managed to cut down the L3 cache size on 4nm for Zen 5, so it’s not impossible I guess. GB203 shouldn’t have the same problem so I’d guess that cache would stay the same.

Curious if anyone has seen anything or has any thoughts on this.
 
I think there is also a generational effect. I have teenage brothers from a second marriage and they exclusively game on laptop or console - they marveled at my desktop case like something at the Smithsonian. They’ve grown up being able to do pretty much anything on the go, so bifurcating their laptop and desktop like I do seems strange to them (there isn’t a desktop in their house). I also think one shouldn’t underestimate the popularity of lo-fi games with that generation. One of my brothers exclusively plays Roblox and Minecraft on his laptop. Admittedly, you probably don’t need a dGPU for that, but the parents bought him a laptop with one because they asked to buy a gaming laptop!

VRR monitors are de rigueur for a modern gaming laptop, whereas when I bought a top-of-the-line gaming laptop (admittedly Maxwell, not Turing) I don’t recall even having the option, meaning it was 60 Hz Vsync or bust. Now if you can only hit 50 fps it's no problem, and then DLSS 3 gets you well over 60. I swore I’d never buy one again - it was essentially a portable desktop. But I’m confident if I did buy one today my experience would be far better (to be clear, I have zero interest in one). Please bear in mind this post is entirely anecdotal, and only intended as a partial explanation.
OK that makes sense. But with the popularity of streaming I'd think young people wouldn't find the concept of a desktop so foreign anymore.
I’d like to throw a question out there regarding GB202 and GB203 L2 cache. Despite our proximity to launch, I haven’t seen any leaks about the cache size. The 128 MB rumor has been out for well over a year and never confirmed by anyone to my knowledge. It could easily have been an estimation based on the assumption that the core/cache ratio would stay the same as Ada. Sounds like that could be tough with only 22% more die space (on GB202) and added MCs, though AMD managed to cut down the L3 cache size on 4nm for Zen 5, so it’s not impossible I guess. GB203 shouldn’t have the same problem so I’d guess that cache would stay the same.

Curious if anyone has seen anything or has any thoughts on this.
Been wondering this myself. If yields are better they could reduce the amount of cache on the chip without reducing the amount on the end product. The 4090 left nearly 30% of its L2 dark. :unsure:
 
But with the popularity of streaming I'd think young people wouldn't find the concept of a desktop so foreign anymore.
Again, it’s anecdotal. My father is older than most parents of teenagers, and so he grew up in the Atari era and always cared more about sports anyway. So he has zero interest in anything PC related - he barely even uses his laptop. I’m sure there are plenty of youngsters with PC parents who will pick up the hobby. I’m unlikely to ever have kids (partner doesn’t want them) and I tried to get one of them interested in PC, but when all he wants to play is Roblox and Minecraft it’s hard to get excited about building a computer.

Unlike 10 years ago, PC actually gets ports of virtually all games now. I really think that APUs could push PC/laptop gaming further into the mainstream if they can push performance enough. Since there’s little cause to keep pushing resolution on 16-inch screens, we should get there eventually. But that’s a topic for another thread!
The 4090 left nearly 30% of its L2 dark.
I strongly suspect that was a segmentation decision. All the memory controllers were enabled, and even if yields played some role I can’t see the need to disable 25% of the cache when you only needed to cut 11.1% of your cores to hit the needed yields. Only Nvidia knows how much it would’ve helped performance, but it certainly wouldn’t have hurt, especially in RT.
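
For anyone who wants to check the percentages in that post, here is a minimal sketch of the arithmetic. It assumes the commonly cited AD102 figures (144 SMs and 96 MB of L2 on the full die, 128 SMs and 72 MB of L2 on the 4090); the dictionary and function names are made up for illustration.

```python
# Assumed specs: full AD102 die vs. the RTX 4090 configuration.
AD102_FULL = {"sms": 144, "l2_mb": 96}
RTX_4090 = {"sms": 128, "l2_mb": 72}


def disabled_fraction(enabled: float, total: float) -> float:
    """Fraction of a resource that is fused off relative to the full die."""
    return 1 - enabled / total


sm_cut = disabled_fraction(RTX_4090["sms"], AD102_FULL["sms"])
l2_cut = disabled_fraction(RTX_4090["l2_mb"], AD102_FULL["l2_mb"])

print(f"SMs disabled: {sm_cut:.1%}")
print(f"L2 disabled:  {l2_cut:.1%}")
```

Under those assumptions the cache is cut more than twice as deeply as the cores, which is the gap the segmentation argument rests on.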
 

Rumors point to a significant increase in bandwidth due to a wider bus and higher clocks. If anything, Nvidia may decide to reduce L2 on GB202. No way to know until we see the goods.
 
Yeah, the 128MB rumor made more sense when people were talking about a 384-bit bus width but I just don’t see the need now. We’ll know soon enough, surprised nothing has leaked yet.
 
I strongly suspect that was a segmentation decision. All the memory controllers were enabled, and even if yields played some role I can’t see the need to disable 25% of the cache when you only needed to cut 11.1% of your cores to hit the needed yields. Only Nvidia knows how much it would’ve helped performance, but it certainly wouldn’t have hurt, especially in RT.
Technically we could find out. Somebody match core and memory clocks on a 4090 and a Quadro RTX6000 Ada and see how they differ. Someone smarter than me can surely figure out how to account for the SM count difference (~10%).
 
The Quadro RTX6000 Ada is kinda like the apex predator graphics GPU right now, with the full AD102 die enabled and 48 GB of VRAM. However, it's held back by a limited TDP of 300 W (vs 450 W for the 4090) and by using regular GDDR6 (vs GDDR6X in the 4090). In actual game use, it's going to be power limited and will have ~8% less memory bandwidth than the 4090. In most cases it swings between being 10% faster and 20% slower than the 4090, depending on where the bottleneck lies.
 
That's why I said match GPU and memory clocks. Clock them both at the level of the Quadro and run some tests.

Here I'll even link you the cards.

^That one comes with a bonus 3060.


:mrgreen:
 
Based on these rumors bandwidth per SM is getting a decent bump with Blackwell. The 4090 is downright anemic which may explain why it doesn't pull away from the 4080 as much as specs would indicate. The 5090 will have 33% more bandwidth per SM than the 4090!

Bandwidth per SM:

10.5 GB/s - 5090
11.4 GB/s - 5080
12.8 GB/s - 5070 Ti
7.9 GB/s - 4090
9.2 GB/s - 4080
8.4 GB/s - 4070 Ti
11.3 GB/s - 4060
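
The figures in that list can be reproduced by dividing memory bandwidth by SM count. A rough sketch follows, assuming the rumored 5090 numbers (1792 GB/s, 170 SMs) alongside the shipping specs for the Ada cards; the `SPECS` table and the helper name are hypothetical and only there for illustration.

```python
# (memory bandwidth in GB/s, SM count). The 5090 row is a rumored spec, assumed here.
SPECS = {
    "5090": (1792, 170),
    "4090": (1008, 128),
    "4070 Ti": (504, 60),
    "4060": (272, 24),
}


def bandwidth_per_sm(bandwidth_gbs: float, sm_count: int) -> float:
    """Memory bandwidth available per SM, in GB/s."""
    return bandwidth_gbs / sm_count


per_sm = {name: bandwidth_per_sm(*spec) for name, spec in SPECS.items()}

for name, value in per_sm.items():
    print(f"{value:5.1f} GB/s - {name}")

# Relative uplift of the rumored 5090 over the 4090.
uplift = per_sm["5090"] / per_sm["4090"] - 1
print(f"5090 vs 4090: {uplift:.0%} more bandwidth per SM")
```

With unrounded inputs the uplift comes out at roughly a third, in line with the 33% quoted from the rounded values.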
 
If you use RT/PT it is anywhere from 40-100% faster:


I would love a 5090 being "downright anemic" over the 4090 if that is the difference in games 🤷‍♂️
 
The 4090 is indeed a bit bandwidth starved, since it is so much more powerful than the 3090 (it has more than double the FP16/FP32 performance) but has only slightly higher memory bandwidth, mostly because they use basically the same memory, just with a slightly higher clock. Unfortunately, this is mostly because no better memory existed at the time, so NVIDIA really had no choice (other than making a really expensive 512-bit GPU, or going for HBM). However, memory bandwidth is not as crucial for compute-heavy workloads (e.g. ray tracing), which is why the 4090 performs much better than the 3090 in such cases (better RT cores certainly also help), but the imbalance between computation and bandwidth is still there.
 
What worries me the most is the amount of VRAM Nvidia is going to put on those cards. According to TechPowerUp's GPU Database, the RTX 5060 comes with just 8GB of VRAM.

 
It will. Depending on its performance and price it could also be a proper decision.
5070 is 12, 5080 is 16, 5090 is 32.
Apparently 5060Ti will have 16 for those who want "moar VRAM".
And 5070Ti will get 16 also.

I do wonder when we might see x1.5 capacity chips though and if that can bring some changes to the Tis and/or a possible Super refresh in 2026.
 
There’ll also be lots of other options to choose from if 8GB is a deal breaker.
Well I wouldn't say "lots".
N44 will have 128 bit G6 so that's 8GB too most likely.
BMG-G21 is the only card in this class with 12GB but it's ~4060 performance level and will likely end up below either 5060 or 8600.

RTX 5000 will come with a new "neural rendering capabilities" feature, according to marketing points leaked by INNO3D. Also "advanced" DLSS to offer better image quality and higher fps.
These look like rather meaningless marketing placeholders tbh. All videocards are capable of "neural rendering" and of course there will be "advanced DLSS" and "enhanced RT" on future products.
 

Yeah I meant lots of alternative options at a price if your heart is set on more than 8GB. The vast majority of games and gamers will be fine with 8GB entry level cards.
 