Speculation and Rumors: Nvidia Blackwell ...

Broopster · Nov 30, 2024

I’d like to throw a question out there regarding GB202 and GB203 L2 cache. Despite our proximity to launch, I haven’t seen any leaks about the cache size. The 128 MB rumor has been out for well over a year and never confirmed by anyone to my knowledge. It could easily have been an estimation based on the assumption that the core/cache ratio would stay the same as Ada. Sounds like that could be tough with only 22% more die space (on GB202) and added MCs, though AMD managed to cut down the L3 cache size on 4nm for Zen 5, so it’s not impossible I guess. GB203 shouldn’t have the same problem so I’d guess that cache would stay the same.

Curious if anyone has seen anything or has any thoughts on this.

homerdog · Nov 30, 2024

Broopster said:
I think there is also a generational effect. I have teenage brothers from a second marriage and they exclusively game on laptop or console - they marveled at my desktop case like something at the Smithsonian. They’ve grown up being able to do pretty much anything on the go, so bifurcating their laptop and desktop like I do seems strange to them (there isn’t a desktop in their house). I also think one shouldn’t underestimate the popularity of lo-fi games with that generation. One of my brothers exclusively plays Roblox and Minecraft on his laptop. Admittedly, you probably don’t need a dGPU for that but the parents bought him a laptop with one because they asked to buy a gaming laptop!

VRR monitors are de rigueur for a modern gaming laptop, whereas when I bought a top-of-the-line gaming laptop (admittedly Maxwell, not Turing) I don’t recall even having the option, meaning it was 60 Hz Vsync or bust. Now if you can only hit 50 fps no problem, and then DLSS 3 gets you well over 60. I swore I’d never buy one again - it was essentially a portable desktop. But I’m confident if I did buy one today my experience would be far better (to be clear, I have zero interest in one). Please bear in mind this post is entirely anecdotal, and only intended as a partial explanation.

OK that makes sense. But with the popularity of streaming I'd think young people wouldn't find the concept of a desktop so foreign anymore.

Broopster said:
I’d like to throw a question out there regarding GB202 and GB203 L2 cache. Despite our proximity to launch, I haven’t seen any leaks about the cache size. The 128 MB rumor has been out for well over a year and never confirmed by anyone to my knowledge. It could easily have been an estimation based on the assumption that the core/cache ratio would stay the same as Ada. Sounds like that could be tough with only 22% more die space (on GB202) and added MCs, though AMD managed to cut down the L3 cache size on 4nm for Zen 5, so it’s not impossible I guess. GB203 shouldn’t have the same problem so I’d guess that cache would stay the same.

Curious if anyone has seen anything or has any thoughts on this.

Been wondering this myself. If yields are better they could reduce the amount of cache on the chip without reducing the amount on the end product. The 4090 left nearly 30% of its L2 dark.

Broopster · Nov 30, 2024

homerdog said:
But with the popularity of streaming I'd think young people wouldn't find the concept of a desktop so foreign anymore.

Again, it’s anecdotal. My father is older than most parents of teenagers, and so he grew up in the Atari era and always cared more about sports anyway. So he has zero interest in anything PC related - he barely even uses his laptop. I’m sure there are plenty of youngsters with PC-enthusiast parents who will pick up the hobby. I’m unlikely to ever have kids (partner doesn’t want them) and I tried to get one of my brothers interested in PC, but when all he wants to play is Roblox and Minecraft it’s hard to get excited for building a computer.

Unlike 10 years ago, PC actually gets ports of virtually all games now. I really think that APUs could push PC/laptop gaming further into the mainstream if they can push performance enough. Since there’s little cause to keep pushing 16inch resolution we should get there eventually. But that’s a topic for another thread!

homerdog said:
The 4090 left nearly 30% of its L2 dark.

I strongly suspect that was a segmentation decision. All the memory controllers were enabled, and even if yields played some role I can’t see the need to disable 25% of the cache when you only needed to cut 11.1% of your cores to hit the needed yields. Only Nvidia knows how much it would’ve helped performance, but it certainly wouldn’t have hurt, especially in RT.
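For anyone who wants to check the percentages, here is a minimal sketch. The AD102 and 4090 figures are assumptions taken from public spec sheets (96 MB full L2 vs 72 MB enabled, 144 full SMs vs 128 enabled), not something stated in this thread, and `disabled_fraction` is just a hypothetical helper name.

```python
# Assumed public figures for the full AD102 die and the RTX 4090 cut.
FULL_L2_MB = 96      # assumption: full AD102 L2 size
ENABLED_L2_MB = 72   # assumption: L2 enabled on the 4090
FULL_SMS = 144       # assumption: full AD102 SM count
ENABLED_SMS = 128    # assumption: SMs enabled on the 4090


def disabled_fraction(full, enabled):
    """Return the share of a resource that is fused off."""
    return (full - enabled) / full


l2_disabled = disabled_fraction(FULL_L2_MB, ENABLED_L2_MB)
sm_disabled = disabled_fraction(FULL_SMS, ENABLED_SMS)

print(f"L2 disabled:  {l2_disabled:.1%}")
print(f"SMs disabled: {sm_disabled:.1%}")
```

Under those assumptions the cache cut is more than twice as deep as the core cut, which is the gap the segmentation argument rests on.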

trinibwoy · Nov 30, 2024

Broopster said:
I’d like to throw a question out there regarding GB202 and GB203 L2 cache. Despite our proximity to launch, I haven’t seen any leaks about the cache size. The 128 MB rumor has been out for well over a year and never confirmed by anyone to my knowledge. It could easily have been an estimation based on the assumption that the core/cache ratio would stay the same as Ada. Sounds like that could be tough with only 22% more die space (on GB202) and added MCs, though AMD managed to cut down the L3 cache size on 4nm for Zen 5, so it’s not impossible I guess. GB203 shouldn’t have the same problem so I’d guess that cache would stay the same.

Curious if anyone has seen anything or has any thoughts on this.

Rumors point to a significant increase in bandwidth due to a wider bus and higher clocks. If anything, Nvidia may decide to reduce L2 on GB202. No way to know until we see the goods.

Broopster · Nov 30, 2024

trinibwoy said:
Rumors point to a significant increase in bandwidth due to a wider bus and higher clocks. If anything, Nvidia may decide to reduce L2 on GB202. No way to know until we see the goods.

Yeah, the 128 MB rumor made more sense when people were talking about a 384-bit bus width, but I just don’t see the need now. We’ll know soon enough; I’m surprised nothing has leaked yet.

homerdog · Nov 30, 2024

Broopster said:
I strongly suspect that was a segmentation decision. All the memory controllers were enabled, and even if yields played some role I can’t see the need to disable 25% of the cache when you only needed to cut 11.1% of your cores to hit the needed yields. Only Nvidia knows how much it would’ve helped performance, but it certainly wouldn’t have hurt, especially in RT.

Technically we could find out. Somebody match core and memory clocks on a 4090 and a Quadro RTX6000 Ada and see how they differ. Someone smarter than me can surely figure out how to account for the SM count difference (~10%).

DavidGraham · Dec 1, 2024

homerdog said:
Technically we could find out. Somebody match core and memory clocks on a 4090 and a Quadro RTX6000 Ada and see how they differ. Someone smarter than me can surely figure out how to account for the SM count difference (~10%).

The Quadro RTX6000 Ada is kinda like the apex predator graphics GPU right now, with the full AD102 die enabled and 48 GB of VRAM. However, it's held back by a limited TDP of 300 W (vs 450 W for the 4090) and by using regular GDDR6 (vs GDDR6X in the 4090). In actual game use, it's going to be power limited and will have ~8% less memory bandwidth than the 4090. In most cases it swings between being 10% faster and 20% slower than the 4090, depending on where the bottleneck lies.

homerdog · Dec 1, 2024

DavidGraham said:
The Quadro RTX6000 Ada is kinda like the apex predator graphics GPU right now, with the full AD102 die enabled and 48 GB of VRAM. However, it's held back by a limited TDP of 300 W (vs 450 W for the 4090) and by using regular GDDR6 (vs GDDR6X in the 4090). In actual game use, it's going to be power limited and will have ~8% less memory bandwidth than the 4090. In most cases it swings between being 10% faster and 20% slower than the 4090, depending on where the bottleneck lies.

That's why I said match GPU and memory clocks. Clock them both at the level of the Quadro and run some tests.

Here I'll even link you the cards.

https://www.newegg.com/Product/ComboDealDetails?ItemList=Combo.4733042

^That one comes with a bonus 3060.

https://www.newegg.com/pny-vcnrtx6000ada-pb/p/N82E16814133886?Item=N82E16814133886

trinibwoy · Tuesday at 12:14 PM

It’s pretty late in the game to be getting new rumors, but it seems the 5080 is tipped for 30 Gbps VRAM. That would mean 960 GB/s of bandwidth, which is in 4090 territory.

https://videocardz.com/newz/nvidia-geforce-rtx-5080-tipped-to-feature-30-gbps-gddr7-memory
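The 960 GB/s figure follows from the usual bus-width arithmetic. A quick sketch, assuming the rumored 256-bit bus for the 5080 (the bus width is not stated in the post) and the 4090's public 384-bit, 21 Gbps configuration; `bandwidth_gb_s` is a hypothetical helper name.

```python
def bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s.

    Each pin moves data_rate_gbps gigabits per second; multiply by the
    number of pins (bus width) and divide by 8 to convert bits to bytes.
    """
    return bus_width_bits * data_rate_gbps / 8


# Assumption: rumored 256-bit bus with the tipped 30 Gbps GDDR7.
rtx_5080 = bandwidth_gb_s(256, 30)
# Assumption: public 4090 spec, 384-bit bus with 21 Gbps GDDR6X.
rtx_4090 = bandwidth_gb_s(384, 21)

print(f"5080 (rumored): {rtx_5080:.0f} GB/s")
print(f"4090:           {rtx_4090:.0f} GB/s")
```

If the bus turns out to be something other than 256-bit, the 960 GB/s number changes proportionally.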

DegustatoR · Tuesday at 1:09 PM

Sounds about right for the rumored general performance.

trinibwoy · Tuesday at 3:13 PM

Based on these rumors bandwidth per SM is getting a decent bump with Blackwell. The 4090 is downright anemic which may explain why it doesn't pull away from the 4080 as much as specs would indicate. The 5090 will have 33% more bandwidth per SM than the 4090!

Bandwidth per SM:

10.5 GB/s - 5090
11.4 GB/s - 5080
12.8 GB/s - 5070 Ti
7.9 GB/s - 4090
9.2 GB/s - 4080
8.4 GB/s - 4070 Ti
11.3 GB/s - 4060
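A rough sketch of how these per-SM numbers can be reproduced. Every spec in the table is an assumption: rumored bandwidth and SM counts for the 50 series, public spec-sheet values for the 40 series. One caveat: the 9.2 GB/s entry lines up with 4080 Super figures (736 GB/s, 80 SMs); the original 4080 (716.8 GB/s, 76 SMs) works out slightly higher, so both are included.

```python
# name: (memory bandwidth in GB/s, enabled SM count)
# Assumptions: 50-series rows are rumors, 40-series rows are public specs.
SPECS = {
    "5090": (1792.0, 170),
    "5080": (960.0, 84),
    "5070 Ti": (896.0, 70),
    "4090": (1008.0, 128),
    "4080 Super": (736.0, 80),
    "4080": (716.8, 76),
    "4070 Ti": (504.0, 60),
    "4060": (272.0, 24),
}


def bandwidth_per_sm(specs):
    """Divide each card's memory bandwidth by its SM count."""
    return {name: bw / sms for name, (bw, sms) in specs.items()}


per_sm = bandwidth_per_sm(SPECS)
for name, value in per_sm.items():
    print(f"{value:5.1f} GB/s - {name}")

# Relative uplift of the rumored 5090 over the 4090.
uplift = per_sm["5090"] / per_sm["4090"] - 1
print(f"5090 vs 4090: {uplift:.1%} more bandwidth per SM")
```

With these inputs the 5090 lands roughly a third above the 4090, in line with the claim above.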

IQandHDR · Tuesday at 3:36 PM

trinibwoy said:
Based on these rumors bandwidth per SM is getting a decent bump with Blackwell. The 4090 is downright anemic which may explain why it doesn't pull away from the 4080 as much as specs would indicate. The 5090 will have 33% more bandwidth per SM than the 4090!

Bandwidth per SM:

10.5 GB/s - 5090
11.4 GB/s - 5080
12.8 GB/s - 5070 Ti
7.9 GB/s - 4090
9.2 GB/s - 4080
8.4 GB/s - 4070 Ti
11.3 GB/s - 4060

If you use RT/PT it is anywhere from 40-100% faster:

Cyberpunk 2077: Phantom Liberty Benchmark Performance Review - 25+ GPUs Tested

The Phantom Liberty expansion brings big improvements to Cyberpunk 2077 and adds an exciting new story line. The game also gets support for DLSS 3.5 and Path Tracing. In our performance review, we're taking a closer look at image quality, VRAM usage, and performance on a wide selection of modern...

www.techpowerup.com

Nvidia RTX 4090 vs. RTX 3090 vs. RTX 3090 Ti: Which graphics card is the best?

The Nvidia RTX 4090 is a beast. But how does it fare against last-generation monsters like the RTX 3090 and 3090 Ti? Let's find out.

www.digitaltrends.com

I would love a 5090 being "downright anemic" over the 4090 if that is the difference in games

trinibwoy · Tuesday at 3:50 PM

IQandHDR said:
If you use RT/PT it is anywhere from 40-100% faster

Yep, and not relevant to its performance in non-PT games. PT may be benefiting from something other than bandwidth like L2 size or the significantly higher number of RT cores.

pcchen · Tuesday at 3:52 PM

The 4090 is indeed a bit bandwidth starved, since it is so much more powerful than the 3090 (it has more than double the FP16/FP32 performance) but only slightly higher memory bandwidth, mostly because they use basically the same memory, just with a slightly higher clock. Unfortunately, this is mostly because no better memory existed at the time, so NVIDIA really had no choice (other than making a really expensive 512-bit GPU, or going for HBM). However, memory bandwidth is not as crucial for workloads with more computation (e.g. ray tracing), which is why the 4090 performs much better than the 3090 in such cases (better RT cores certainly also help), but the imbalance of computation/bandwidth is still there.
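To put a rough number on that imbalance, here is a small sketch comparing compute per byte of memory bandwidth. The FP32 throughput and bandwidth values are assumptions taken from public spec sheets (boost-clock peak figures), not measurements, and the names are hypothetical.

```python
# name: (peak FP32 TFLOPS, memory bandwidth in GB/s)
# Assumption: approximate public spec-sheet values.
CARDS = {
    "3090": (35.6, 936.0),
    "4090": (82.6, 1008.0),
}


def flops_per_byte(tflops, bandwidth_gb_s):
    """Peak FP32 operations available per byte fetched from VRAM."""
    return (tflops * 1e12) / (bandwidth_gb_s * 1e9)


ratios = {name: flops_per_byte(*spec) for name, spec in CARDS.items()}

compute_ratio = CARDS["4090"][0] / CARDS["3090"][0]
bandwidth_ratio = CARDS["4090"][1] / CARDS["3090"][1]

for name, value in ratios.items():
    print(f"{name}: {value:.1f} FLOPs per byte")
print(f"Compute scaling:   {compute_ratio:.2f}x")
print(f"Bandwidth scaling: {bandwidth_ratio:.2f}x")
```

Peak figures overstate real throughput on both cards, so this only illustrates the direction and rough size of the gap, and it ignores the much larger L2 on the 4090.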

Cyan · Tuesday at 4:15 PM

trinibwoy said:
Based on these rumors bandwidth per SM is getting a decent bump with Blackwell. The 4090 is downright anemic which may explain why it doesn't pull away from the 4080 as much as specs would indicate. The 5090 will have 33% more bandwidth per SM than the 4090!

Bandwidth per SM:

10.5 GB/s - 5090
11.4 GB/s - 5080
12.8 GB/s - 5070 Ti
7.9 GB/s - 4090
9.2 GB/s - 4080
8.4 GB/s - 4070 Ti
11.3 GB/s - 4060

What worries me the most is the amount of VRAM Nvidia is going to put on those cards. According to TechPowerUp's GPU Database, the RTX 5060 comes with just 8GB of VRAM.

TechPowerUp

Graphics card and GPU database with specifications for products launched in recent years. Includes clocks, photos, and technical details.

www.techpowerup.com

DegustatoR · Tuesday at 4:20 PM

Cyan said:
What worries me the most is the amount of VRAM Nvidia is going to put on those cards. According to TechPowerUp's GPU Database, the RTX 5060 comes with just 8GB of VRAM.

It will. Depending on its performance and price it could also be a proper decision.
5070 is 12, 5080 is 16, 5090 is 32.
Apparently 5060Ti will have 16 for those who want "moar VRAM".
And 5070Ti will get 16 also.

I do wonder when we might see x1.5 capacity chips though and if that can bring some changes to the Tis and/or a possible Super refresh in 2026.

trinibwoy · Tuesday at 4:37 PM

DegustatoR said:
Depending on its performance and price it could also be a proper decision.

It’ll sell in droves if performance is there despite all the handwringing over 8GB. There’ll also be lots of other options to choose from if 8GB is a deal breaker.

DavidGraham · Tuesday at 4:43 PM

RTX 5000 will come with a new "neural rendering capabilities" feature, according to marketing points leaked by INNO3D. Also "advanced" DLSS to offer better image quality and higher fps.

https://videocardz.com/newz/inno3d-teases-neural-rendering-and-advanced-dlss-for-geforce-rtx-50-gpus-at-ces-2025

DegustatoR · Tuesday at 4:46 PM

trinibwoy said:
There’ll also be lots of other options to choose from if 8GB is a deal breaker.

Well I wouldn't say "lots".
N44 will have 128 bit G6 so that's 8GB too most likely.
BMG-G21 is the only card in this class with 12GB but it's ~4060 performance level and will likely end up below either 5060 or 8600.

DavidGraham said:
RTX 5000 will come with a new "neural rendering capabilities" feature, according to marketing points leaked by INNO3D. Also "advanced" DLSS to offer better image quality and higher fps.

These look like rather meaningless marketing placeholders tbh. All videocards are capable of "neural rendering" and of course there will be "advanced DLSS" and "enhanced RT" on future products.

trinibwoy · Tuesday at 6:04 PM

DegustatoR said:
Well I wouldn't say "lots".
N44 will have 128 bit G6 so that's 8GB too most likely.
BMG-G21 is the only card in this class with 12GB but it's ~4060 performance level and will likely end up below either 5060 or 8600.

Yeah I meant lots of alternative options at a price if your heart is set on more than 8GB. The vast majority of games and gamers will be fine with 8GB entry level cards.
