Speculation and Rumors: Nvidia Blackwell ...

Earlier in the same video, lower settings use much less VRAM though; just rewind

Also, games generally don't become "unstable" (well, they shouldn't) when they run out of VRAM; they become slow.
They become unplayable due to VRAM thrashing unless they're using the VRAM in a smart way where they just lose some performance. The latter is rare though, so generally when you're actually running into VRAM limits you'll see it in results like 0.2 fps.

Here's a guy playing Avatar in 4K on a 4080 on the Unobtainium preset:


The game a) doesn't seem to consume more than 15GB of VRAM and b) runs at <20 fps anyway. No VRAM-related issues there.
 
What does this have to do with the Blackwell family? These appeals to how things were some time ago aren't doing us any favors.

What’s changed since the 4070 Ti launched? If big Blackwell is significantly faster there’s no technical reason cheaper SKUs can’t also be significantly faster. The product positioning is mostly a marketing & margins decision at that point.
 
welcome GAA
15% more speed with like a 10-15% bump in chip-level density at a 20-30% wafer price bump?
Ughhhhhh. Eh. Definitely not value!
Passable for DC parts, stings everywhere else.
If big Blackwell is significantly faster there’s no technical reason cheaper SKUs can’t also be significantly faster
GB202 will have 30-40% more SMs no matter how they chop it.
Smaller ones cannot afford such luxuries.
GPUs intrinsically rely on cost per yielded mm^2 staying flat. It's going up.
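The cost-per-transistor math behind that gripe is easy to sketch. A rough illustration with the percentages floated above (yield assumed constant; none of these are real foundry prices):

```python
# Relative change in cost per transistor when density and wafer
# price both move; yield is assumed constant for simplicity.
def cost_per_transistor_change(density_gain, wafer_price_gain):
    return (1 + wafer_price_gain) / (1 + density_gain) - 1

# A 10-15% density bump against a 20-30% wafer price bump:
worst = cost_per_transistor_change(0.10, 0.30)
best = cost_per_transistor_change(0.15, 0.20)
print(f"cost per transistor moves {best:+.1%} to {worst:+.1%}")
```

Even the best case gets more expensive per transistor, which is the "cost per yielded mm^2 is going up" point in a nutshell.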
 
What’s changed since the 4070 Ti launched?
With the 4070 Ti, Nvidia moved from Samsung's 10nm-class node to TSMC's 5nm-class one. The same won't repeat with Blackwell.

If big Blackwell is significantly faster there’s no technical reason cheaper SKUs can’t also be significantly faster.
There is an economic reason though. "Big Blackwell" will sell at a price which allows cutting into margins to provide said performance advancement. Smaller chips don't have that luxury.
These top-end improvements are likely to run out soon too, by the way. The margins there aren't infinite.
 
With 4070Ti Nvidia moved from Samsung's 10nm class node to TSMC's 5nm class one. The same won't repeat with Blackwell.

True but how do you reconcile that with the rumor that big Blackwell is much faster than big Ampere? Expecting GB202 @ 800mm^2?
 
True but how do you reconcile that with the rumor that big Blackwell is much faster than big Ampere? Expecting GB202 @ 800mm^2?
They're well overdue for an architecture overhaul. Hopefully that will yield some perf/mm^2 and perf/W benefits independent of node. Perf/mm^2 regressed with Turing (even on the TU11x chips, so it's not all attributable to RT/tensor cores), and while it's hard to compare Ampere and Ada thanks to node differences, it seems like there's still room to improve.
 
True but how do you reconcile that with the rumor that big Blackwell is much faster than big Ampere? Expecting GB202 @ 800mm^2?
It'll just be even more expensive than the 4090. There is no limit to the "top end" segment; we've already seen cards at $3000 MSRP there before.
That being said, and as I've said above, the margins there supposedly still allow for some improvements at the same price, in contrast to what we see below ~$600.

They're well overdue for an architecture overhaul. Hopefully that will yield some perf/mm^2 and perf/W benefits independent of node.
It remains to be seen how much of a change from Lovelace gaming Blackwell will end up being. It's not like the current Nvidia GPU architecture is struggling at anything.
Also, the last time Nvidia switched architectures we got Turing, which was anything but an improvement in either metric. This time we are getting a process improvement at least (Turing didn't), so we'll see.
 
With such a big gap between GB202 and GB203, I think it's possible that only GB202 will use N3 and the rest will go with N4P.
 
With such a big gap between GB202 and GB203, I think it's possible that only GB202 will use N3 and the rest will go with N4P.

That would be interesting. The smaller AD10x dies certainly have room to grow. AD106 is ~190mm^2, which is kinda crazy. A 250mm^2 GB206 on N4 could do some damage if power consumption is manageable.
 
It'll just be even more expensive than the 4090. There is no limit to the "top end" segment; we've already seen cards at $3000 MSRP there before.
That being said, and as I've said above, the margins there supposedly still allow for some improvements at the same price, in contrast to what we see below ~$600.


It remains to be seen how much of a change from Lovelace gaming Blackwell will end up being. It's not like the current Nvidia GPU architecture is struggling at anything.
Also, the last time Nvidia switched architectures we got Turing, which was anything but an improvement in either metric. This time we are getting a process improvement at least (Turing didn't), so we'll see.
I think GB203 will be N3, but maybe GB205, 206 & 207 will be N4/SF4X?

Overall IDK what the SKUs will be, because with a 512-bit bus on GB202 I do wonder if they can harvest enough 320-bit and 384-bit dies to make 5080s & 5080 Tis with 20 & 24 GB of VRAM. Because I can't see the market accepting another 80-class 16GB card, even if raster of the full die matches a 4090 or even that canned 4090 Ti, with RT/PT faster than that.
It'll just be even more expensive than the 4090. There is no limit to the "top end" segment; we've already seen cards at $3000 MSRP there before.
That being said, and as I've said above, the margins there supposedly still allow for some improvements at the same price, in contrast to what we see below ~$600.


It remains to be seen how much of a change from Lovelace gaming Blackwell will end up being. It's not like the current Nvidia GPU architecture is struggling at anything.
Also, the last time Nvidia switched architectures we got Turing, which was anything but an improvement in either metric. This time we are getting a process improvement at least (Turing didn't), so we'll see.
Maxwell was a big improvement over Kepler on the same node. Blackwell looks like a die shrink + big architectural change for some dies.
 
I think GB203 will be N3, but maybe GB205, 206 & 207 will be N4/SF4X?
Highly unlikely.

Overall IDK what the SKUs will be, because with a 512-bit bus on GB202 I do wonder if they can harvest enough 320-bit and 384-bit dies to make 5080s & 5080 Tis with 20 & 24 GB of VRAM?
Sure, if you're ready to buy the 320-bit one at $1800 or something.

Because I can't see the market accepting another 80-class 16GB card, even if raster of the full die matches a 4090 or even that canned 4090 Ti, with RT/PT faster than that.
There is no "80 class"; perf/$ is the only metric which matters, and putting >16GB on a card with ~4090 performance just makes this metric worse for no real reason.
I also do wonder if the 3GB G7 chips will cost as much as 4GB ones, making them useless at least at the start. If not, then it's possible that you'll see them used on the 256-bit product this time.
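For reference, the bus-width-to-capacity arithmetic behind these configs is just chip count times chip density, since each GDDR device has a 32-bit interface. A quick sketch (ignoring clamshell mode, which would double capacity):

```python
# VRAM capacity from bus width: one GDDR chip per 32 bits of bus.
def vram_gb(bus_width_bits, chip_density_gb):
    return (bus_width_bits // 32) * chip_density_gb

for bus in (512, 384, 320, 256):
    print(f"{bus}-bit: {vram_gb(bus, 2)} GB with 2GB chips, "
          f"{vram_gb(bus, 3)} GB with 3GB chips")
```

That's where the 20GB (320-bit) and 24GB (384-bit) harvested configs come from, and why 3GB G7 chips would lift a 256-bit card from 16GB to 24GB.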

Maxwell was a big improvement over Kepler on the same node.
Sure, but that was pretty much the only time it happened.

Blackwell looks like a die shrink + big architectural change for some dies.
All Blackwell dies will use the same architecture, and until we know more I don't see how you could call it a "die shrink".
 
15% more speed with like a 10-15% bump in chip-level density at a 20-30% wafer price bump?
Ughhhhhh. Eh. Definitely not value!
Passable for DC parts, stings everywhere else.

GB202 will have 30-40% more SMs no matter how they chop it.
Smaller ones cannot afford such luxuries.
GPUs intrinsically rely on cost per yielded mm^2 staying flat. It's going up.

It's so expensive :no:

Carbon nanotubes and other stuff aren't coming till next decade; from silicon to CNFET is a big leap. Then to graphene will be a huge leap: put a 4090 in a pair of smart glasses. But until then we get chiplets and packaging; I dunno, maybe curvilinear masks will be cool?

So I guess for Blackwell, chiplets are in as a very early version for AI, but not for graphics. Wondering how much that 512-bit bus version will cost then, eugh.
 
Just to state the obvious: a 512-bit (64-byte) memory bus is going to need a LOT of traces for communication. The bigger die is at least partially an artifact of needing physical surface area to mate all those traces.

Still, I do agree it's sounding like a big MFer.
 
Just to state the obvious: a 512-bit (64-byte) memory bus is going to need a LOT of traces for communication. The bigger die is at least partially an artifact of needing physical surface area to mate all those traces.

Still, I do agree it's sounding like a big MFer.

There are some benefits to going with an older process for this exact reason as well: you get the floorplan room to add some more GDDRx controllers, as well as likely some cost savings on buying the actual memory chips, which may be enough to offset the increase in PCB costs. The additional PCB area probably wouldn't cost too much, although I'm not entirely sure where the breakpoint is for GDDR trace routing to require extra PCB layers, which definitely would cost something.

While the data for GDDR6 at DRAMExchange is locked behind a paywall, generally higher-density DRAMs cost more per bit as there's more demand for them.

[attached image: DRAM pricing chart]

Going from a 128-bit to a 256-bit memory bus on a theoretical midrange part made on an older process would also likely let you save on SRAM for the last-level cache.
That probably won't do you any favors power-wise, although with all that 'real' memory bandwidth to burn, you could probably run all the GDDRx chips way down in their 'happy' place clocks/voltage-wise, or even just stick with GDDR6 for some of the mid-to-lower-end Blackwell parts.
 
It makes a lot of sense if they're still going for reticle-sized dies with a large amount of SRAM; N3E yields and SRAM density are probably not ready yet, and if they're back to yearly refreshes as rumoured then no one is likely to beat them to N3E mass production by any noticeable margin either. Still, it's a bit disappointing; it would have been pretty fun to see them go into N3E full mass production at the same time as Apple!

Looking at my power numbers for H100 vs AD102 and the voltage curves I'm seeing, I think 700W for 1600mm^2 of 4nm is feasible, but they'll definitely have to be aggressive in terms of power efficiency focus at every level and still be power/thermals constrained in a bunch of cases (H100 already is!) - so it'll be interesting to see AI/HPC workload optimisation shift further from "optimise performance" to "optimise power" perhaps!
 
H100 SXM is already at 700W. Seems impossible to jam over 1400mm^2 of the same silicon into the same power envelope. What are the odds that the first Blackwell release is just a single B100 die?
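Napkin math on why that sounds hard. GH100's ~814mm^2 die size and the 700W SXM TDP are public; the 1600mm^2 figure is from the rumored two-die config above, and everything else here is a rough sketch:

```python
# Power-density check: H100 SXM is ~814mm^2 of silicon at a 700W TDP.
h100_area_mm2 = 814
h100_power_w = 700
w_per_mm2 = h100_power_w / h100_area_mm2    # ~0.86 W/mm^2

rumored_area_mm2 = 1600                     # roughly two reticle-sized dies
naive_power_w = w_per_mm2 * rumored_area_mm2
reduction = 1 - h100_power_w / naive_power_w

print(f"{naive_power_w:.0f} W at H100's power density; "
      f"holding 700 W needs ~{reduction:.0%} lower W/mm^2")
```

In other words, keeping the same envelope while roughly doubling silicon area means cutting power per mm^2 almost in half, which is why aggressive efficiency work (or a single-die first release) looks plausible.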
 