Speculation and Rumors: Nvidia Blackwell ...

Earlier in the same video, lower settings use much less VRAM though; just rewind

Also, games generally don't become "unstable" (well, they shouldn't) when they run out of VRAM; they become slow.
They become unplayable due to VRAM thrashing unless they're using the VRAM in a smart way where they just lose some performance. The latter is rare though, so generally when you're actually running into VRAM limits you'll see it in results like 0.2 fps.

Here's a guy playing Avatar in 4K on a 4080 on the Unobtainium preset:


The game a) doesn't seem to consume more than 15GB of VRAM and b) runs at <20 fps anyway. No VRAM-related issues there.
 
What does this have to do with the Blackwell family? These appeals to how things were some time ago aren't doing us any favors.

What’s changed since the 4070 Ti launched? If big Blackwell is significantly faster there’s no technical reason cheaper SKUs can’t also be significantly faster. The product positioning is mostly a marketing & margins decision at that point.
 
welcome GAA
15% more speed with like a 10-15% bump in chip-level density at a 20-30% wafer price bump?
Ughhhhhh. Eh. Definitely not value!
Passable for DC parts, stings everywhere else.
If big Blackwell is significantly faster there’s no technical reason cheaper SKUs can’t also be significantly faster
GB202 will have 30-40% more SMs no matter how they chop it.
Smaller ones cannot afford such luxuries.
GPUs intrinsically rely on cost per yielded mm^2 staying flat. It's going up.
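The cost-per-transistor math behind that gripe is easy to sketch. A rough illustration with the percentages floated above (yield assumed constant; none of these are real foundry prices):

```python
# Relative change in cost per transistor when density and wafer
# price both move; yield is assumed constant for simplicity.
def cost_per_transistor_change(density_gain, wafer_price_gain):
    return (1 + wafer_price_gain) / (1 + density_gain) - 1

# A 10-15% density bump against a 20-30% wafer price bump:
worst = cost_per_transistor_change(0.10, 0.30)
best = cost_per_transistor_change(0.15, 0.20)
print(f"cost per transistor moves {best:+.1%} to {worst:+.1%}")
```

Even the best case gets more expensive per transistor, which is the "cost per yielded mm^2 is going up" point in a nutshell.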
 
What’s changed since the 4070 Ti launched?
With the 4070 Ti, Nvidia moved from Samsung's 10nm-class node to TSMC's 5nm-class one. The same won't repeat with Blackwell.

If big Blackwell is significantly faster there’s no technical reason cheaper SKUs can’t also be significantly faster.
There is an economic reason though. "Big Blackwell" will sell at a price which allows cutting into margins to provide said performance advancement. Smaller chips don't have that luxury.
These top-end improvements are likely to run out soon too, by the way. The margins there aren't infinite.
 
With 4070Ti Nvidia moved from Samsung's 10nm class node to TSMC's 5nm class one. The same won't repeat with Blackwell.

True but how do you reconcile that with the rumor that big Blackwell is much faster than big Ampere? Expecting GB202 @ 800mm^2?
 
True but how do you reconcile that with the rumor that big Blackwell is much faster than big Ampere? Expecting GB202 @ 800mm^2?
They're well overdue for an architecture overhaul. Hopefully that will yield some perf/mm^2 and perf/W benefits independent of node. Perf/mm^2 regressed with Turing (even on the TU11x chips, so it's not all attributable to RT/tensor cores), and while it's hard to compare Ampere and Ada thanks to node differences, it seems like there's still room to improve.
 
True but how do you reconcile that with the rumor that big Blackwell is much faster than big Ampere? Expecting GB202 @ 800mm^2?
It'll just be even more expensive than the 4090. There is no limit to the "top end" segment; we've already seen cards at $3000 MSRP there before.
That being said, and as I've said above, the margins there supposedly still allow for some improvements at the same price, in contrast to what we see below ~$600.

They're well overdue for an architecture overhaul. Hopefully that will yield some perf/mm^2 and perf/W benefits independent of node.
It remains to be seen how much of a change from Lovelace gaming Blackwell will end up being. It's not like the current Nvidia GPU architecture is struggling at anything.
Also, the last time Nvidia switched architectures we got Turing, which was anything but an improvement in either metric. This time we are getting a process improvement at least (Turing didn't), so we'll see.
 
With such a big gap between GB202 and GB203, I think it's possible that only GB202 will use N3 and the rest will go with N4P.
 
With such a big gap between GB202 and GB203, I think it's possible that only GB202 will use N3 and the rest will go with N4P.

That would be interesting. The smaller AD10x dies certainly have room to grow. AD106 is ~190mm^2, which is kinda crazy. A 250mm^2 GB206 on N4 could do some damage if power consumption is manageable.
 
It'll just be even more expensive than the 4090. There is no limit to the "top end" segment; we've already seen cards at $3000 MSRP there before.
That being said, and as I've said above, the margins there supposedly still allow for some improvements at the same price, in contrast to what we see below ~$600.


It remains to be seen how much of a change from Lovelace gaming Blackwell will end up being. It's not like the current Nvidia GPU architecture is struggling at anything.
Also, the last time Nvidia switched architectures we got Turing, which was anything but an improvement in either metric. This time we are getting a process improvement at least (Turing didn't), so we'll see.
I think GB203 will be N3, but maybe GB205, 206 & 207 will be N4/SF4X?

Overall IDK what the SKUs will be, because with a 512-bit bus on GB202 I do wonder if they can harvest enough 320-bit and 384-bit dies to make 5080s & 5080 Tis with 20 & 24 GB of VRAM. Because I can't see the market accepting another 80-class 16GB card, even if raster of the full die matches a 4090 or even that canned 4090 Ti, with RT/PT faster than that.
It'll just be even more expensive than the 4090. There is no limit to the "top end" segment; we've already seen cards at $3000 MSRP there before.
That being said, and as I've said above, the margins there supposedly still allow for some improvements at the same price, in contrast to what we see below ~$600.


It remains to be seen how much of a change from Lovelace gaming Blackwell will end up being. It's not like the current Nvidia GPU architecture is struggling at anything.
Also, the last time Nvidia switched architectures we got Turing, which was anything but an improvement in either metric. This time we are getting a process improvement at least (Turing didn't), so we'll see.
Maxwell was a big improvement over Kepler on the same node. Blackwell looks like a die shrink + big architectural change for some dies.
 
I think GB203 will be N3, but maybe GB205, 206 & 207 will be N4/SF4X?
Highly unlikely.

Overall IDK what the SKUs will be, because with a 512-bit bus on GB202 I do wonder if they can harvest enough 320-bit and 384-bit dies to make 5080s & 5080 Tis with 20 & 24 GB of VRAM?
Sure, if you're ready to buy the 320-bit one at $1800 or something.

Because I can't see the market accepting another 80-class 16GB card, even if raster of the full die matches a 4090 or even that canned 4090 Ti, with RT/PT faster than that.
There is no "80 class"; perf/$ is the only metric which matters, and putting >16GB on a card with ~4090 performance just makes this metric worse for no real reason.
I also do wonder if the 3GB G7 chips will cost as much as 4GB ones, making them useless at least at the start. If not, then it's possible that you'll see them used on the 256-bit product this time.
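For reference, the bus-width-to-capacity arithmetic behind these configs is just chip count times chip density, since each GDDR device has a 32-bit interface. A quick sketch (ignoring clamshell mode, which would double capacity):

```python
# VRAM capacity from bus width: one GDDR chip per 32 bits of bus.
def vram_gb(bus_width_bits, chip_density_gb):
    return (bus_width_bits // 32) * chip_density_gb

for bus in (512, 384, 320, 256):
    print(f"{bus}-bit: {vram_gb(bus, 2)} GB with 2GB chips, "
          f"{vram_gb(bus, 3)} GB with 3GB chips")
```

That's where the 20GB (320-bit) and 24GB (384-bit) harvested configs come from, and why 3GB G7 chips would lift a 256-bit card from 16GB to 24GB.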

Maxwell was a big improvement over Kepler on the same node.
Sure, but that was pretty much the only time it happened.

Blackwell looks like a die shrink + big architectural change for some dies.
All Blackwell dies will use the same architecture, and until we know more I don't see how you could call it a "die shrink".
 
15% more speed with like a 10-15% bump in chip-level density at a 20-30% wafer price bump?
Ughhhhhh. Eh. Definitely not value!
Passable for DC parts, stings everywhere else.

GB202 will have 30-40% more SMs no matter how they chop it.
Smaller ones cannot afford such luxuries.
GPUs intrinsically rely on cost per yielded mm^2 staying flat. It's going up.

It's so expensive :no:

Carbon nanotubes and other stuff aren't coming till next decade; from silicon to CNFET is a big leap. Then to graphene will be a huge leap: put a 4090 in a pair of smart glasses. But until then we get chiplets and packaging; I dunno, maybe curvilinear masks will be cool?

So I guess for Blackwell, chiplets are in as a very early version for AI, but not for graphics. Wondering how much that 512-bit bus version will cost then, eugh.
 
Just to state the obvious: a 512-bit (64-byte) memory bus is going to need a LOT of traces for communication. The bigger die is at least partially an artifact of needing physical surface area to mate all those traces.

Still, I do agree it's sounding like a big MFer.
 
Just to state the obvious: a 512-bit (64-byte) memory bus is going to need a LOT of traces for communication. The bigger die is at least partially an artifact of needing physical surface area to mate all those traces.

Still, I do agree it's sounding like a big MFer.

There are some benefits to going with an older process for this exact reason as well: you get the floorplan room to add some more GDDRx controllers, as well as likely some cost savings on buying the actual memory chips, which may be enough to offset the increase in PCB costs. The additional PCB area probably wouldn't cost too much, although I'm not entirely sure where the breakpoint is for GDDR trace routing to require extra PCB layers, which definitely would cost something.

While the data for GDDR6 at DRAMExchange is locked behind a paywall, generally higher-density DRAMs cost more per bit as there's more demand for them.

[attached image: DRAM pricing chart]

Going from a 128-bit to a 256-bit memory bus on a theoretical midrange part made on an older process would also likely let you save on SRAM for the last-level cache.
That probably won't do you any favors power-wise, although with all that 'real' memory bandwidth to burn, you could probably run all the GDDRx chips way down in their 'happy' place clocks/voltage-wise, or even just stick with GDDR6 for some of the mid-to-lower-end Blackwell parts.
 
It makes a lot of sense if they're still going for reticle-sized dies with a large amount of SRAM; N3E yields and SRAM density are probably not ready yet, and if they're back to yearly refreshes as rumoured then no one is likely to beat them to N3E mass production by any noticeable margin either. Still, it's a bit disappointing; it would have been pretty fun to see them go into N3E full mass production at the same time as Apple!

Looking at my power numbers for H100 vs AD102 and the voltage curves I'm seeing, I think 700W for 1600mm^2 of 4nm is feasible, but they'll definitely have to be aggressive in terms of power efficiency focus at every level and still be power/thermals constrained in a bunch of cases (H100 already is!) - so it'll be interesting to see AI/HPC workload optimisation shift further from "optimise performance" to "optimise power" perhaps!
 
H100 SXM is already at 700W. Seems impossible to jam over 1400mm^2 of the same silicon into the same power envelope. What are the odds that the first Blackwell release is just a single B100 die?
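Napkin math on why that sounds hard. GH100's ~814mm^2 die size and the 700W SXM TDP are public; the 1600mm^2 figure is from the rumored two-die config above, and everything else here is a rough sketch:

```python
# Power-density check: H100 SXM is ~814mm^2 of silicon at a 700W TDP.
h100_area_mm2 = 814
h100_power_w = 700
w_per_mm2 = h100_power_w / h100_area_mm2    # ~0.86 W/mm^2

rumored_area_mm2 = 1600                     # roughly two reticle-sized dies
naive_power_w = w_per_mm2 * rumored_area_mm2
reduction = 1 - h100_power_w / naive_power_w

print(f"{naive_power_w:.0f} W at H100's power density; "
      f"holding 700 W needs ~{reduction:.0%} lower W/mm^2")
```

In other words, keeping the same envelope while roughly doubling silicon area means cutting power per mm^2 almost in half, which is why aggressive efficiency work (or a single-die first release) looks plausible.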
 