Speculation and Rumors: Nvidia Blackwell ...

Dangerman · Mar 18, 2024

They can refresh with B200 on 3nm anyways which is next year apparently.

troyan · Mar 18, 2024

Reddit has another slide:

semianlaysis-nvidia-b100-b200-gb200-cogs-pricing-margins-v0-K8B2q6_pnryc2FYalKJalHshWvinNDdbHTwt2V576Ho.jpg

1.8 TB/s offchip bandwidth. Four years ago A100 supported 2.02TB/s with HBM2. Be able to connect 72 B100s with 14 TB memory is just out of this world.

DegustatoR · Mar 18, 2024

Dangerman said:
They can refresh with B200 on 3nm anyways which is next year apparently.

Not according to this:

https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6def54d4-6b58-470c-bb13-aa95add9255f_3006x1576.jpeg

Bondrewd · Mar 18, 2024

trinibwoy said:
Seems impossible to jam over 1400mm^2 of the same silicon into the same power envelope

MI200 already did that just fine (if you remember that the OAM MI100 was shitcanned for reasons).
Clock it lower and that's all.

trinibwoy · Mar 18, 2024

Bondrewd said:
MI200 already did that just fine (if you remember that the OAM MI100 was shitcanned for reasons).
Clock it lower and that's all.

Still got to perform at those lower clocks though. Perf/w will be the key thing to watch.

vola · Mar 18, 2024

Dangerman said:
They can refresh with B200 on 3nm anyways which is next year apparently.

B200 seems to be the same chip as B100 but with 1000W TDP instead of 700W.

Bondrewd · Mar 18, 2024

trinibwoy said:
Still got to perform at those lower clocks though. Perf/w will be the key thing to watch.

Yea, that's the trick with spamming twice the Si.

vola said:
B200 seems to be the same chip as B100 but with 1000W TDP instead of 700W.

They're both kilowatt, no worries.
700W B100 is meme/10.

DavidGraham · Mar 18, 2024

An important information, seems NV was secretly doing compute chiplets years ago (since A100).

https://twitter.com/x/status/1769852326570037424

Seems gaming Blackwell is going the same route.

https://twitter.com/x/status/1769853437662126267

troyan · Mar 18, 2024

I thought everyone knows this because nVidia has written and spoke about it four years ago...

I remember that the interconnect speed in GA100 was 7.2 TB/s.

DegustatoR · Mar 18, 2024

troyan said:
I thought everyone knows this because nVidia has written and spoke about it four years ago...

Impossible. Everyone knows that Nvidia can't do chiplets.

DavidGraham · Mar 18, 2024

This might explain why B202 is rumored to be 512 bit, it's dies connected together, appears as one huge gaming die. It explains why it's called B202 and not B102.

DegustatoR · Mar 18, 2024

DavidGraham said:
This might explain why B202 is rumored to be 512 bit, it's dies connected together, appears as one huge gaming die. It explains why it's called B202 and not B102.

It's GB202 and they all are GB20x since they are "Blackwell 2.0" whatever that means in comparison to "1.0". Probably that they are on N3.

techuse · Mar 19, 2024

Is Blackwell just a wider Hopper or have there been architectural changes?

Bondrewd · Mar 19, 2024

techuse said:
Is Blackwell just a wider Hopper or have there been architectural changes?

The latter.

xpea · Mar 19, 2024

New DGX GB200 NVL72 SuperPod:

Frenetic Pony · Mar 19, 2024

The interlink bandwidth sounds incredible, but otherwise "two H100s stapled together (in terms of compute resources) but without double the HBM" isn't exactly what I was expecting out of Nvidia.

Deleted member 2197 · Mar 19, 2024

NVIDIA Blackwell GB202 Gaming GPUs To Utilize TSMC 4NP Node, Significant Improvement To Cache & SM Throughput

NVIDIA announced its Blackwell GPUs for AI and now eyes are all set on its gaming parts which are rumored to feature the same TSMC 4NP node.

wccftech.com

It was previously expected that NVIDIA was going to leverage the TSMC 3nm process node for the gaming chip but that plan has seemingly changed as Kopite7kimi now states both Blackwell AI Tensor Core and Gaming GPUs to be fabricated on a very similar process node. Just a few hours ago, we came to know that NVIDIA will be using TSMC's 4NP node, a variation of the 5nm node that was already used for Ada Lovelace and Hopper GPUs.

It is stated that the new process node will allow a 30% increase in transistor density which can lead to higher performance gains but the actual efficiency advantages are yet to be explained. TSMC doesn't explicitly state the 4NP process node anywhere on its webpage. They only mention N4P & which is also mentioned as an extension of the N5 platform with an 11% performance boost over N5 and a 6% boost over N4.
...
He also mentions that the GB203 GPU, the next in the Blackwell Gaming GPU lineup, will be half of the GB202, similar to AD102 and AD103 GPUs. This will lead to a huge disparity in performance if NVIDIA equips the next 90-series cards with GB202 and the 80-series cards with GB203. The biggest question is whether NVIDIA will utilize MCM (Multi-Chip-Module) packaging for its Blackwell Gaming GPUs or keep them monolithic for now. Given the increasing costs and yield issues associated with GPU/chip development, the chiplet route is indeed the way of the future & AMD's Radeon division has already embraced it.

troyan · Mar 19, 2024

Frenetic Pony said:
The interlink bandwidth sounds incredible, but otherwise "two H100s stapled together (in terms of compute resources) but without double the HBM" isn't exactly what I was expecting out of Nvidia.

B100 doubled the HBM3 memory from H100 and has 2.4x more bandwidth. It is basically a H100 NVL with 8TB/s instead of 600GB/s (NVLink).

Deleted member 2197 · Mar 19, 2024

NVLink & NVSwitch for Advanced Multi-GPU Communication

The Most Powerful End-to-End AI and HPC Data Center Platform.

www.nvidia.com

NVLink Switch Generational comparison

	First Generation	Second Generation	Third Generation	NVLink Switch
Number of GPUs with direct connection within a NVLink domain	Up to 8	Up to 8	Up to 8	Up to 576
NVSwitch GPU-to-GPU bandwidth	300GB/s	600GB/s	900GB/s	1,800GB/s
Total aggregate bandwidth	2.4TB/s	4.8TB/s	7.2TB/s	1PB/s
Supported NVIDIA architectures	NVIDIA Volta™ architecture	NVIDIA Ampere architecture	NVIDIA Hopper™ architecture	NVIDIA Blackwell architecture

Preliminary specifications; may be subject to change.

NVLink Generational comparison

	Second Generation	Third Generation	Fourth Generation	Fifth Generation
NVLink bandwidth per GPU	300GB/s	600GB/s	900GB/s	1,800GB/s
Maximum Number of Links per GPU	6	12	18	18
Supported NVIDIA Architectures	NVIDIA Volta™ architecture	NVIDIA Ampere architecture	NVIDIA Hopper™ architecture	NVIDIA Blackwell architecture

Preliminary specifications; may be subject to change.

Seanspeed · Mar 19, 2024

Frenetic Pony said:
The interlink bandwidth sounds incredible, but otherwise "two H100s stapled together (in terms of compute resources) but without double the HBM" isn't exactly what I was expecting out of Nvidia.

It's honestly super un-exciting. Per processor, the improvements really aren't that big at all. They must be quite assured in their current lead given this is supposed to be their new flagship for the next two years.

This new era of 'more performance by using more silicon' is gonna kinda suck.

Speculation and Rumors: Nvidia Blackwell ...

Dangerman

troyan

DegustatoR

Bondrewd

trinibwoy

Meh

vola

Bondrewd

DavidGraham

troyan

DegustatoR

DavidGraham

DegustatoR

techuse

Bondrewd

xpea

Frenetic Pony

Deleted member 2197

Guest

NVIDIA Blackwell GB202 Gaming GPUs To Utilize TSMC 4NP Node, Significant Improvement To Cache & SM Throughput

troyan

Deleted member 2197

Guest

NVLink & NVSwitch for Advanced Multi-GPU Communication

Seanspeed