Speculation and Rumors: Nvidia Blackwell ...

Seanspeed · May 25, 2024

techuse said:
I would be surprised if they push power limits up again when AMD will not be competing. 3nm should be enough for a 30-50% increase without increasing power, no?

They're still using a 5nm-family process (N4P) for B100/B200, so it's probably safe to assume the consumer stuff won't get 3nm either.

DegustatoR said:
They haven't. 30 and 40 series both top out at 450W.

Lovelace never got a 3090Ti equivalent part, though. The 3090 was only 350W and was actually less cut down than the 4090.

Really, there was little need for the 4090 to ever be 450W in the first place. It just led to the graphics cards having to be overengineered, at higher prices, to meet Nvidia's specification. The same goes for most of the Lovelace range, really. Almost all of them have a TDP at least 50W higher than they really should.

Whether Nvidia raises power or not will probably not be based on need, so it's hard to predict.
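
The "less cut down" point above can be sanity-checked with a few lines of arithmetic. This is only a sketch: the SM counts are public spec-sheet figures rather than anything stated in this thread, and `enabled_fraction` is a hypothetical helper name.

```python
# Fraction of the full die's SMs that are enabled on each card.
# Assumed figures (public spec sheets, not from this thread):
#   GA102 full die: 84 SMs, RTX 3090: 82 SMs
#   AD102 full die: 144 SMs, RTX 4090: 128 SMs

def enabled_fraction(enabled_sms: int, full_die_sms: int) -> float:
    """Return the share of the full die that is enabled, from 0.0 to 1.0."""
    if not 0 < enabled_sms <= full_die_sms:
        raise ValueError("enabled SMs must be between 1 and the full die count")
    return enabled_sms / full_die_sms

cards = {
    "RTX 3090 (GA102)": (82, 84),
    "RTX 4090 (AD102)": (128, 144),
}

for name, (enabled, full) in cards.items():
    print(f"{name}: {enabled_fraction(enabled, full):.1%} of full die enabled")
```

On those assumed numbers the 3090 sits at roughly 98% of its die and the 4090 at roughly 89%.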

DavidGraham · May 27, 2024

Kopite seems to imply that 5090 is something other than a pure monolithic die.

https://twitter.com/x/status/1795028474479988802

https://twitter.com/x/status/1795031695730589948

DegustatoR · May 27, 2024

Seanspeed said:
Lovelace never got a 3090Ti equivalent part, though.

Because there was no competition. The same will happen with the 5090.
Considering that RDNA5 should come a sizeable time after the 5090, but likely before whatever follows Blackwell on GeForce, there are valid reasons why they would limit the 5090's TDP: it leaves the option of launching a 5090Ti later to counter RDNA5. And there are no reasons why they would increase it, unless it's the only way for a 5090 to outperform the 4090 of course, but that seems very unlikely.

DavidGraham said:
Kopite seems to imply that 5090 is something other than a pure monolithic die.

It's a retread of GA100 but now for graphics: a split chip logically made as a monolithic one.
They probably want to iron out any issues that may arise before actually going with a multichip design.

Seanspeed · May 27, 2024

DegustatoR said:
Because there was no competition. The same will happen with the 5090.
Considering that RDNA5 should come a sizeable time after the 5090, but likely before whatever follows Blackwell on GeForce, there are valid reasons why they would limit the 5090's TDP: it leaves the option of launching a 5090Ti later to counter RDNA5. And there are no reasons why they would increase it, unless it's the only way for a 5090 to outperform the 4090 of course, but that seems very unlikely.

It's a retread of GA100 but now for graphics: a split chip logically made as a monolithic one.
They probably want to iron out any issues that may arise before actually going with a multichip design.

There was no competitive need whatsoever for the 4090 to be 450W, but they did it anyway.

DegustatoR · May 27, 2024

Seanspeed said:
There was no competitive need whatsoever for the 4090 to be 450W, but they did it anyway.

Because it was the same max power level as in the previous generation.

trinibwoy · May 27, 2024

DegustatoR said:
It's a retread of GA100 but now for graphics: a split chip logically made as a monolithic one.
They probably want to iron out any issues that may arise before actually going with a multichip design.

Kopite is all over the place on this one. GA100 is a monolithic die. We’ve had logical splits before and never made a big deal out of it. The fact that there are multiple L2 partitions isn’t particularly unique. Current dies could also be considered “split” into multiple GPCs if we want to get technical.

Physically monolithic but logically split can describe lots of chips. What we want is logically monolithic.

DegustatoR · May 27, 2024

trinibwoy said:
Kopite is all over the place on this one. GA100 is a monolithic die. We’ve had logical splits before and never made a big deal out of it. The fact that there are multiple L2 partitions isn’t particularly unique. Current dies could also be considered “split” into multiple GPCs if we want to get technical.

Physically monolithic but logically split can describe lots of chips. What we want is logically monolithic.

It's split in h/w but is a monolithic die. "Logically" here doesn't mean from s/w POV, it means from h/w design POV.
GPCs can't function on their own without global scheduling or caches/MCs. It is a step to being able to do multichip systems for sure but only an early one.

orangpelupa · May 28, 2024

Any rumor for larger VRAM?

TopSpoiler · May 28, 2024

orangpelupa said:
Any rumor for larger VRAM?

GDDR7 will be manufactured starting from a minimum capacity of 16Gbit (2GB) per chip.
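
To see what that minimum density could mean for card capacities, here is a rough sketch. It assumes one GDDR device per 32-bit slice of the memory bus (optionally doubled in clamshell mode); the bus widths in the loop are purely illustrative and are not rumored configurations from this thread, and `vram_gb` is a hypothetical helper.

```python
# Rough VRAM capacity estimate from bus width and per-chip density.
# Assumption: one GDDR device per 32-bit slice of the memory bus,
# doubled in clamshell mode (two devices sharing one slice).

def vram_gb(bus_width_bits: int, chip_gbit: int = 16, clamshell: bool = False) -> int:
    """Return total capacity in GB for a given memory bus width."""
    if bus_width_bits <= 0 or bus_width_bits % 32 != 0:
        raise ValueError("bus width must be a positive multiple of 32 bits")
    chips = bus_width_bits // 32
    if clamshell:
        chips *= 2
    return chips * chip_gbit // 8  # 8 Gbit per GB

# Illustrative bus widths only, not leaked specs.
for width in (128, 192, 256, 384, 512):
    print(f"{width}-bit bus: {vram_gb(width)} GB with 16Gbit chips")
```

With 16Gbit chips this gives the same capacities as today's 16Gbit GDDR6/GDDR6X cards at the same bus width, so larger VRAM would need a wider bus, clamshell, or denser chips later on.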

trinibwoy · May 28, 2024

DegustatoR said:
"Logically" here doesn't mean from s/w POV, it means from h/w design POV.

What does “logically from a h/w POV” mean exactly? There are already tons of things that are separated at a hardware level but are presented as a unified capability to software.

I think the relevant questions are whether it uses multiple dies and whether those individual dies are visible to software. If it’s monolithic with distributed compute and/or cache it will be similar to existing chips.

Granath · May 28, 2024

trinibwoy said:
What does “logically from a h/w POV” mean exactly? There are already tons of things that are separated at a hardware level but are presented as a unified capability to software.

I think the relevant questions are whether it uses multiple dies and whether those individual dies are visible to software. If it’s monolithic with distributed compute and/or cache it will be similar to existing chips.

this one: ??

x.com

DegustatoR · May 28, 2024

trinibwoy said:
What does “logically from a h/w POV” mean exactly?

Means that they could make it as a multichip system but choose not to at the moment.

trinibwoy said:
I think the relevant questions are whether it uses multiple dies and whether those individual dies are visible to software.

GB202 doesn't. Individual dies won't be visible to s/w (well, to applications at least; drivers are another thing) once such a system exists.

trinibwoy · May 28, 2024

Granath said:
this one: ??

x.com

x.com

According to Nvidia each L2 partition in A100/H100 serves half of the GPCs and write coherency across partitions is handled in hardware. Similar to CPU L2s. That’s a far cry from a multi-chip setup where you also need to solve for I/O, host communication and work distribution.

Chips and Cheese claims all SMs can read from both partitions, with higher latency to the “far” partition. However, this isn’t consistent with Nvidia’s description.

“Each L2 partition localizes and caches data for memory accesses from SMs in the GPCs directly connected to the partition. Hardware cache-coherence maintains the CUDA programming model across the full GPU”

trinibwoy · May 28, 2024

DegustatoR said:
Means that they could make it as a multichip system but choose not to at the moment.

Maybe but kopite’s posts don’t give any insight into that one way or the other. He says it’s logically 2xGB203 which really tells us nothing. Physically 2xGB203 would be interesting.

Qesa · May 28, 2024

trinibwoy said:
According to Nvidia each L2 partition in A100/H100 serves half of the GPCs and write coherency across partitions is handled in hardware. Similar to CPU L2s. That’s a far cry from a multi-chip setup where you also need to solve for I/O, host communication and work distribution.

Chips and Cheese claims all SMs can read from both partitions, with higher latency to the “far” partition. However, this isn’t consistent with Nvidia’s description.

“Each L2 partition localizes and caches data for memory accesses from SMs in the GPCs directly connected to the partition. Hardware cache-coherence maintains the CUDA programming model across the full GPU”

I don't think they're necessarily contradictory. If a line exists in the far L2$ it certainly makes sense to look there before DRAM, and reading it could cause the line to be moved/copied to the local L2$. Possibly with some cleverer logic than single read = migrate.
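
The idea above can be sketched as a toy model. Everything in it is hypothetical (the class name, the two-set representation, and the naive "copy on first far read" policy); it only illustrates the lookup order local L2, then far L2, then DRAM, and makes no claim about how A100/H100 actually behave.

```python
# Toy model of two L2 partitions where a miss in the local partition
# checks the far partition before falling back to DRAM.
# Hypothetical policy: a far hit copies the line into the local partition.

class TwoPartitionL2:
    def __init__(self) -> None:
        # One set of cached line addresses per partition (no capacity limit).
        self.partitions = [set(), set()]

    def read(self, sm_side: int, addr: int) -> str:
        """Read addr from an SM attached to partition sm_side (0 or 1)."""
        local = self.partitions[sm_side]
        far = self.partitions[1 - sm_side]
        if addr in local:
            return "local hit"
        if addr in far:
            local.add(addr)  # single read = copy to the local partition
            return "far hit"
        local.add(addr)  # fill from DRAM into the local partition
        return "dram"

l2 = TwoPartitionL2()
print(l2.read(0, 0x100))  # first touch from side 0
print(l2.read(1, 0x100))  # side 1 finds the line in the far partition
print(l2.read(1, 0x100))  # side 1 now has its own copy
```

In this model a far read is slower once and local afterwards, which is one way both descriptions could hold at the same time.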

Man from Atlantis · May 29, 2024

Latest from kopite7kimi

x.com

trinibwoy · May 29, 2024

Dual-slot cooler would be surprising but nice.

“MCM logically like GA100 and GH100” is nonsense. Those chips aren’t MCM in any way.

Erinyes · May 29, 2024

And why would you duplicate the display, video, PCIe stuff, etc.? It's a non-trivial amount of silicon and certainly not cheap.

entity279 · May 29, 2024

trinibwoy said:
“MCM logically like GA100 and GH100” is nonsense. Those chips aren’t MCM in any way.

Yeah, indeed. I'm beginning to fathom how needlessly restrictive it would be to create a physical layout for the chip such that it is mirrored and one half communicates with the other via some sort of (pardon the language abuse) "discrete physical" bus/link.

Seanspeed · May 29, 2024

Erinyes said:
And why would you duplicate the display, video, PCIe stuff, etc.? It's a non-trivial amount of silicon and certainly not cheap.

Well, that's why he's saying 'logically', so I think he just means the compute.

Which means it's really nothing like MCM and nothing new. Plenty of GPUs have had mirrored compute clusters.
