Nvidia Blackwell Architecture Speculation

  • Thread starter Deleted member 2197
I would be surprised if they push power limits up again when AMD will not be competing. 3nm should be enough for a 30-50% increase without increasing power, no?
They're still using a 5nm-family process (N4P) for B100/B200, so it's probably safe to assume the consumer parts won't get 3nm either.

They haven't. 30 and 40 series both top out at 450W.
Lovelace never got a 3090 Ti equivalent part, though. The 3090 was only 350W and was actually less cut down than the 4090.
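
For anyone who wants the "less cut down" claim as numbers, here is a quick sketch. The SM counts are the public spec-sheet figures as I remember them (84 SMs on full GA102 with 82 enabled on the 3090, 144 on full AD102 with 128 enabled on the 4090); they are not from this thread, and the names in the snippet are just illustrative.

```python
# SM counts below are assumed from public spec sheets, not from this thread.
FULL_DIE_SMS = {"GA102": 84, "AD102": 144}
SHIPPED = {"RTX 3090": ("GA102", 82), "RTX 4090": ("AD102", 128)}


def enabled_fraction(card):
    """Return the share of the full die's SMs enabled on the given card."""
    die, sms = SHIPPED[card]
    return sms / FULL_DIE_SMS[die]


for card in SHIPPED:
    print(f"{card}: {enabled_fraction(card):.1%} of the full die enabled")
```

On those figures the 3090 keeps a noticeably larger share of its die enabled than the 4090 does.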

Really, there was little need for the 4090 to ever be 450W in the first place. It just led to graphics cards being overengineered, at higher prices, to meet Nvidia's specification. The same goes for most of the Lovelace range, really. Almost all of them have a TDP at least 50W higher than they should.

Whether Nvidia raises power or not will probably not be based on need, so it's hard to predict.
 
Lovelace never got a 3090Ti equivalent part, though.
Because there was no competition. Same will happen with 5090.
Considering that RDNA5 should arrive a sizeable time after the 5090, but likely before whatever follows Blackwell on GeForce, there are valid reasons to limit the 5090's TDP: it leaves the option of launching a 5090 Ti later to counter RDNA5. And there are no reasons to increase it, unless that's the only way for a 5090 to outperform the 4090, which seems very unlikely.

Kopite seems to imply that 5090 is something other than a pure monolithic die.
It's a re-tread of GA100, but now for graphics: a split chip logically made as a monolithic one.
They probably want to iron out all issues which may arise prior to actually going with a multichip design.
 
There was no competitive need whatsoever for the 4090 to be 450W, but they did it anyways.
 
Kopite is all over the place on this one. GA100 is a monolithic die. We’ve had logical splits before and never made a big deal out of it. The fact that there are multiple L2 partitions isn’t particularly unique. Current dies could also be considered “split” into multiple GPCs if we want to get technical.

Physically monolithic but logically split can describe lots of chips. What we want is logically monolithic.
 
It's split in h/w but is a monolithic die. "Logically" here doesn't mean from s/w POV, it means from h/w design POV.
GPCs can't function on their own without global scheduling or caches/MCs. It is a step to being able to do multichip systems for sure but only an early one.
 
"Logically" here doesn't mean from s/w POV, it means from h/w design POV.

What does “logically from a h/w POV” mean exactly? There are already tons of things that are separated at a hardware level but are presented as a unified capability to software.

I think the relevant questions are whether it uses multiple dies and whether those individual dies are visible to software. If it’s monolithic with distributed compute and/or cache it will be similar to existing chips.
 
this one: ??

1716886301955.png
 
What does “logically from a h/w POV” mean exactly?
Means that they could make it as a multichip system but choose not to at the moment.

I think the relevant questions are whether it uses multiple dies and whether those individual dies are visible to software.
GB202 doesn't. Individual dies won't be visible to s/w (well, to applications at least; drivers are another matter) once such a system exists.
 

According to Nvidia each L2 partition in A100/H100 serves half of the GPCs and write coherency across partitions is handled in hardware. Similar to CPU L2s. That’s a far cry from a multi-chip setup where you also need to solve for I/O, host communication and work distribution.

Chips and Cheese claims all SMs can read from both partitions, with higher latency to the “far” partition. However, this isn’t consistent with Nvidia’s description.

“Each L2 partition localizes and caches data for memory accesses from SMs in the GPCs directly connected to the partition. Hardware cache-coherence maintains the CUDA programming model across the full GPU”
 
Means that they could make it as a multichip system but choose not to at the moment.

Maybe but kopite’s posts don’t give any insight into that one way or the other. He says it’s logically 2xGB203 which really tells us nothing. Physically 2xGB203 would be interesting.
 
I don't think they're necessarily contradictory. If a line exists in the far L2$, it certainly makes sense to look there before DRAM, and reading it could cause the line to be moved or copied to the local L2$, possibly with some cleverer logic than single read = migrate.
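
To make that concrete, here is a toy Python model of the lookup order I have in mind: local partition first, then the far partition, then DRAM, with a line copied into the local partition after a configurable number of far reads. This is purely a sketch of the idea. The class, the `migrate_after` threshold and the return strings are all made up for illustration, it ignores writes, coherence traffic and eviction, and it is not a description of what Nvidia's hardware actually does.

```python
class PartitionedL2:
    """Toy model of two L2 partitions, each local to half of the GPCs.

    Hypothetical illustration only: no writes, no eviction, no real policy.
    """

    def __init__(self, migrate_after=1):
        self.partitions = [set(), set()]    # cache lines held by each partition
        self.far_reads = {}                 # (reader partition, line) -> far-read count
        self.migrate_after = migrate_after  # far reads before copying the line locally

    def read(self, gpc_partition, line):
        local = self.partitions[gpc_partition]
        far = self.partitions[1 - gpc_partition]
        if line in local:
            return "local L2 hit"
        if line in far:
            # Serve from the far partition before falling back to DRAM.
            key = (gpc_partition, line)
            self.far_reads[key] = self.far_reads.get(key, 0) + 1
            if self.far_reads[key] >= self.migrate_after:
                local.add(line)             # copy the line into the local partition
                del self.far_reads[key]
            return "far L2 hit"
        local.add(line)                     # miss in both partitions: fill from DRAM
        return "DRAM fill"


cache = PartitionedL2(migrate_after=2)
print(cache.read(0, 0x100))  # DRAM fill
print(cache.read(1, 0x100))  # far L2 hit
print(cache.read(1, 0x100))  # far L2 hit (line is copied locally on this read)
print(cache.read(1, 0x100))  # local L2 hit
```

With `migrate_after=1` this collapses to the simple "single read = migrate" case; anything higher stands in for the "cleverer logic".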
 
And why would you duplicate the display, video, PCIe stuff, etc.? It's a non-trivial amount of silicon and certainly not cheap.
 
“MCM logically like GA100 and GH100” is nonsense. Those chips aren’t MCM in any way.
Yeah, indeed. I'm beginning to fathom how needlessly restrictive it would be to create a physical layout for the chip such that it is mirrored and one half communicates with the other via some sort of (pardon the language abuse) "discrete physical" bus/link.
 
And why would you duplicate the display, video, PCIe stuff, etc.? It's a non-trivial amount of silicon and certainly not cheap.
Well, that's why he's saying 'logically', so I think he just means the compute.

Which means it's really nothing like MCM and nothing new. Plenty of GPUs have had mirrored compute clusters.
 