Discussion in 'Architecture and Products' started by DSC, Mar 19, 2013.
One aspect no one seems to have commented on is how the TDP/TBP for the V100 has increased from 300W to 350W with the SXM3.
The HBM2 memory increase only partially explains that jump.
Oh wow, that's a big deal.
So the DGX-2 moved from SXM2 (and a 300W power limit) to SXM3 (and a 350W power limit)? I missed that.
On this DGX-2 product page, I can see SXM3 mentioned, but do you know of a source that explicitly talks about the 350W power limit?
EDIT - I can't for the life of me find substantial information about the upgrade to 350W. How is no one talking about this?
The best I found was this obscure site.
They're saying 350 watts too, no source cited though.
There are a few more public sources around as well; I did a search myself to see if it had been reported.
Worth noting that some of those at NextPlatform reporting this (Paul and also Tim) are not just journalists but also tech analysts. I point this out because analysts generally get invited to more briefings and get information more easily than the usual tech publications; I'm speaking from experience, where more effort (a different kind of effort from putting on a large press presentation) was made to engage with analysts and high-profile core clients than with the press.
Some may see it as semantics, but it is subtly different in terms of engagement for quite a few companies. That said, some journalists are highly regarded, and it's fair to say they can get good engagement generally.
So the Titan V and V100 PCIe are rated for 14 TF at 250W, while the V100 SXM2 is rated for 15.7 TF at 300W. And now we have a V100 SXM3 rated for 350W with how many TF? 18? Or is it just for the extra 16GB of HBM2?
Considering that power usage grows faster than performance, and that more HBM2 probably uses more power, it should be 17TF at most, if even that.
If this isn't somehow lumping in the power consumption of the switch chips, there could be extra power draw from the presumably higher IO utilization of having full link bandwidth to any other GPU, without becoming bottlenecked or blocked as in the prior topology.
The NVSwitch chips are separate (outside the SXM3) and have their own heatsinks; each switch draws somewhere between 50W and 100W.
Basically they all connect through the NVLink baseboard.
I'm leaving open the possibility that there was a misunderstanding of the per-module consumption within the DGX-2 system, in the absence of a cited Nvidia statement. That aside, the SXM2 modules used in the prior product only used 4 links for connectivity versus the 6 for DGX-2, which can contribute to higher power consumption.
The nextplatform article tries to estimate that the 32 GB of HBM in an SXM3 takes the memory power consumption from 50 to 100 W. The 50W for 16GB may be in the ballpark (I've seen estimates for Vega's HBM at 20-30W), depending on whether Nvidia is still using HBM2 that's been volted above the spec's 1.2V. Crediting the doubling capacity to 32GB for another 50W seems dubious. Capacity alone has a much lower contribution versus the actual access costs, which makes me uncertain there haven't been other misunderstandings.
Of note, Nvidia's SXM2 spec has 16 and 32 GB capacities listed under the same 300W TDP.
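To make the concern above concrete, here's a quick sketch of what the NextPlatform estimate implies (these are the thread's and article's estimates, not measured or official figures):

```python
# The article's implied HBM2 power scaling, using this thread's numbers
# (estimates only, not official Nvidia figures).
hbm_16gb_w = 50   # article's ballpark for 16GB of HBM2
hbm_32gb_w = 100  # article's figure for 32GB

# The article attributes the extra 50W purely to the added 16GB of capacity:
w_per_added_gb = (hbm_32gb_w - hbm_16gb_w) / 16  # 3.125 W per GB

print(w_per_added_gb)
```

That linear capacity-to-power scaling is exactly what looks dubious: access activity, not capacity alone, dominates DRAM power, and Nvidia lists the 16GB and 32GB SXM2 parts under the same 300W TDP.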
Those switches also work out to ~75W per GPU (6 switches @ 100W shared across 8 GPUs). A standard ~275W card consumption per GPU, with an additional <75W off-chip as an IO buffer (NVSwitch), would give the ~350W peak figure. They probably lumped the network cost in per GPU. It's also not inconceivable that the switches are powered inline per the 75W PCIe connector spec. Moving the link drivers off the GPU onto the switch chips would allow tighter control if they only have to drive a signal a matter of centimeters rather than meters.
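The per-GPU split above can be sketched as follows (assuming the ~100W-per-switch and ~275W-per-GPU figures, which are this thread's estimates, not Nvidia numbers):

```python
# Back-of-envelope split of the 350W SXM3 figure, using this thread's
# estimates (not official Nvidia numbers).
NUM_SWITCHES = 6     # NVSwitch chips on the NVLink baseboard
SWITCH_TDP_W = 100   # upper-end estimate per switch
NUM_GPUS = 8         # GPUs sharing those switches

switch_share_w = NUM_SWITCHES * SWITCH_TDP_W / NUM_GPUS  # 75.0 W per GPU
gpu_card_w = 275     # speculated per-GPU card consumption
total_w = gpu_card_w + switch_share_w

print(switch_share_w, total_w)  # 75.0 350.0
```

Of course, this only works if the NVSwitch power really is being billed to the modules, which the rest of the thread disputes.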
Maybe it could be a misunderstanding between Nvidia and a few people.
I can safely say the NVSwitch is definitely being treated as separate; as I said, a few others have also been told 350W by Nvidia at GTC.
Yeah, he speculates about what the increase relates to, and his figures seem rather excessive based purely on capacity. Like the estimates you've seen, it is up to ~35W, but that involves HBM2 clocked higher. As I said, the 16GB memory increase is only partially the reason, and such increases are usually covered within the existing TDP, as it is pretty easy to accommodate them with subtle tweaks elsewhere in the envelope; this has been done in the past when memory doubled without a TDP change, though that did not involve anything else changing in the memory spec.
Did Nvidia officially release the HBM2 memory clock rates for SXM3 yet?
Maybe they have increased them to the manufacturer spec.
SXM2 is 300W not 275W though.
He is talking about the SXM3 explicitly; in no way is the NVSwitch part of that, and he discusses the NVSwitch TDP separately. It is also highly dynamic, and one could not average its peak TDP down.
Just to say the SXM2 V100 has 6 bricks/links (NVLink2); it was the original SXM2 Pascal that had only 4.
We don't know whether it really needs a 350W TDP. Maybe it's only 10-20W more with double the HBM2, and Nvidia didn't want to go to 320W but instead gave it some headroom for the next generation, hence 350W. Who knows what Ampere will look like? As HBM3 doesn't seem ready, it might have 6 HBM2 stacks and a higher TDP. If vendors already have SXM3 systems prepared, they could maybe just plug in Ampere next year instead of designing a new 350W cooling solution.
The bandwidth is still 900 GB/s, from the GTC 2018 keynote (14.4 TB/s across 16 GPUs).
That is the NVSwitch rather than the HBM2 memory; 18 ports at 50GB/s gives the 900 GB/s.
It does not need to match the HBM bandwidth, as NVLink allows bricks/ports to be aggregated in any way needed; they do not need to match or balance even on a single GPU (the arrangement can be quite asymmetrical, with individual and dual links used concurrently on the SXM2 V100).
Then where did the 14.4 TB/s in the keynote come from? (Because that's how I calculated 900 GB/s.)
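For what it's worth, both routes the thread takes to 900 GB/s work out to the same number (a quick sanity check, not an official breakdown):

```python
# Two ways this thread arrives at 900 GB/s:

# 1) The GTC 2018 keynote's aggregate figure divided across the GPUs
keynote_aggregate_gbs = 14.4 * 1000          # 14.4 TB/s expressed in GB/s
num_gpus = 16
per_gpu_gbs = keynote_aggregate_gbs / num_gpus  # 900.0 GB/s per GPU

# 2) The NVSwitch port math
ports = 18           # ports per NVSwitch
port_bw_gbs = 50     # GB/s per port
switch_gbs = ports * port_bw_gbs                # 900 GB/s per switch

print(per_gpu_gbs, switch_gbs)  # 900.0 900
```

Which is exactly why the keynote figure is ambiguous: the same number falls out whether you read it as per-GPU HBM2 bandwidth or as per-switch NVLink bandwidth.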
LOL yeah good point and sorry (was referencing other data while reading your post), I had only seen it referenced before in terms of the NVSwitch.
He cites the 14.4TB/s in the context of aggregate bandwidth across the switches in one second (at 56 minutes 30 seconds), which is not possible; it blurs the boundary between the GPUs/memory fabric and the NVSwitches/NVLink.
But yeah, that would suggest it is still 900GB/s HBM2 on the SXM3, unless for the presentation they just went with the V100 spec where memory capacity has also increased (DGX-1V) *shrug*; some of the spec still seems vague for now, going by what a couple of the Elite partners have.
Worth noting that the DGX-2 and SXM3 do not necessarily reflect the V100 line generally, apart from all having 32GB.
Just saying, in case some think this may also reflect on the TDP/specs for existing models: DGX-1V (SXM2), V100 PCIe, etc.