Nvidia Pascal Announcement

You don't think that power (and ground) is supplied through the mounting screw points? Apple does that in the Mac Pro. The mezzanine connector doesn't seem heavy-duty enough to supply 300W, especially if fewer than half the pins are available for power delivery...


You could simply drop clocks and volts a little and get a drop in power larger than the additional draw from the extra enabled functional units (along the lines of the Fury Nano).

The Fury Nano is a special bin of the Fiji GPU made from low-performance/low-power silicon. That silicon can't hit the clock speeds of a normal Fury/Fury X, but it can hit much lower ones at a lot less power. Because the card is so small, AMD saw an opportunity to bin what might otherwise have been a useless, limited slice of their output as the Nano, a tiny high-efficiency card, and that seems to have worked. Part of why they charge so much is that this sort of bin is a relatively low percentage of produced and still-viable chips, so even if Nvidia wanted to do the same with GP100 the number available would be tiny.

And you can't normally expect that kind of efficiency gain just from dropping frequency on most silicon, and that goes doubly for FinFET. The voltage/frequency curve cuts both ways: the required voltage rises steeply near the top of the curve, so the further you drop frequency below that region, the smaller the additional power savings become.
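To put illustrative numbers on that, here is a minimal sketch assuming the usual dynamic-power relation P ≈ C·V²·f and a made-up linear voltage/frequency curve; none of these values are measurements of any real chip:

```python
# A minimal sketch of the dynamic-power argument, assuming P ~ C * V^2 * f and a
# made-up linear voltage/frequency curve. Values are illustrative, not GP100 data.

def voltage_for(freq_mhz, base_mhz=800.0, base_v=0.80, slope=0.0005):
    """Toy V/f curve: required voltage rises linearly with clock above base_mhz."""
    return base_v + slope * max(0.0, freq_mhz - base_mhz)

def power(freq_mhz):
    """Relative dynamic power at the minimum voltage for that clock."""
    v = voltage_for(freq_mhz)
    return v * v * freq_mhz

p_top = power(1500)
for f in range(1500, 899, -100):
    print(f"{f} MHz -> {power(f) / p_top:.2f}x power, {f / 1500:.2f}x clock")

# Each successive 100 MHz step saves less absolute power than the one before it,
# while costing the same 100 MHz of clock: the diminishing returns described above.
```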

And of course the GP100 Tesla parts are going to be sold out; with such a huge chip on a new node you'd expect yields to be low (which is why they branded it as a Tesla for the first run: low volume, high margins). The competition from both Nvidia itself and AMD is going to be an interesting thing to watch over the next fiscal year. Their new GPUs make their old ones outdated and far less appealing; data centers spend a lot on cooling and power, after all, so discounts on the GPUs themselves are of limited value. At the same time, if they can't get the volume of the new chips up, Nvidia could end up with a lot less profit overall, at least for a quarter or three.
 
Then you have no leeway to work around defects, and the required drop in frequency across the entire chip might result in a larger drop in performance than the one you get by losing 4 SMs.

As for competition - isn't this targeted at roughly the same place as Knights Landing?
 
If they have no competition then wouldn't they be better served by disabling even more units and having lower clocks at 250W? That would give them better yields now and would leave them with a bigger, more enticing upgrade later with 32GB, higher clocks and a full die.
This is not a consumer market where customers upgrade from one SKU to the other on a whim. The customers who buy these kinds of racks for the datacenter first put them through a bunch of tests to make sure that they work well in their specific environment (thermals, power delivery, ...).

If you can avoid it, you're not going to buy a 250W unit today and a 300W unit a year later. (The same is true on the Nvidia side as well, if anything more so.)

So it makes sense to aim high enough to avoid that and have a product with a long lifetime. This is no different from the existing Tesla-class GPUs, which have a long shelf life as well.

There must be some kind of competition or they wouldn't push it at all surely?
If they'd disabled all units, they'd have plenty of competition. Of course there's a sweet spot somewhere.
 
The fact that GP100 has 4 SMs disabled is in no way an indication of process defect rate. If the new mezzanine form factor has a power and thermal limit of 300 Watts, then a fully enabled chip at full frequency would exceed those limits. Having 4 spare SMs gives Nvidia risk-management room both for defects and for power/performance tuning. Some SMs might be fully functional but burn a little more power, and/or reliably clock higher than others. 56 SMs at 1328 MHz is likely better than 60 SMs at 1200 MHz.
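For what it's worth, a quick back-of-the-envelope comparison of those two hypothetical configurations, counting nothing but SMs times clock:

```python
# Back-of-the-envelope throughput check for the two configurations above,
# counting only SMs * clock and ignoring voltage, power and memory effects.
cut_down = 56 * 1328    # 56 SMs at 1328 MHz
full_die = 60 * 1200    # 60 SMs at 1200 MHz
print(f"56 SMs @ 1328 MHz: {cut_down} SM*MHz")   # 74368
print(f"60 SMs @ 1200 MHz: {full_die} SM*MHz")   # 72000
print(f"Ratio: {cut_down / full_die:.3f}")       # ~1.03, i.e. ~3% more raw throughput
```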

The GTX 980 Ti and the Titan X both have a base clock of 1002 MHz and a boost clock of 1075 MHz.
The former has 2 SMs disabled and the latter none.
The Tesla M40 has a base clock of 948 MHz, a boost clock of 1114 MHz, and no SMs disabled.

By your reasoning, the Tesla M40 would have been better off with fewer SMs enabled and a higher clock, to optimize power.
Clearly that hasn't happened.
 
By your reasoning, the Tesla M40 would have been better off with fewer SMs enabled and a higher clock, to optimize power.
Clearly that hasn't happened.
Not clearly. The M40 launched half a year after the 980 Ti, when they had a good picture of die quality. The Tesla GP100 is the first revealed product based on the chip.
 
The GTX 980 Ti and the Titan X both have a base clock of 1002 MHz and a boost clock of 1075 MHz.
The former has 2 SMs disabled and the latter none.
The Tesla M40 has a base clock of 948 MHz, a boost clock of 1114 MHz, and no SMs disabled.

By your reasoning, the Tesla M40 would have been better off with fewer SMs enabled and a higher clock, to optimize power.
Clearly that hasn't happened.
spworley is simply pointing out that disabled SMs don't have to be an indication of process defect rate, because other explanations are possible as well. The real reason is probably a mix of multiple variables, and only Nvidia knows which one carried the most weight in the final decision.

As for your example above, it could simply have been the case that a full M40 had a sufficiently low power consumption to fit the use case requirements of its customers and that no further power optimization was necessary.

None of these things are black and white.
 
spworley is simply pointing out that disabled SMs don't have to be an indication of process defect rate, because other explanations are possible as well. The real reason is probably a mix of multiple variables, and only Nvidia knows which one carried the most weight in the final decision.

For a new process and a big die it's perfectly normal that yield is lower at first; it will get better over a few years.
(Yield here meaning anything from non-functional dies to power/frequency-related issues.)
Disabling a few SMs to compensate for that is no big deal, especially if you have many of them.
The first big Kepler was just as ambitious as the current Pascal: a very big die on a brand-new process.
It took nearly two years for yields to improve to the point where we got the GTX 780 Ti with all units enabled and even higher clocks.
 
The first big Kepler was just as ambitious as the current Pascal: a very big die on a brand-new process. It took nearly two years for yields to improve to the point where we got the GTX 780 Ti with all units enabled and even higher clocks.
Conveniently, that took place immediately after AMD launched their Hawaii GPU, which threatened Nvidia's performance leadership in the desktop space. Coincidence, or just carefully managed resources?
 
Conveniently, that took place immediately after AMD launched their Hawaii GPU, which threatened Nvidia's performance leadership in the desktop space. Coincidence, or just carefully managed resources?

Agreed, they must have expected something was coming and were prepared.
Reading back an article: http://techreport.com/review/25611/nvidia-geforce-gtx-780-ti-graphics-card-reviewed

"When I asked Nvidia where it found the dark magic to achieve this feat, the answer was more complex than expected. For one thing, this card is based on a new revision of the GK110, the GK110B (or it is GK110b? GK110-B?). The primary benefit of the GK110B is higher yields, or more good chips per wafer. Nvidia quietly rolled out the GK110B back in August aboard GTX 780 and Titan cards, so it's not unique to the 780 Ti. Separate from any changes made to improve yields, the newer silicon also benefits from refinements to TSMC's 28-nm process made during the course of this year."
 
For a new process and a big die it's perfectly normal that yield is lower at first; it will get better over a few years.
Yes, none of this is new. I'm sure yield was one of the many factors that Nvidia took into account when defining the product.
 
Yes, none of this is new. I'm sure yield was one of the many factors that Nvidia took into account when defining the product.
If you go and make the biggest chip the brand-new process can support, you're not thinking about yields.
 
If you go and make the biggest chip the brand-new process can support, you're not thinking about yields.
Yeah, they had a business strategy: to replace the ageing Kepler Tesla cards and reinvigorate the HPC/exascale research side of their market, which has really been on hold for a while now since Maxwell was primarily a consumer product.
Cheers
 
Product design and positioning form a many-variable space that companies try to optimize over. While the relative priority of each concern can shift depending on what is decided, it seems trivially true that not thinking about yields at all leaves open the possibility that zero yield is somehow acceptable.

IBM's POWER line targets (among other things) very expensive systems, where the cost of the high-speed silicon and the very complex package and system design is made up in massive service revenue. Even then, there were at times indications that, despite the comparatively small volumes and high tolerance for cost, there was a non-zero threshold for yields, given the spotty availability of the products it launched.
That IBM has since spun off a significant portion of what went into manufacturing these processors shows that reality still intrudes even with several extra zeroes on the product price.
 
Yeah, they had a business strategy: to replace the ageing Kepler Tesla cards and reinvigorate the HPC/exascale research side of their market, which has really been on hold for a while now since Maxwell was primarily a consumer product.
Cheers

Just speaking from the perspective of someone on the HPC side of things, this is very true. This is just my personal opinion and experience, but Pascal is the first GPU product in a while to make me excited on the HPC side. Other processors have had some great theoretical specs but the software stack hasn't been there to support them. NVIDIA's Maxwell offerings just weren't enough of an upgrade over Fermi for most of the machines I work with to even consider them on the HPC side. Pascal, on the other hand, has a lot of really exciting things. The increased double precision is a welcome boost for some of the problems I work on (CEM, multigrid and CGS solvers), and the increased memory capacity is really exciting (HPC nodes can have anywhere from 128GB to 768GB, while accelerators are generally stuck in the 12-16GB range, making memory transfers frequent and painful at times), plus it supports ECC so you don't have the overhead that existed on the K40.

Probably most exciting for me, though, is preemption. To be honest, there is always someone who wants to run a scaled-down version of the software for a few hours on their laptop to test their latest model. That means we always have to deal with the Windows TDR even when it is a non-issue on the big machines. If Windows and NVIDIA can get together and use preemption there instead of the TDR, that would remove a whole level of complexity from the software I write.
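To illustrate the TDR point, here is a minimal Numba CUDA sketch (not the poster's code; all names and numbers are made up, and the toy Newton iteration just stands in for a long-running solver) of the kind of chunking the watchdog forces today:

```python
# Not the poster's code: a hypothetical sketch of chunking work under the Windows TDR.
# A long-running per-element iteration is split into many short kernel launches so
# no single launch hits the watchdog timeout.
import numpy as np
from numba import cuda

@cuda.jit
def newton_sqrt_chunk(targets, estimates, iters):
    """Run a bounded number of Newton iterations per element, then return."""
    i = cuda.grid(1)
    if i < targets.size:
        est = estimates[i]
        for _ in range(iters):
            est = 0.5 * (est + targets[i] / est)
        estimates[i] = est   # persist state between launches

n = 1 << 20
targets = cuda.to_device((np.random.rand(n) + 1.0).astype(np.float32))
estimates = cuda.to_device(np.ones(n, dtype=np.float32))

threads = 256
blocks = (n + threads - 1) // threads
total_iters, iters_per_launch = 2000, 50   # sized so each launch stays well under the TDR limit

for _ in range(total_iters // iters_per_launch):
    newton_sqrt_chunk[blocks, threads](targets, estimates, iters_per_launch)
    cuda.synchronize()   # each launch is short, so the display watchdog never fires
```

With proper preemption the work could be submitted as one long-running kernel, and the host-side chunking loop and the tuning of iters_per_launch would go away.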

The DGX-1 is a little less exciting from my perspective because of the low memory per node - I'm hoping they release future versions with more. Overall though I would say Pascal is pretty exciting to me. Of course I am just a single developer and don't make hardware purchase decisions for any of the large clusters, so don't take this as an argument that everyone shares my opinion. But I can see Pascal generating quite a bit of excitement in the HPC market if it lives up to the promises made during the keynote address.
 
Just speaking from the perspective of someone on the HPC side of things, this is very true. This is just my personal opinion and experience, but Pascal is the first GPU product in a while to make me excited on the HPC side. Other processors have had some great theoretical specs but the software stack hasn't been there to support them. NVIDIA's Maxwell offerings just weren't enough of an upgrade over Fermi for most of the machines I work with to even consider them on the HPC side.
I thought GM200 was quite a massive upgrade in most every way over GF100 for FP32 and FP16 applications. Hell it should even be a lot better than GK110 as well (except for FP64 obviously). Is that not the case?
 
~300mm² GPU with 8Gbps GDDR5: https://www.chiphell.com/thread-1563086-1-1.html
It's also said that GPU-Z detects some 1152/864 SPs, though that's due to missing Pascal detection. So it could be 18/36 SMM/SM.

Maybe the GTX 1070: 2304 SPs @ >1.4 GHz + 8 GiB of 8 Gbps GDDR5, while the GTX 1080 goes for GDDR5X and the full core with 2560 SPs for the benchmarks, with low availability because of memory and 16FF yields.

And GP106 seems also to be ready:

http://www.hardware.fr/news/14589/gtc-200-mm-petit-gpu-pascal.html ~205mm² size estimation
http://www.hardwareluxx.de/index.ph...it-pascal-gpu-und-samsung-gddr5-speicher.html A1 chips made in week 13 of 2016.
 
I thought GM200 was quite a massive upgrade in most every way over GF100 for FP32 and FP16 applications. Hell it should even be a lot better than GK110 as well (except for FP64 obviously). Is that not the case?

I checked to make sure and you were correct. The machines are currently using K40s, which are GK110s - not the original M2090s, which were GF110s. The upgrade to M40s never happened because you were looking at a fairly small increase in SP (~5000 GFLOPS to ~6500 GFLOPS) and a rather significant loss in DP (~1500 GFLOPS to ~200 GFLOPS). While we try to do everything we can in SP, there are operations that require DP for accuracy over longer simulations/scenarios. Pascal, if the rumored specs are true, goes to double the SP and triple the DP of our current accelerators.
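For reference, a small sketch of where those ballpark figures come from, using the public core counts, boost clocks and FP64 ratios; these are theoretical peaks only, and the P100 line assumes the announced specs hold:

```python
# Approximate theoretical peaks from public core counts, boost clocks and FP64
# ratios: peak SP = cores * 2 (FMA) * clock. P100 numbers assume announced specs.
cards = {
    #  name               (CUDA cores, boost GHz, FP64:FP32 ratio)
    "Tesla K40  (GK110)": (2880, 0.875, 1 / 3),
    "Tesla M40  (GM200)": (3072, 1.114, 1 / 32),
    "Tesla P100 (GP100)": (3584, 1.480, 1 / 2),
}
for name, (cores, ghz, fp64_ratio) in cards.items():
    sp = cores * 2 * ghz            # single-precision peak, GFLOPS
    dp = sp * fp64_ratio            # double-precision peak, GFLOPS
    print(f"{name}: ~{sp:.0f} SP GFLOPS, ~{dp:.0f} DP GFLOPS")
# K40 -> ~5040 / ~1680, M40 -> ~6844 / ~214, P100 -> ~10609 / ~5304,
# which lines up with the ballpark figures and ratios quoted above.
```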

In other words, the upgrade to M40s was actually a step back for us. The upgrade to P100s would be a significant step forward.
 
In other words, the upgrade to M40s was actually a step back for us. The upgrade to P100s would be a significant step forward.
A step back because of the drop in DP rate? Just trying to be clear :)

Also I know the increase in SP rate for GM200 vs GK110 isn't that much on paper, but in real world applications GM200 should have a greater advantage than what the theoreticals suggest and be more power efficient to boot.
 