Nvidia Pascal Announcement

Discussion in 'Architecture and Products' started by huebie, Apr 5, 2016.

  1. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,463
    Likes Received:
    187
    Location:
    Chania
    I don't recall one big chip from the recent past where NV managed to have all clusters enabled from the first production run. In fact, given that P100 has 4 clusters out of 60 disabled (roughly 7%), it's even an achievement, considering that with Kepler/GK110 it was up to 20% of clusters disabled at its start under 28HP.

    Who said anything about going large? I asked why they would keep the same strategy as with the HPC-oriented P100 and not go for a higher transistor density with the lower frequencies you're predicting. For P100 they increased transistor density by 86% compared to GM200 and invested the rest of what the process could offer for such a chip in frequency.

    And no, I don't see why you couldn't also get full GP104 parts, considering they won't arrive before June and wafer and binning yields are typically dramatically better for smaller chips. Most likely full parts at an obnoxiously high MSRP and salvage parts at a more reasonable MSRP, hopefully with full high-speed memory this time *cough*
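
    Just to put numbers on those fractions (a quick sketch, assuming the commonly cited unit counts: 56 of 60 SMs enabled on P100, 13 of 15 SMX on the launch Tesla K20, 14 of 15 on K20X):

    ```python
    # Disabled-cluster fractions for the chips under discussion.
    configs = {
        "P100 (GP100, 56/60 SMs)":       (60, 56),
        "Tesla K20 (GK110, 13/15 SMX)":  (15, 13),
        "Tesla K20X (GK110, 14/15 SMX)": (15, 14),
    }
    for name, (total, enabled) in configs.items():
        pct = 100 * (total - enabled) / total
        print(f"{name}: {pct:.1f}% of clusters disabled")
    # P100: 6.7%, K20: 13.3%, K20X: 6.7%
    ```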
     
  2. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    821
    Likes Received:
    460
    GP100 disables 4 SMs; GK110 disables 2 SMs. I don't see why the percentage would matter. Having many smaller SMs (like GP100 has) matters, as you can catch the same number of defects, or even more, without disabling large parts of your GPU.

    I don't quite get you there. How could they go for an even higher transistor density? Obviously it will be 16nm, not 28nm.
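
    To put rough numbers on the redundancy argument above, here is a minimal sketch assuming independent per-unit defects and a purely hypothetical defect budget (the same expected 0.3 defects per die in both cases, spread over many small units vs. few large ones):

    ```python
    from math import comb

    def salvage_yield(n_units: int, n_spares: int, p_defect: float) -> float:
        """P(at most n_spares of n_units are defective), binomial model."""
        return sum(
            comb(n_units, k) * p_defect**k * (1 - p_defect)**(n_units - k)
            for k in range(n_spares + 1)
        )

    # Hypothetical: a large SMX is ~4x the area of a small SM, so it is ~4x
    # as likely to catch a defect; expected defects per die are equal (0.3).
    print(f"60 small SMs, 4 spares: {salvage_yield(60, 4, 0.005):.2%}")
    print(f"15 large SMX, 2 spares: {salvage_yield(15, 2, 0.020):.2%}")
    # Finer granularity turns the same defect budget into a higher bin yield.
    ```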
     
  3. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,463
    Likes Received:
    187
    Location:
    Chania
    Of course cluster size matters, and of course percentages matter too. On GK110 they initially had to disable a proportionally larger die area than they had to for P100. It might not tell us anything about wafer yields, but it's at least a good indication of possibly healthier binning yields for P100.

    NV didn't take advantage of the full density increase 16FF+ allows compared to 28nm. As I said, it's an 86% density increase combined with a >30% frequency increase compared to desktop GM200 (if I compared the initial P100 core frequency with the initial K40 core frequencies, it would get silly...). Not to forget that the TDP is at 300W this time.

    What I was asking is why they should go for frequencies as high as you're suggesting, rather than more modest ones with a 100% density increase compared to their 28HP chips. It's not some sort of trick question; I just don't see an HPC-oriented chip with a 300W TDP being a good indicator of what they might have done with smaller chips. Should I also cut that TDP in half, by 150W or even more? If so, I'd guess Polaris could have a joyride.
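
    For reference, the density and clock deltas above can be reconstructed from the public specs (GM200: ~8.0-8.1B transistors in 601 mm²; GP100: 15.3B in 610 mm²; GM200 Titan X boost ~1075 MHz vs. P100 boost 1480 MHz):

    ```python
    # Transistor density: GP100 (15.3B / 610 mm^2) vs. GM200 (601 mm^2).
    for gm200_xtors in (8.0e9, 8.1e9):
        ratio = (15.3e9 / 610) / (gm200_xtors / 601)
        print(f"GM200 @ {gm200_xtors / 1e9:.1f}B xtors: +{ratio - 1:.0%} density")
    # ~86-88%, depending on which GM200 transistor count you take.

    print(f"Boost clock: +{1480 / 1075 - 1:.0%}")  # ~38%, i.e. the >30% above
    ```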
     
    #203 Ailuros, Apr 7, 2016
    Last edited: Apr 7, 2016
  4. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY
    Errors in silicon tend to show up more as chip size increases; it's not an equal distribution of errors across a piece of silicon.
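
    That's the textbook Poisson yield picture: with a uniform defect density D0, the chance of a die coming out fully defect-free is exp(-A·D0), so it drops fast with die area. A sketch with an illustrative (not TSMC's) defect density:

    ```python
    from math import exp

    def poisson_yield(area_mm2: float, d0_per_mm2: float) -> float:
        """Fraction of fully defect-free dies, simple Poisson yield model."""
        return exp(-area_mm2 * d0_per_mm2)

    D0 = 0.002  # defects per mm^2 -- purely illustrative
    for area in (100, 300, 610):  # 610 mm^2 is roughly GP100-sized
        print(f"{area:3d} mm^2 die: {poisson_yield(area, D0):.1%} defect-free")
    ```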
     
  5. MDolenc

    Regular

    Joined:
    May 26, 2002
    Messages:
    696
    Likes Received:
    446
    Location:
    Slovenia
    I think you're mixing up chip defects with errors being more likely at the edges of a wafer. It doesn't mean chips are more likely to have defects at their own edges. It means that chips further out from the center of the wafer are more likely to have (more) defects.
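
    A toy model of that distinction (all numbers made up): give the wafer a defect density that rises toward the edge, and the expected defect count per die depends on where the die sits on the wafer, not on where within the die you look:

    ```python
    from math import exp

    def d0_at(r_mm: float, wafer_r_mm: float = 150.0) -> float:
        """Hypothetical defect density (defects/mm^2) rising toward the edge."""
        return 0.001 * (1 + 3 * (r_mm / wafer_r_mm) ** 2)

    DIE_AREA = 610.0  # mm^2, a GP100-sized die
    for r in (0, 75, 140):  # die position: center, mid-radius, near the edge
        lam = DIE_AREA * d0_at(r)
        print(f"{r:3d} mm from center: ~{lam:.2f} expected defects, "
              f"{exp(-lam):.1%} defect-free")
    ```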
     
    Razor1 likes this.
  6. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY
  7. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,380
    A different transistor density for the same process would require a different fill factor of the standard cells per area (no benefit), a different standard cell library (unlikely), a different memory cell library (unlikely), or a different ratio between standard cells and memory area.

    But since compute-oriented chips have traditionally used more memory than graphics chips (larger caches, larger register files), that would actually decrease transistor density for the graphics chips.

    So I don't expect any increase in density at all.
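
    A toy illustration of that last point (densities entirely hypothetical): model die density as an area-weighted mix of SRAM and standard-cell logic, with SRAM packing transistors much denser. A graphics part with a smaller SRAM share then averages out lower:

    ```python
    # Hypothetical per-mm^2 transistor densities on the same process node.
    SRAM_DENSITY  = 45e6   # SRAM bit cells pack far more transistors per mm^2...
    LOGIC_DENSITY = 18e6   # ...than random standard-cell logic does.

    def die_density(sram_area_share: float) -> float:
        """Area-weighted average transistor density for a given SRAM share."""
        return (sram_area_share * SRAM_DENSITY
                + (1 - sram_area_share) * LOGIC_DENSITY)

    print(f"compute chip, 40% SRAM area:  {die_density(0.40) / 1e6:.1f}M xtors/mm^2")
    print(f"graphics chip, 25% SRAM area: {die_density(0.25) / 1e6:.1f}M xtors/mm^2")
    ```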
     
  8. xpea

    Regular Newcomer

    Joined:
    Jun 4, 2013
    Messages:
    429
    Likes Received:
    479
    Got some interesting news today from an old friend: Google is going to be "by far" the biggest Pascal customer, followed by... Baidu! GP100 allocation is full for the next 6 months, DGX-1 is a big hit, and they cannot build enough to meet demand...
     
    Grall, dnavas, Malo and 1 other person like this.
  9. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Just to correct my previous post:
    Yes, it still looks like Volta is on track for 2017 and the exascale projects with IBM.
    I just looked at a very recent slide (IBM-NVIDIA Collaboration Landscape) showing 2017 with CUDA 9, OpenMP 4.x, enhanced NVLink, and GV100, which I assume is Volta.

    Cheers
     
  10. CarstenS

    Legend Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,232
    Likes Received:
    2,837
    Location:
    Germany
    Would NVLink be comparable to PCIe or memory PHYs in scaling to new process nodes (i.e. not being able to achieve as high a density as logic, etc.)?
     
  11. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY

    It actually makes sense that Google is really interested in this...
     
  12. Adored

    Newcomer

    Joined:
    Mar 1, 2016
    Messages:
    67
    Likes Received:
    4
    Yes, it's right up Google's street really, in more ways than one.
     
  13. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,380
    It's probably primarily limited by PCB characteristics, not process speed.
     
    Razor1 likes this.
  14. CarstenS

    Legend Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,232
    Likes Received:
    2,837
    Location:
    Germany
    Sorry, I did not mean speed but rather area. Analogue circuitry - AFAIR - does not pack as densely and thus does not scale as well with smaller process geometries.

    IOW: would the 4 NVLinks in GP100 take up a significant portion of the die?
     
  15. xpea

    Regular Newcomer

    Joined:
    Jun 4, 2013
    Messages:
    429
    Likes Received:
    479
    I have no idea about the die area, but on the P100 mezzanine connector the 4 NVLinks take 400 pins out of 800 total (the other 400 are for PCIe and power). It's huge...
     
    Razor1 and CarstenS like this.
  16. spworley

    Newcomer

    Joined:
    Apr 19, 2013
    Messages:
    146
    Likes Received:
    190
    The fact that GP100 has 4 SMs disabled is in no way an indication of process defect rate. If the new mezzanine form factor has a power and thermal limit of 300W, then a fully enabled chip at full frequency would exceed those limits. Having 4 spare SMs gives Nvidia risk management both for defects and for power/performance tuning. Some SMs might be fully functional but burn a little more wattage, and/or reliably clock higher than others. 56 SMs at 1328 MHz is likely better than 60 SMs at 1200 MHz, for example. You can then cherry-pick the SMs and set frequencies that give the best compute performance within your power/TDP range. Requiring the use of all 60 SMs loses that flexible tuning opportunity.
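
    The arithmetic behind that example, since raw throughput scales with active SMs × clock:

    ```python
    cfg_a = 56 * 1328  # active SMs x MHz
    cfg_b = 60 * 1200
    print(f"56 SMs @ 1328 MHz: {cfg_a} SM*MHz")
    print(f"60 SMs @ 1200 MHz: {cfg_b} SM*MHz")
    print(f"56-SM config advantage: {cfg_a / cfg_b - 1:+.1%}")  # about +3.3%
    ```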

    That said, we don't know the new form factor's power or thermal limits. 300W is the traditional max for dual-width 6+8-pin PCIe cards, but Nvidia could have chosen a higher bound for their custom form factor. It's likely similar, though, since a higher wattage would be fine for servers but would prevent the same chip from being used in PCIe card designs.
     
    ImSpartacus, CarstenS and pharma like this.
  17. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,380
    Excellent points, and I agree completely.

    Furthermore, 4 out of 60 is just a ~7% reduction in compute units. Even if they don't do it to increase clock speeds or to select the optimal power configuration, this reduction doesn't change the competitive position of this product in any way: there is nothing to challenge it. Selling a version now with all units enabled wouldn't make any sense.


    To me, the fact that it's only 4 units is a very strong indication that 16nm is doing just fine.
     
    Razor1 likes this.
  18. Adored

    Newcomer

    Joined:
    Mar 1, 2016
    Messages:
    67
    Likes Received:
    4
    If they have no competition, then wouldn't they be better served by disabling even more units and running lower clocks at 250W? That would give them better yields now and would leave them a bigger, more enticing upgrade for later, with 32GB, higher clocks and a full die. There must be some kind of competition, or surely they wouldn't push it at all?
     
  19. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,175
    Location:
    La-la land
    You don't think that power (and ground) is supplied through the mounting screw points? Apple does that in the Mac Pro. The mezzanine connector doesn't seem heavy-duty enough to supply 300W, especially if fewer than half the pins are available for power delivery...

    You could simply drop clocks and volts a little and get a correspondingly larger drop in power than the additional draw from the extra enabled functional units (along the lines of the Fury Nano).
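
    A rough sketch of that trade-off using the usual dynamic-power approximation P ∝ units × f × V², with a hypothetical voltage/frequency point for the wider configuration:

    ```python
    def rel_power(units: int, f_mhz: float, v: float,
                  base=(56, 1328.0, 1.00)) -> float:
        """Dynamic power relative to a baseline, using P ~ units * f * V^2."""
        u0, f0, v0 = base
        return (units * f_mhz * v**2) / (u0 * f0 * v0**2)

    # Hypothetical: all 60 SMs at ~7% lower clock and 5% lower voltage.
    print(f"power:      {rel_power(60, 1235, 0.95):.2f}x")   # ~0.90x
    print(f"throughput: {60 * 1235 / (56 * 1328):.2f}x")     # ~1.00x
    # Wider and slower delivers the same throughput at ~10% less power.
    ```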
     
  20. Bob

    Bob
    Regular

    Joined:
    Apr 22, 2004
    Messages:
    424
    Likes Received:
    47
    I agree; that would not be a very competitive product.
     