Nvidia Pascal Announcement

Discussion in 'Architecture and Products' started by huebie, Apr 5, 2016.

  1. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,463
    Likes Received:
    187
    Location:
    Chania
    I don't recall one big chip from the recent past where NV managed to have all clusters enabled from the first production run. In fact, given that P100 has 4 clusters out of 60 disabled (roughly 7%), it's even an achievement, considering that with Kepler/GK110 it was up to 20% of clusters disabled at its start under 28HP.

    Who said anything about going large? I asked why they would keep the same strategy as with the HPC-oriented P100 and not go for a higher transistor density with the lower frequencies you're predicting. For P100 they increased transistor density by 86% compared to GM200 and invested the rest of what the process could offer for such a chip in frequency.

    And no, I don't see why you couldn't also get full GP104 parts, considering they won't arrive before June and wafer and binning yields are typically dramatically better for smaller chips. Most likely full parts at an obnoxiously high MSRP and salvage parts at a more reasonable MSRP, hopefully with full high-speed memory this time *cough*
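
    Just to put numbers on those fractions (a quick sketch, assuming the commonly cited unit counts: 56 of 60 SMs enabled on P100, 13 of 15 SMX on the launch Tesla K20, 14 of 15 on K20X):

    ```python
    # Disabled-cluster fractions for the chips under discussion.
    configs = {
        "P100 (GP100, 56/60 SMs)":       (60, 56),
        "Tesla K20 (GK110, 13/15 SMX)":  (15, 13),
        "Tesla K20X (GK110, 14/15 SMX)": (15, 14),
    }
    for name, (total, enabled) in configs.items():
        pct = 100 * (total - enabled) / total
        print(f"{name}: {pct:.1f}% of clusters disabled")
    # P100: 6.7%, K20: 13.3%, K20X: 6.7%
    ```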
     
  2. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    821
    Likes Received:
    460
    GP100 disables 4 SMs; GK110 disables 2 SMs. I don't see why the percentage would matter. Having many smaller SMs (like GP100 has) matters, as you can catch the same number of defects, or even more, without disabling large parts of your GPU.

    I don't quite get you there. How could they go for an even higher transistor density? Obviously it will be 16nm, not 28nm.
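
    To put rough numbers on the redundancy argument above, here is a minimal sketch assuming independent per-unit defects and a purely hypothetical defect budget (the same expected 0.3 defects per die in both cases, spread over many small units vs. few large ones):

    ```python
    from math import comb

    def salvage_yield(n_units: int, n_spares: int, p_defect: float) -> float:
        """P(at most n_spares of n_units are defective), binomial model."""
        return sum(
            comb(n_units, k) * p_defect**k * (1 - p_defect)**(n_units - k)
            for k in range(n_spares + 1)
        )

    # Hypothetical: a large SMX is ~4x the area of a small SM, so it is ~4x
    # as likely to catch a defect; expected defects per die are equal (0.3).
    print(f"60 small SMs, 4 spares: {salvage_yield(60, 4, 0.005):.2%}")
    print(f"15 large SMX, 2 spares: {salvage_yield(15, 2, 0.020):.2%}")
    # Finer granularity turns the same defect budget into a higher bin yield.
    ```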
     
  3. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,463
    Likes Received:
    187
    Location:
    Chania
    Of course cluster size matters, and of course percentages matter too. On GK110 they initially had to disable a proportionally larger die area than they had to for P100. It might not tell us anything about wafer yields, but it's at least a good indication of possibly healthier binning yields for P100.

    NV didn't take advantage of the full density increase 16FF+ allows compared to 28nm. As I said, it's an 86% density increase combined with a >30% frequency increase compared to desktop GM200 (if I compared the initial P100 core frequency with the initial K40 core frequencies, it would get silly...). Not to forget that the TDP is at 300W this time.

    What I was asking is why they should go for frequencies as high as you're suggesting, rather than more modest ones with a 100% density increase compared to their 28HP chips. It's not some sort of trick question; I just don't see an HPC-oriented chip with a 300W TDP being a good indicator of what they might have done with smaller chips. Should I also cut that TDP in half, by 150W or even more? If so, I'd guess Polaris could have a joyride.
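
    For reference, the density and clock deltas above can be reconstructed from the public specs (GM200: ~8.0-8.1B transistors in 601 mm²; GP100: 15.3B in 610 mm²; GM200 Titan X boost ~1075 MHz vs. P100 boost 1480 MHz):

    ```python
    # Transistor density: GP100 (15.3B / 610 mm^2) vs. GM200 (601 mm^2).
    for gm200_xtors in (8.0e9, 8.1e9):
        ratio = (15.3e9 / 610) / (gm200_xtors / 601)
        print(f"GM200 @ {gm200_xtors / 1e9:.1f}B xtors: +{ratio - 1:.0%} density")
    # ~86-88%, depending on which GM200 transistor count you take.

    print(f"Boost clock: +{1480 / 1075 - 1:.0%}")  # ~38%, i.e. the >30% above
    ```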
     
    #203 Ailuros, Apr 7, 2016
    Last edited: Apr 7, 2016
  4. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY
    Errors in silicon tend to show up more as chip size increases; it's not an equal distribution of errors across a piece of silicon.
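
    That's the textbook Poisson yield picture: with a uniform defect density D0, the chance of a die coming out fully defect-free is exp(-A·D0), so it drops fast with die area. A sketch with an illustrative (not TSMC's) defect density:

    ```python
    from math import exp

    def poisson_yield(area_mm2: float, d0_per_mm2: float) -> float:
        """Fraction of fully defect-free dies, simple Poisson yield model."""
        return exp(-area_mm2 * d0_per_mm2)

    D0 = 0.002  # defects per mm^2 -- purely illustrative
    for area in (100, 300, 610):  # 610 mm^2 is roughly GP100-sized
        print(f"{area:3d} mm^2 die: {poisson_yield(area, D0):.1%} defect-free")
    ```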
     
  5. MDolenc

    Regular

    Joined:
    May 26, 2002
    Messages:
    696
    Likes Received:
    446
    Location:
    Slovenia
    I think you're mixing up chip defects with errors being more likely at the edges of a wafer. It doesn't mean chips are more likely to have defects at their own edges. It means that chips further out from the center of the wafer are more likely to have (more) defects.
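
    A toy model of that distinction (all numbers made up): give the wafer a defect density that rises toward the edge, and the expected defect count per die depends on where the die sits on the wafer, not on where within the die you look:

    ```python
    from math import exp

    def d0_at(r_mm: float, wafer_r_mm: float = 150.0) -> float:
        """Hypothetical defect density (defects/mm^2) rising toward the edge."""
        return 0.001 * (1 + 3 * (r_mm / wafer_r_mm) ** 2)

    DIE_AREA = 610.0  # mm^2, a GP100-sized die
    for r in (0, 75, 140):  # die position: center, mid-radius, near the edge
        lam = DIE_AREA * d0_at(r)
        print(f"{r:3d} mm from center: ~{lam:.2f} expected defects, "
              f"{exp(-lam):.1%} defect-free")
    ```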
     
    Razor1 likes this.
  6. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY
  7. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,380
    A different transistor density for the same process would require a different fill factor of the standard cells per area (no benefit), a different standard cell library (unlikely), a different memory cell library (unlikely), or a different ratio between standard cells and memory area.

    But since compute-oriented chips have traditionally used more memory than graphics chips (larger caches, larger register files), that would actually decrease transistor density for the graphics chips.

    So I don't expect any increase in density at all.
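
    A toy illustration of that last point (densities entirely hypothetical): model die density as an area-weighted mix of SRAM and standard-cell logic, with SRAM packing transistors much denser. A graphics part with a smaller SRAM share then averages out lower:

    ```python
    # Hypothetical per-mm^2 transistor densities on the same process node.
    SRAM_DENSITY  = 45e6   # SRAM bit cells pack far more transistors per mm^2...
    LOGIC_DENSITY = 18e6   # ...than random standard-cell logic does.

    def die_density(sram_area_share: float) -> float:
        """Area-weighted average transistor density for a given SRAM share."""
        return (sram_area_share * SRAM_DENSITY
                + (1 - sram_area_share) * LOGIC_DENSITY)

    print(f"compute chip, 40% SRAM area:  {die_density(0.40) / 1e6:.1f}M xtors/mm^2")
    print(f"graphics chip, 25% SRAM area: {die_density(0.25) / 1e6:.1f}M xtors/mm^2")
    ```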
     
  8. xpea

    Regular Newcomer

    Joined:
    Jun 4, 2013
    Messages:
    429
    Likes Received:
    479
    Got some interesting news today from an old friend: Google is going to be "by far" the biggest Pascal customer, followed by... Baidu! GP100 allocation is full for the next 6 months, DGX-1 is a big hit, and they cannot build enough to meet demand...
     
    Grall, dnavas, Malo and 1 other person like this.
  9. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Just to correct my previous post:
    Yes, it still looks like Volta is on track for 2017 and the exascale projects with IBM.
    I just looked at a very recent slide (IBM-NVIDIA Collaboration Landscape) showing 2017 with CUDA 9, OpenMP 4.x, enhanced NVLink, and GV100, which I assume is Volta.

    Cheers
     
  10. CarstenS

    Legend Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,232
    Likes Received:
    2,837
    Location:
    Germany
    Would NVLink be comparable to PCIe or memory PHYs in scaling to new process nodes (i.e. not being able to achieve as high a density as logic, etc.)?
     
  11. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY

    It actually makes sense that Google is really interested in this...
     
  12. Adored

    Newcomer

    Joined:
    Mar 1, 2016
    Messages:
    67
    Likes Received:
    4
    Yes, it's right up Google's street really, in more ways than one.
     
  13. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,380
    It's probably primarily limited by PCB characteristics, not process speed.
     
    Razor1 likes this.
  14. CarstenS

    Legend Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,232
    Likes Received:
    2,837
    Location:
    Germany
    Sorry, I did not mean speed but rather area. Analogue circuitry - AFAIR - does not pack as densely and thus does not scale as well with smaller process geometries.

    IOW: would the 4 NVLinks in GP100 take up a significant portion of the die?
     
  15. xpea

    Regular Newcomer

    Joined:
    Jun 4, 2013
    Messages:
    429
    Likes Received:
    479
    I have no idea about the die area, but on the P100 mezzanine connector the 4 NVLinks take 400 pins out of 800 total (the other 400 are for PCIe and power). It's huge...
     
    Razor1 and CarstenS like this.
  16. spworley

    Newcomer

    Joined:
    Apr 19, 2013
    Messages:
    146
    Likes Received:
    190
    The fact that GP100 has 4 SMs disabled is in no way an indication of process defect rate. If the new mezzanine form factor has a power and thermal limit of 300W, then a fully enabled chip at full frequency would exceed those limits. Having 4 spare SMs gives Nvidia risk management both for defects and for power/performance tuning. Some SMs might be fully functional but burn a little more wattage, and/or reliably clock higher than others. 56 SMs at 1328 MHz is likely better than 60 SMs at 1200 MHz, for example. You can then cherry-pick the SMs and set frequencies that give the best compute performance within your power/TDP range. Requiring the use of all 60 SMs loses that flexible tuning opportunity.
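
    The arithmetic behind that example, since raw throughput scales with active SMs × clock:

    ```python
    cfg_a = 56 * 1328  # active SMs x MHz
    cfg_b = 60 * 1200
    print(f"56 SMs @ 1328 MHz: {cfg_a} SM*MHz")
    print(f"60 SMs @ 1200 MHz: {cfg_b} SM*MHz")
    print(f"56-SM config advantage: {cfg_a / cfg_b - 1:+.1%}")  # about +3.3%
    ```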

    That said, we don't know the new form factor's power or thermal limits. 300W is the traditional max for dual-width 6+8-pin PCIe cards, but Nvidia could have chosen a higher bound for their custom form factor. It's likely similar, though, since a higher wattage would be fine for servers but would prevent the same chip from being used in PCIe card designs.
     
    ImSpartacus, CarstenS and pharma like this.
  17. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,380
    Excellent points, and I agree completely.

    Furthermore, 4 out of 60 is just a ~7% reduction in compute units. Even if they don't do it to increase clock speeds or to select the optimal power configuration, this reduction doesn't change the competitive position of this product in any way: there is nothing to challenge it. Selling a version now with all units enabled wouldn't make any sense.


    To me, the fact that it's only 4 units is a very strong indication that 16nm is doing just fine.
     
    Razor1 likes this.
  18. Adored

    Newcomer

    Joined:
    Mar 1, 2016
    Messages:
    67
    Likes Received:
    4
    If they have no competition, then wouldn't they be better served by disabling even more units and running lower clocks at 250W? That would give them better yields now and would leave them a bigger, more enticing upgrade for later, with 32GB, higher clocks and a full die. There must be some kind of competition, or surely they wouldn't push it at all?
     
  19. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,175
    Location:
    La-la land
    You don't think that power (and ground) is supplied through the mounting screw points? Apple does that in the Mac Pro. The mezzanine connector doesn't seem heavy-duty enough to supply 300W, especially if fewer than half the pins are available for power delivery...

    You could simply drop clocks and volts a little and get a correspondingly larger drop in power than the additional draw from the extra enabled functional units (along the lines of the Fury Nano).
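
    A rough sketch of that trade-off using the usual dynamic-power approximation P ∝ units × f × V², with a hypothetical voltage/frequency point for the wider configuration:

    ```python
    def rel_power(units: int, f_mhz: float, v: float,
                  base=(56, 1328.0, 1.00)) -> float:
        """Dynamic power relative to a baseline, using P ~ units * f * V^2."""
        u0, f0, v0 = base
        return (units * f_mhz * v**2) / (u0 * f0 * v0**2)

    # Hypothetical: all 60 SMs at ~7% lower clock and 5% lower voltage.
    print(f"power:      {rel_power(60, 1235, 0.95):.2f}x")   # ~0.90x
    print(f"throughput: {60 * 1235 / (56 * 1328):.2f}x")     # ~1.00x
    # Wider and slower delivers the same throughput at ~10% less power.
    ```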
     
  20. Bob

    Bob
    Regular

    Joined:
    Apr 22, 2004
    Messages:
    424
    Likes Received:
    47
    I agree; that would not be a very competitive product.
     