Nvidia Pascal Announcement

Discussion in 'Architecture and Products' started by huebie, Apr 5, 2016.

  1. Doesn't look like a P100, since 24GB of GDDR5X possibly points to a 384-bit bus, like the one on GP102. I haven't heard of any GDDR5X capability in P100, especially since the chip seems to have been finalized before GDDR5X existed as a standard.

    What this may mean is that the GP102 actually has 3840 sp and the new Titan is a cut-down chip.

    I don't think any of the Polaris chips have manufacturing problems because they've been shown in working condition since January.
    Just my opinion and I could be wrong, though.


    The money-making argument makes sense because apparently AMD didn't even bother to make 4GB cards. They simply flashed 8GB cards with a BIOS that won't allocate half the memory and lowers its clocks to 7Gbps, put a couple hundred of them on the market just as a placeholder and called it a day.
    It's possible the number of "4GB" cards is so tiny that they saved money by not even creating more than one product line: no separate quality control, no separate order of 7Gbps VRAM chips, no separate assembly line for a different product (other than the BIOS flashing part). The couple thousand dollars they lose by selling 8GB cards for cheap(er) is an investment to get the $199 placeholder faux-release.
    Again: not a very consumer-friendly or honest decision.
     
    #1821 Deleted member 13524, Jul 25, 2016
    Last edited by a moderator: Jul 25, 2016
  2. CSI PC

    Veteran

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Agreed,
    which is part of my 1st sentence when I mention density and die size :)
    It is still an issue if the cooling solution is not designed to cope with this (of course IHVs will rightly point out that manual OC is beyond guaranteed spec), and more of a challenge with air cooling than with water.
    Unfortunately a vapor chamber can only do so much, as we see even with Nvidia cards.
    If you truly OC either card beyond the performance-scaling ceilings in the Tom's chart I posted earlier, both of this generation will need to be on water cooling.
    Context here is not needing to run the fans at an rpm that is excessive and noisy/intrusive, and even then the scaling limits are lower than in the previous gen.
    Cheers
     
    #1822 CSI PC, Jul 25, 2016
    Last edited: Jul 25, 2016
  3. pharma

    Veteran

    Joined:
    Mar 29, 2004
    Messages:
    4,891
    Likes Received:
    4,539
    Nvidia Potential Roadmap Update for 2017: Volta Architecture Could Be Landing As Early As 2H 2017
    http://wccftech.com/nvidia-roadmap-2017-volta-gpu/
     
  4. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    I would agree if it was about FP32-only ALUs.
     
  5. RecessionCone

    Regular Subscriber

    Joined:
    Feb 27, 2010
    Messages:
    505
    Likes Received:
    189
    Volta is going into big HPC installations that require FP64.
     
    A1xLLcqAgt0qc2RyMz0y likes this.
  6. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    I suspect, they will go full Multi-precision ALUs by then. Hopefully.
     
  7. Benetanegia

    Regular

    Joined:
    Sep 4, 2015
    Messages:
    394
    Likes Received:
    425
    I don't know how useful those comparisons/predictions are anyway. Kepler->Maxwell->Pascal all share a very similar SM structure, but afaik Volta could be very different and could make a jump in ALU count similar to Fermi->Kepler/Maxwell.
     
  8. xpea

    Regular

    Joined:
    Jun 4, 2013
    Messages:
    551
    Likes Received:
    783
    Location:
    EU-China
    I've heard that Volta has a very different architecture from Maxwell/Pascal, with much higher efficiency.
    Personally I would like to see Volta with a very flexible ALU that can do FP64 / 2xFP32 / 4xFP16, but that's just a wish; I have no idea how it will really be...
     
    A1xLLcqAgt0qc2RyMz0y likes this.
  9. xpea

    Regular

    Joined:
    Jun 4, 2013
    Messages:
    551
    Likes Received:
    783
    Location:
    EU-China
  10. Ryan Smith

    Regular

    Joined:
    Mar 26, 2010
    Messages:
    629
    Likes Received:
    1,131
    Location:
    PCIe x16_1
    Okay, GP102/TitanX updates: slow FP16/FP64 confirmed. 471mm2 die size. Other than INT8, this really is a bigger GP104
     
    Alexko, Newguy, Grall and 6 others like this.
  11. Really sounds like 1.5x GP104, which would put the whole chip's SM count at 30. With the news of the Quadro P6000 coming with 30 SMs and a 384-bit GDDR5X bus, all of this points to the new Titan being a cut-down GP102.
     
  12. CSI PC

    Veteran

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Thanks for the update.
    Well, Nvidia has managed to create a confusing mess with their narrative on scientific deep learning and research operations.
    You need a P100 in general for optimal FP32/FP16 performance, but a GP102 if doing Int8/dp4a; basically they have screwed any shared-workload possibilities and added further cost-complexity considerations, which matters in the business-scientific world.
    Made worse by how Nvidia seems recently to be pushing Int8 scientifically.

    1st time in a long while that I think they have made a bad business-strategy call.
    I think they will have no choice but to release an updated P100 with Int8/dp4a and share this across Tesla and Quadro, possibly with a certain number of FP64 CUDA cores disabled and at a reduced price.
    Makes one wonder what Pascal chip is going to slot in to replace the less efficient K80, or at least be more competitively priced and efficient (obviously with less overall performance than the P100).
    Cheers
     
    #1832 CSI PC, Jul 25, 2016
    Last edited: Jul 25, 2016
    Grall likes this.
  13. spworley

    Newcomer

    Joined:
    Apr 19, 2013
    Messages:
    146
    Likes Received:
    190
    Ryan, is GP102's vaguely described "INT8" feature different from GP104's DP4A instruction? I suspect they're the same and you're mistaken about INT8 being a difference between GP104 and GP102.
     
  14. ShaidarHaran

    ShaidarHaran hardware monkey
    Veteran

    Joined:
    Mar 31, 2007
    Messages:
    4,027
    Likes Received:
    90
    I'm not so sure I can agree with the idea that it's bad business strategy for Nvidia to push certain customers towards higher-priced SKUs... Perhaps if they had promised this functionality then reneged at the last minute (cough>2SLIcough) but I don't think that has happened here.
     
  15. spworley

    Newcomer

    Joined:
    Apr 19, 2013
    Messages:
    146
    Likes Received:
    190
    There's a credible second-hand report that DP4A's design just didn't make it in time for P100.

    I'm not a machine learning expert, but there is a big numerical difference between training a neural net and evaluating one. Training requires backpropagation and gradient information, which needs more precision because it will be used for division; FP16 is evidently enough for that. Evaluating an existing net (designed for 8-bit) just needs to compute a lot of weighted sums of 8-bit input node values. That's a summed dot product A0*B0+A1*B1+A2*B2+A3*B3, and that weighted sum is exactly what integer DP4A does, where each element is a byte in a word. DP2A also exists, operating on two 16-bit integers instead.
    So the deep learning targeting may be "Buy a P100 to train your nets, and any other Pascal GPU to evaluate them afterwards."
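    To make that concrete, here is a small host-side sketch in Python (the function and packing-helper names are my own for illustration, not NVIDIA's API) emulating the arithmetic of CUDA's `__dp4a` intrinsic: four packed signed 8-bit lanes multiplied pairwise and summed into a 32-bit accumulator.

```python
import struct

def dp4a(a: int, b: int, c: int) -> int:
    """Emulate the signed variant of CUDA's __dp4a: treat a and b as
    four packed signed 8-bit lanes, multiply them pairwise, and add
    all four products to the 32-bit accumulator c."""
    a_lanes = struct.unpack("<4b", struct.pack("<I", a & 0xFFFFFFFF))
    b_lanes = struct.unpack("<4b", struct.pack("<I", b & 0xFFFFFFFF))
    total = c + sum(x * y for x, y in zip(a_lanes, b_lanes))
    total &= 0xFFFFFFFF                      # wrap like a 32-bit register
    return total - (1 << 32) if total >= (1 << 31) else total

def pack_s8x4(x0: int, x1: int, x2: int, x3: int) -> int:
    """Pack four signed bytes into one 32-bit word, lane 0 lowest."""
    return struct.unpack("<I", struct.pack("<4b", x0, x1, x2, x3))[0]
```

    One DP4A issue thus replaces four multiplies plus four adds, which is where a 4x Int8 rate over FP32 would come from.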
     
    nnunn and pharma like this.
  16. CSI PC

    Veteran

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    I think they are looking to push the GP102 for inferencing.
    But that is asking research labs/teams to add additional separate hardware, where most would possibly be looking to share workloads more.
    Agree that ideally you would use dedicated HW for each task, but that adds cost and complexity, which will not suit everyone.

    Yes, I agree about the P100, which is why IMO there needs to be an updated or new lower-tier product that is a true mixed-precision (with Int8/dp4a) accelerator.
    It would not have been much of a headache, except that it seems Nvidia is just starting their new push and narrative of using Int8 in research, which I doubt will be limited to inferencing.
    This has impacts on the optimisation of the various research apps they work with, in terms of adding complexity and now dedicated GPUs.
    IMO it is just a level of complexity they did not need to create for scientific research/supercomputer implementations/smaller teams/etc, and it also creates certain gaps, as I briefly suggested in earlier posts.
    Anyway, I doubt everyone in the research world is happy about the idea of buying a GP102 to do one specific DL task, or having to use it for other optimal Int8 operations.
    As Ryan and ToTTenTranz say, this is more GP104 version 1.5 than a real Titan.
    Cheers
     
    #1836 CSI PC, Jul 25, 2016
    Last edited: Jul 25, 2016
  17. homerdog

    homerdog donator of the year
    Legend Subscriber

    Joined:
    Jul 25, 2008
    Messages:
    6,294
    Likes Received:
    1,075
    Location:
    still camping with a mauler
    I nominate this post for the B3D Hall Of Fame!
     
    DuckThor Evil likes this.
  18. Ryan Smith

    Regular

    Joined:
    Mar 26, 2010
    Messages:
    629
    Likes Received:
    1,131
    Location:
    PCIe x16_1
    It seems that way. But I need further clarification. It may just be that there's a software throttle somewhere...
     
  19. ShaidarHaran

    ShaidarHaran hardware monkey
    Veteran

    Joined:
    Mar 31, 2007
    Messages:
    4,027
    Likes Received:
    90
    Update re: GP102 precision capabilities via Anandtech:
     
  20. RecessionCone

    Regular Subscriber

    Joined:
    Feb 27, 2010
    Messages:
    505
    Likes Received:
    189
    I think I have posted this at least 10 times before on this forum. But once more.

    GP100 has 2x rate FP16. It has no Int8.
    GP10x (102, 104, 106, etc.) have essentially no FP16 (1/64 rate), but have 4x rate Int8 with 32-bit accumulate.

    The idea that GP10x has artificially throttled FP16 is wrong, the chips just don't have FP16 besides a token amount for software compatibility.

    I don't know whether today's GP104 products have artificially throttled Int8, I haven't seen evidence either way. But GP104 and 102 are the same architecture, just with different numbers of units. GP100, on the other hand, is unique.
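    To put those rate multipliers in concrete terms, here is a small sketch (the multipliers are taken from the post above; the helper name and the 128-FP32-lanes-per-SM input are purely illustrative assumptions of mine):

```python
def ops_per_clock(fp32_lanes: int, chip: str, precision: str) -> float:
    """Peak operations per clock for a block of fp32_lanes FP32 ALUs,
    scaled by the per-chip rate multipliers described above."""
    rates = {
        "GP100": {"fp32": 1.0, "fp16": 2.0},               # no Int8 path
        "GP10x": {"fp32": 1.0, "fp16": 1 / 64, "int8": 4.0},
    }
    return fp32_lanes * rates[chip][precision]

# With 128 FP32 lanes: GP100 peaks at 256 FP16 ops/clock, while GP10x
# manages only 2 FP16 ops/clock but 512 Int8 ops/clock.
```

    The asymmetry is stark: the token 1/64 FP16 rate on GP10x exists for software compatibility, not for performance, exactly as described above.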
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.