Nvidia Pascal Announcement

Discussion in 'Architecture and Products' started by huebie, Apr 5, 2016.

  1. Ext3h

    Regular Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    337
    Likes Received:
    294
    Yes, you are. You don't want to hit that maximum. Ever. If you do, you lose the ability to utilize all cores, due to a lack of concurrent warps.
    And you can't just scale the register file up (without other drawbacks), so you simply don't. Instead you scale the whole SM cluster down and add more of them. That increases latency and decreases the throughput of each individual warp, but improves overall throughput, especially in the worst case.
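
    To make that concrete, here is a minimal Python sketch of how per-thread register usage caps the number of warps an SM can keep resident, assuming the published Maxwell/Pascal-class figures (64K 32-bit registers per SM, 64-warp ceiling); the per-kernel register counts are hypothetical.

    ```python
    # Minimal occupancy sketch: per-thread register usage vs. resident warps.
    # SM figures are the published Maxwell/Pascal-class numbers; the kernel
    # register counts below are made up for illustration.

    REGFILE_REGS = 65536   # 32-bit registers per SM (256 KB)
    WARP_SIZE = 32         # threads per warp
    MAX_WARPS_PER_SM = 64  # architectural ceiling on resident warps

    def max_resident_warps(regs_per_thread: int) -> int:
        # Each resident warp ties down regs_per_thread * WARP_SIZE registers.
        warps_by_registers = REGFILE_REGS // (regs_per_thread * WARP_SIZE)
        return min(warps_by_registers, MAX_WARPS_PER_SM)

    for regs in (32, 64, 128, 255):
        print(f"{regs:3d} regs/thread -> {max_resident_warps(regs):2d} resident warps")
    # At the 255-register maximum only 8 warps fit: far too few to hide
    # latency, which is why you never want to hit that maximum.
    ```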
     
    Razor1 likes this.
  2. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    Kind of interesting that P100 has been targeted for deep learning when its real killer feature is a staggering 5TFLOPS of FP64 operations.
     
    #62 silent_guy, Apr 5, 2016
    Last edited: Apr 5, 2016
    Razor1 likes this.
  3. Dr Evil

    Dr Evil Anas platyrhynchos
    Legend Veteran

    Joined:
    Jul 9, 2004
    Messages:
    5,767
    Likes Received:
    775
    Location:
    Finland
    I'm guessing the Pascal Titan will be the GP102. I think it'll take a looong time for GP100 to appear as Geforce, if ever... GP104 coming first with X80 and X70 models and GP102 later as Titan and X80 Ti or something like that.
     
    A1xLLcqAgt0qc2RyMz0y likes this.
  4. spworley

    Newcomer

    Joined:
    Apr 19, 2013
    Messages:
    146
    Likes Received:
    190
    GK210, the chip in the Tesla K80, is used only for HPC Tesla products. Its notable difference is a doubled register file size over GK110.
     
    nnunn likes this.
  5. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,797
    Likes Received:
    2,056
    Location:
    Germany
    Any sources for this other than the block diagram, which serves a rather illustrative purpose? :) Unlike a 1:3 ratio, a 1:2:4 ratio lends itself quite naturally to multi-purpose units, IMHO.
     
    Ext3h likes this.
  6. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    The preliminary numbers for GDDR5X make it better than GDDR5 in terms of power per unit of bandwidth, but still inferior to HBM, and by extension even more so to HBM2.
    Nvidia makes a point of citing HBM2's native ECC, and the standard also has room for a few other items, like a thermal failsafe and row-hammer mitigation, which access-happy HPC might appreciate.

    Then there's the footprint of the GDDR5X PHY, which is unclear, while GP100 also needs to devote perimeter to 64 NVLink lanes at 20 Gb/s each (I recall something like that footprint for AMD's rumored HPC APU and/or the GMI links).
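
    For a rough feel of the power-per-bandwidth point, a back-of-the-envelope sketch; the pJ/bit energies are ballpark assumptions of the kind circulated around this time, not official specs, and 720 GB/s is P100's quoted HBM2 bandwidth.

    ```python
    # Back-of-the-envelope memory interface power at a fixed bandwidth.
    # The pJ/bit energies below are assumptions for illustration only.

    ASSUMED_PJ_PER_BIT = {
        "GDDR5":  20.0,  # assumed
        "GDDR5X": 14.0,  # assumed: better than GDDR5, still above HBM
        "HBM":     7.0,  # assumed
        "HBM2":    6.0,  # assumed
    }

    def io_power_watts(bandwidth_gb_s: float, pj_per_bit: float) -> float:
        bits_per_second = bandwidth_gb_s * 8e9
        return bits_per_second * pj_per_bit * 1e-12

    for mem, pj in ASSUMED_PJ_PER_BIT.items():
        print(f"{mem:6s}: ~{io_power_watts(720, pj):5.1f} W for 720 GB/s")
    ```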
     
    Razor1 likes this.
  7. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Sorry if someone else has picked up on this, but I only have time to post now:
    I was trying to find corroborating figures for AlexNet/ImageNet performance to put this chart into better context.
    The closest I could find relates to the Titan X, which managed 450 images/sec (from official NVIDIA slides). Arguably the Titan X was better specc'd in this context than the Tesla K40, so some allowance would need to be made for the figures being a bit lower.

    Extrapolating from that would suggest the Maxwell M40 manages around 2,700 images/sec, and the new Pascal/cuDNN v5 replacement around 4,500 images/sec.
    Give or take due to the Titan X/K40 differences; these are very rough ballpark figures. A sketch of the arithmetic is below.
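
    ```python
    # Rough extrapolation sketch. The only absolute anchor is ~450 images/sec
    # on a Titan X from NVIDIA's slides. The chart multipliers below are
    # assumed readings of the K40-normalized bars, so outputs are ballpark.

    titan_x_images_per_sec = 450   # from official NVIDIA slides
    m40_multiplier = 6.0           # assumed bar height vs the K40 baseline
    pascal_multiplier = 10.0       # assumed bar height vs the K40 baseline

    baseline = titan_x_images_per_sec  # generous: treats Titan X ~ K40 baseline
    print(f"M40    : ~{baseline * m40_multiplier:,.0f} images/sec")
    print(f"Pascal : ~{baseline * pascal_multiplier:,.0f} images/sec")
    ```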
    Cheers
     
  8. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    299
    Location:
    UK
    Agreed, it's interesting. I think they might be more worried about competition with Intel in the HPC market than about maximising efficiency for deep learning, where they already have a huge competitive advantage (especially in software, but increasingly on the hardware side too, with FP16).

    For GeForce, I hope we'll see a fully enabled 60-SM GP100 Titan this year (SHUT UP AND TAKE MY MONEY!!!), but it sounds possible we might see GP102 and/or GP104 before that. It's intriguing that NVIDIA has a chip with a '2' as the last digit; they haven't had one since G92 (where G90 was missing). This could mean a smaller difference in performance than usual (e.g. 3072 ALUs without FP64 or NVLink?), in which case GP100 might *only* be released as a Titan outside of the professional market, or not at all.
     
    Grall likes this.
  9. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,797
    Likes Received:
    2,056
    Location:
    Germany
    I wonder how those 5 TFLOPS of DP will turn out in practice. After all, compared to GM200, P100 has eight times as many FP64 ALUs to feed from its same-sized register files. Compared to GK110, it's only half, though.
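
    A quick sketch of those ratios, using the per-SM configurations from the respective whitepapers (FP64 ALU counts and 256 KB register files):

    ```python
    # Registers available per FP64 ALU, per SM, using published configurations.
    per_sm = {
        "GK110 (SMX)": {"fp64_alus": 64, "regfile_kib": 256},
        "GM200 (SMM)": {"fp64_alus": 4,  "regfile_kib": 256},
        "GP100 (SM)":  {"fp64_alus": 32, "regfile_kib": 256},
    }

    for chip, cfg in per_sm.items():
        kib_per_alu = cfg["regfile_kib"] / cfg["fp64_alus"]
        print(f"{chip}: {kib_per_alu:5.1f} KiB of registers per FP64 ALU")
    # GP100 packs 8x the FP64 ALUs of GM200 against a same-sized register
    # file, but only half as many as GK110's SMX did.
    ```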
     
  10. Ext3h

    Regular Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    337
    Likes Received:
    294
    Not going to bet on it, but my guess is a Titan P with the full 60 SMs, and a GTX 1080 Ti with 56.

    Also, sometime later next year, a 32GB variant.
     
  11. Alessio1989

    Regular Newcomer

    Joined:
    Jun 6, 2015
    Messages:
    582
    Likes Received:
    285
    Will we finally be able to have mixed resource types in the same heap?
     
    sebbbi likes this.
  12. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    I frankly don't see them releasing a version with all SMs enabled any time soon, if ever.

    The beauty of having tons of identical cores is that disabling a few is almost invisible. See GTX 980 Ti vs Titan X.

    The cool thing about redundancy and random faults is that the benefit of redundancy is largely uncorrelated with the size of the redundant block: if you have 30 blocks and disable 1 of them, your benefit won't be much better than having 60 blocks and disabling 1, since both configurations tolerate only a single fault.

    However, the benefit goes up significantly with the number of redundant blocks: 1 redundant block out of 30 is much less effective than 2 out of 60, even though both cost the same in redundant area. My theory is that Nvidia split the SMs in half for exactly this reason: with a smaller granularity, they can exploit this benefit.
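
    A toy yield model makes the point concrete: assume a fixed expected number of random defects per die, spread evenly across blocks, and call a die sellable when the defective blocks don't exceed the spares. The defect rate below is a made-up assumption.

    ```python
    # Toy binomial yield model for the redundancy argument above.
    from math import comb

    def sellable_yield(blocks: int, spares: int, defects_per_die: float) -> float:
        # Same total die area either way, so halving the block size also
        # halves the chance that any one block is hit.
        p = defects_per_die / blocks
        return sum(comb(blocks, k) * p**k * (1 - p)**(blocks - k)
                   for k in range(spares + 1))

    LAMBDA = 0.6  # assumed expected defects per die (made up)
    print(f"30 blocks, 1 spare : {sellable_yield(30, 1, LAMBDA):.3f}")  # ~0.88
    print(f"60 blocks, 2 spares: {sellable_yield(60, 2, LAMBDA):.3f}")  # ~0.98
    # Same redundant area fraction (1/30 vs 2/60), but the finer-grained
    # configuration tolerates two faults anywhere, so its yield is higher.
    ```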
     
  13. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,486
    Likes Received:
    397
    Location:
    Varna, Bulgaria
    GP100 is tied to HBM2, so the memory yields will be the primary factor for the eventual consumer availability. And Quadro is also on the line... priorities, priorities.
     
    A1xLLcqAgt0qc2RyMz0y likes this.
  14. LordEC911

    Regular

    Joined:
    Nov 25, 2007
    Messages:
    788
    Likes Received:
    74
    Location:
    'Zona
    GK210 would like a word with you...
     
    Razor1 and fellix like this.
  15. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,486
    Likes Received:
    397
    Location:
    Varna, Bulgaria
    Yeah, it's kind of a trend since Kepler -- each generation comes out with a more compact multiprocessor design. By the time Volta arrives, Nvidia will have brought it full circle back to Fermi. :p
     
  16. Picao84

    Veteran Regular

    Joined:
    Feb 15, 2010
    Messages:
    1,525
    Likes Received:
    686
    I don't know why, but my spider sense tells me there is something not quite right with nVIDIA's presentation today. Déjà vu from Fermi times?
    - No actual hardware presented.
    - Jen-Hsun did not seem his usual, very enthusiastic self: the hesitation about arrival time, concluding with "soon".
    - A 300W GPU on a new fabrication process right off the bat (when HBM2 is supposed to be more energy-efficient than GDDR5).
    - No GPU roadmap (possibly the first time one is not shown at a GTC?).
    - Since NVLink was developed in cooperation with IBM, isn't it weird that IBM was not there at all?
    - Yes, there were lots of references to companies going to, or intending to, use Pascal, but we've seen how design wins worked out for Tegra in the past.
    - Talking of Tegra, where was it? Shoehorned into Drive PX2 with barely an honorable mention?
    - I did not get the point of spending so much time presenting VR demos on stage that obviously couldn't be experienced by the audience there, plus a cringeworthy cameo by Steve Wozniak. Yay, we spent infinite amounts of time recreating Mars... because we can.
    By the end of the presentation I was freaking bored, which is not usual for nVIDIA events.

    EDIT - Plus one disturbing fact: there was not a SINGLE demo where they said it was running on Pascal! Even at the Fermi presentation, with all the clusterfuck going on behind closed doors, Jen-Hsun demoed things supposedly running on Fermi. Now they say Pascal is going to volume production and don't care to show even a single demo, just pretty charts?
     
    elect likes this.
  17. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    322
    Likes Received:
    82
    While it's interesting to see any number of things confirmed, the most surprising is the apparent scaling of the whole thing. 600mm² already (cough, no release date, and the only mention is Tesla, aka the super high end at a very high price). But... that's all they got out of 600mm²? They can't scale any further on this process, at all. And just a 66% jump at the highest end?

    Is HBM2 really that big? I mean, geez. Transistors-to-performance didn't even scale linearly; it dropped. We get a roughly 66% performance improvement from an 87.5% increase in transistors. With HBM2 and FinFET, Pascal manages to be worse, transistor for transistor, than Maxwell. A ~12% drop in efficiency is not what you want out of a node jump and an "architecture jump" at all.
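
    The arithmetic behind that ~12% figure, using the post's own numbers:

    ```python
    # Perf-per-transistor drop implied by the figures quoted above.
    perf_gain = 1.66          # ~66% throughput jump over GM200 (post's figure)
    transistor_gain = 1.875   # ~87.5% more transistors (post's figure)

    relative_efficiency = perf_gain / transistor_gain
    print(f"perf/transistor vs Maxwell: {relative_efficiency:.3f}")  # ~0.885
    print(f"drop: {(1 - relative_efficiency) * 100:.1f}%")           # ~11.5%, i.e. the ~12% above
    ```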

    In its way it's impressive in the same way Nvidia's recent efforts have been, aka make a huge chip. But the limited RAM (16GB) for an exclusively high-end card (AMD already puts out 32GB cards at its highest end), and the rather disappointing performance for a new node that is already maxed out on die size, is... well, hopefully Volta works out well for Nvidia next year. Which is not to say 66% wouldn't be impressive in its own right, but the Tesla line costs a hell of a lot, so it doesn't seem like this is going to be all that attractive price-for-performance for a while.
     
  18. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,797
    Likes Received:
    2,056
    Location:
    Germany
    You're talking about spec sheet performance. Let's see how it turns out in the real world.
     
  19. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    Mr. Pony, if you forget about the double-sized register files and the 1:2 FP64 ratio, you shouldn't be surprised that your efficiency numbers come out a little off compared to a chip that doesn't have them.

    As for comparisons with AMD: they don't play in the market, so it doesn't matter.
     
    pharma, Razor1 and nnunn like this.
  20. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Well, for now you would need to compare the Tesla to the Fiji-based pro model, whose HBM is limited to 1 or 2x 4GB; on the plus side for AMD, at least you can buy it soon.
    I must admit I am surprised that any 600mm² chip with high clocks is already achievable, even if yields are appalling; the rumours made it sound like we would not see anything like this soon from either TSMC or Samsung/GF, reinforced by the fact that only low-power/small GPU models had been seen or talked about to date.
    I wonder how close AMD and GF are to something comparable; meaning something that can run benchmarks (going by the presentation, the big Pascal does seem to have been used, as they mention data based on 20 iterations of AlexNet), not necessarily something to be released soon.

    I guess NVIDIA will be cagey about the release schedule, especially for consumer models, as they do not want to tank current sales.
    Cheers
     