Nvidia Pascal Speculation Thread

Discussion in 'Architecture and Products' started by DSC, Mar 25, 2014.

Thread Status:
Not open for further replies.
  1. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,807
    Likes Received:
    2,073
    Location:
    Germany
    Charlie is at it again, seemingly dissecting why Nvidia would not have working Pascal silicon in-house, based on a forum post at our very own Beyond3D.

    http://semiaccurate.com/2016/02/01/news-of-nvidias-pascal-tapeout-and-silicon-is-important/

    Seems pretty legit, except for one thing: he bases his assumptions about timing etc. on the forum post linked here, and argues that Nvidia would not have Pascal silicon in-house because they moved bring-up tools around only at the end of December. The flaw in his argument: the post explicitly says "big Pascal", while the tool movements were for a 37.5×37.5 mm part.

    Possibly, big Pascal taped out and was brought up first, and now Nvidia is working on bring-up for a smaller chip.

    From what I could quickly gather off of some photos, GM200's package was roughly 45-46 mm wide, GM204's 40-41 and GM206's 37-38 mm - and Charlie's article focuses on a 37.5×37.5 BGA size. Go figure.
     
    pharma, I.S.T., Razor1 and 1 other person like this.
  2. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY
    Plus, there are tools for water-cooled parts; the only water-cooled part that we know of right now being made in-house is for PLX2, not GP100.
     
  3. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,807
    Likes Received:
    2,073
    Location:
    Germany
    Are those really tools for WC, or just components of an easily detachable water cooler for the bring-up kit? I don't know.
     
  4. gamervivek

    Regular Newcomer

    Joined:
    Sep 13, 2008
    Messages:
    715
    Likes Received:
    220
    Location:
    india
    Somebody posted Zauba shipping manifests on the SA forums (which seem to be the best leaks currently) for an unnamed Nvidia chip priced at roughly 2.5× the Fiji samples. That was way back in August-September, so probably related to a June tapeout?

    More interesting to me was that Polaris was demoed to the press at Sonoma in early December, in a similar fashion as at CES.

    That would explain why Koduri is so confident about distancing themselves from Nvidia's new chip release.
     
  5. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY
    AMD is the one that has to prove to OEMs and system builders that they are ready with their products; nV doesn't need to prove themselves, as they have already cornered that market.

    As nV has done so many times, and as AMD/ATi did when they had well-selling products in the OEM market, they tend to introduce new top-end parts to OEMs and system builders slowly (because OEMs already have certain inventory obligations based on contracts), focusing on early adopters and general consumers first. This also helps when or if initial yields are low...
     
  6. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,807
    Likes Received:
    2,073
    Location:
    Germany
    At least AMD has shown only a to-be mobile part, thus not hurting their partners' channel inventory. Nvidia, OTOH, would probably hurt current sales much more if they showed working silicon now.
     
    Razor1 likes this.
  7. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,807
    Likes Received:
    2,073
    Location:
    Germany
    Could you please fight your petty wars outside of the public portion of this forum? Greatly appreciated, thanks!
     
    pharma and Razor1 like this.
  8. revan

    Newcomer

    Joined:
    Nov 9, 2007
    Messages:
    55
    Likes Received:
    18
    Location:
    look in the sunrise ..will find me
    Apologies for that, and thread cleaned. I didn't try to start a war, even a petty one, but to provoke (by being... provocative) a response from a specific forum member (still hoping for that). Instead I saw other people, people that I respect greatly, offended by my comment, and I realized something had gone wrong...
     
  9. TARTOFCP

    Newcomer

    Joined:
    Aug 26, 2010
    Messages:
    8
    Likes Received:
    1
    Hello everyone.
    First, I do not speak English well. I hope you understand.

    This is what I found.
    2016-02-09 17;40;05.jpg
    699-2H403 - ?
    699-12914 -
    699-1G411 - GM204 variant ?
    699-1H400 - GP104 ?

    Background Information
    PG401 - GM204 Reference Card (GTX980), 699-1G401
    PG600 - GM200 Reference Card, 699-1G600
    PG301 - GM206 Reference Card (Maybe), 699-1G301

    I think everyone knows this.
    2016-02-09 17;57;47.jpg
    26-Oct-2015 | 85423100 | GRAPHICS PROCESSOR INTEGRATED CIRCUITS 3R08A | South Korea | Banglore Air Cargo | NOS | 10 | 89,950 | 8,995 - this one is unique.

    Please delete this post if it is a problem.
     
  10. McHuj

    Veteran Regular Subscriber

    Joined:
    Jul 1, 2005
    Messages:
    1,466
    Likes Received:
    586
    Location:
    Texas
    Via Anandtech, GDDR5x update from Micron.

    http://www.anandtech.com/show/10017/micron-reports-on-gddr5x-progress

    So this likely means that if Pascal GP104 is using GDDR5X, we might not see it until the fall of this year, which is a big bummer.
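A rough sketch of why the memory choice matters for a GP104-class part, using the usual bandwidth arithmetic. The 256-bit bus width and per-pin data rates below are assumptions for illustration, not confirmed specs:

```python
# Aggregate memory bandwidth = (bus width in bytes) * (per-pin data rate).
# Bus width and per-pin rates are hypothetical illustration values.
def bandwidth_gb_s(bus_bits, gbps_per_pin):
    """Bandwidth in GB/s for a given bus width (bits) and per-pin rate (Gb/s)."""
    return bus_bits / 8 * gbps_per_pin

print(bandwidth_gb_s(256, 7))    # GDDR5 at 7 Gb/s on a 256-bit bus  -> 224 GB/s
print(bandwidth_gb_s(256, 10))   # GDDR5X at 10 Gb/s on the same bus -> 320 GB/s
```

Even at Micron's initial 10 Gb/s target, GDDR5X would give a sizable jump over GDDR5 on the same bus width, which is presumably why a GP104-class part might wait for it.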
     
  11. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,261
    Likes Received:
    1,950
    Location:
    Finland
    Razor1 and CarstenS like this.
  12. iMacmatician

    Regular

    Joined:
    Jul 24, 2010
    Messages:
    774
    Likes Received:
    202
    Earlier this week I found some NVIDIA presentations that contained details that I have not seen before (or I don't remember).

    Slide 6 of "The Future of HPC and The Path to Exascale" (from 3 March 2014) gives a roadmap with a DP GFLOPS/W value for Pascal. The presentation's date is between the GTC 2013 roadmap which does not contain Pascal and the GTC 2014 roadmap which does contain Pascal.

    [​IMG]

    Below are the approximate DP GFLOPS/W values for the various architectures:
    • Tesla: 0.5
    • Fermi: 2
    • Kepler: 5.5
    • Maxwell: 10.5
    • Pascal: 14
    • Volta: 22
    Slide 43 of "Accelerators : The Changing Landscape" (from around May 2015) shows that a Pascal GPU has a peak of over 3 TFLOPS (presumably DP). I'm not sure if it includes boost; it very well could, because the GFLOPS values of existing GPU parts on page 14 are from the maximum boost clocks.

    [​IMG]

    Now, I had written up a long post analyzing these two pieces of information, but I subsequently found some more presentations and I had to change most of it.

    Slides 12, 14, and 17 of "GPU Accelerated Computing" (from 10 November 2014) contain a roadmap of specific Tesla parts, not just architectures. This roadmap contains a single-GPU part called "Pascal-Solo" with a 235 W TDP. [Also, you may notice something missing in slide 11, which makes sense given the date.]

    [​IMG]

    The presentation doesn't specifically state what chip the "Pascal-Solo" uses, but I think the slides following the roadmap may point to the GP100 chip with HBM2. The roadmap contains all the Kepler Teslas that I previously knew of but it does not have any Maxwell Teslas (Maxwell isn't mentioned in this presentation at all), so I think it's possible that the "Pascal-Solo" isn't the only Tesla planned for 2016. But I haven't found any direct evidence of any other Pascal Teslas in 2016, or any Pascal releases before the later part of this year for that matter. EDIT: That being said, I also haven’t found any hard evidence that says there will be no Pascal parts before late 2016. There may be little reason for presentations to specifically mention a future Pascal chip, even in a Tesla part, that does not have HBM2 or NVLink. I’m still hoping for a GP102 or GP104 release in March or April.

    The last piece of information I found may explain what I previously thought might be a discrepancy between the "Future of HPC" roadmap and the GTC 2015 roadmap. The GTC 2015 roadmap shows ~42 SGEMM/W for Pascal. Given how close the SGEMM/W and theoretical SP GFLOPS/W numbers are for Maxwell, and that Maxwell and Pascal seem to be architecturally similar, I guessed that Pascal has a theoretical ~43 SP GFLOPS/W. I had also assumed that fast DP Maxwell has a 1:2 DP rate, but 43 is much higher than two times 14.

    Slide 75 of "New hardware features in Kepler, SMX and Tesla K40" (from April 2014) mentions that a Pascal with stacked memory has 4 DP TFLOPS, 12 SP TFLOPS, and 1024 GB/s. It's worth noting that the DP value matches the value from a presentation linked earlier in this thread.

    [​IMG]

    I don't think these FLOPS numbers automatically imply a 1:3 DP rate; the number of significant figures is small enough to mask small deviations from 1:3.

    Question 1: Is it possible for a Pascal chip to consist of some SMs with a 1:2 DP rate and other SMs with no DP or a 1:32 DP rate?

    Taking into account the above information, the 12 SP and 4 DP TFLOPS values more closely align with a ~280 W TDP than a 235 W TDP. So I'm thinking that either some roadmap information is outdated or there is some hidden > 235 W Tesla part that we don't know about. After all, the K20X wasn't unveiled at the same time as the K20, even though both parts launched at about the same time. So my current guess for the 2016 Tesla lineup is as follows:
    • Tesla P##: 1x GP100, ~14 DP GFLOPS/W, 235 W, ~3.3 DP TFLOPS, ~9.9 SP TFLOPS, 1 TB/s
    • Tesla P##X: 1x GP100, ~14 DP GFLOPS/W, 275-300 W, 4 DP TFLOPS, 12 SP TFLOPS, 1 TB/s [less likely?]
      • Question 2: Is it possible to have 2x GP100 on the same interposer? (Or even 2x GP102 if that chip uses HBM2.)
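The TDP reasoning above can be sketched as simple arithmetic, using the roadmap's ~14 DP GFLOPS/W figure:

```python
# Cross-checking the roadmap figures quoted above.
dp_gflops_per_watt = 14.0          # Pascal point on the "Future of HPC" roadmap

# Implied peak DP throughput of a 235 W "Pascal-Solo" board:
dp_tflops_at_235w = 235 * dp_gflops_per_watt / 1000
print(dp_tflops_at_235w)           # ~3.3 DP TFLOPS

# Implied board power for the 4 DP TFLOPS part from the 2014 slide:
watts_for_4_tflops = 4000 / dp_gflops_per_watt
print(watts_for_4_tflops)          # ~286 W, i.e. closer to 280 W than to 235 W
```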
    By the way, I have collected a large number of NVIDIA roadmaps in this presentation file, including all four in this post.
    https://www.icloud.com/keynote/000-oJJ9_Z8mkHNjW-08KaA3Q#NVIDIA_GPU_roadmaps
     
    #752 iMacmatician, Feb 11, 2016
    Last edited: Feb 13, 2016
  13. LiXiangyang

    Newcomer

    Joined:
    Mar 4, 2013
    Messages:
    81
    Likes Received:
    47
    4 TFLOPS is more like a design target for HPC products.

    Traditional HPC cards are either passively or actively air-cooled, and considering the power limit (235 W), I would expect the Pascal HPC cards to be significantly downclocked compared to their (water-cooled) gaming counterparts, more so than in the Kepler generation.

    The main reason is HBM with an interposer. The interposer is made of copper, and it will pose a significant cooling challenge. Sorry for my limited English knowledge, but due to the different heat-expansion coefficients of Cu and Si, the chips will experience significant stress/shearing forces over loading and unloading cycles, which will significantly shorten the life cycle of the chip compared to traditional chips if the temperature is high. For an acceptable lifetime, such an HBM2 package has to work at a much lower temperature: we are not talking about 80 °C, more like 50 °C or below. So if it is air-cooled instead of water-cooled, and its load is very stressful, Nvidia has no choice but to significantly downclock their new HPC chips, or to try to persuade all the big players in the HPC market to design new water-cooled solutions (not very likely). The frequency, and thus performance, gap between air-cooled HPC chips and water-cooled gaming chips can therefore be huge.

    So 4 TFLOPS of HPC performance is not a very good indication of the peak performance of Pascal.
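A back-of-envelope sketch of the thermal-expansion part of this argument, using textbook CTE values for copper and silicon. Geometry and any conversion to actual stress are omitted; this only shows how the mismatch strain scales with operating temperature (note the next reply disputes that the interposer itself is copper):

```python
# Thermal-mismatch strain ~ (alpha_1 - alpha_2) * delta_T.
# CTE values are common textbook approximations, not package-specific data.
alpha_cu = 16.5e-6   # 1/K, copper
alpha_si = 2.6e-6    # 1/K, silicon
ambient = 25.0       # degrees C, assumed unloaded temperature

strains = {}
for t_op in (50.0, 80.0):
    strains[t_op] = (alpha_cu - alpha_si) * (t_op - ambient)
    print(f"{t_op:.0f} C: mismatch strain ~ {strains[t_op]:.2e}")
```

The strain roughly doubles going from a 50 °C to an 80 °C operating point, which is the shape of the argument being made, whatever the exact materials in the stack.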
     
  14. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,137
    Likes Received:
    2,939
    Location:
    Well within 3d
    Why would there be an interposer made of copper? That has not been the case for AMD or others. There are materials besides silicon, but I have not seen discussion on metal ones.
    With regards to thermal expansion, an interposer presents a less problematic surface for the silicon chips on it versus the organic package that a GPU like Fiji would otherwise attach to.
     
  15. Nakai

    Newcomer

    Joined:
    Nov 30, 2006
    Messages:
    46
    Likes Received:
    10
    Neural networks are always deterministic. The problem is always the quality of the underlying training algorithm and methods. You need to make sure that your training algorithm and training examples cover unlikely inputs and outputs. The most commonly used training algorithm is gradient-descent backpropagation (GD). Another is Particle Swarm Optimization (PSO); the latter is better suited for training on GPU clusters. Both use labeled training examples, which are used to iteratively adjust the network parameters (biases and weights). This kind of training is also called supervised learning. There exist many different neural network structures. For automated driving and most image-processing tasks, Convolutional Neural Networks (CNNs) are used, which feature good parallel execution streams and few dependencies.

    I don't see online training methods for automated driving, especially not in the fields of image recognition and processing. The problem is that training costs a lot of processing power, and it is not guaranteed that the training is successful. It can happen that the network parameters get worse, not better.

    Of course, huge processing power is necessary to execute a CNN. Normally you set up a CNN with a fixed input region, which cannot be changed at all. You need to train your CNN with correspondingly labeled training cases. For example, if the input region is 32×32 pixels (which is very common), you cannot feed the network smaller or bigger input. It might be possible to scale the input data up or down, but then there will be problems with the recognition rate. For a 720p input stream you need to raster-scan the CNN over the whole input, or try to use some preprocessing to determine "where" the regions of interest are, which can then be fed into the CNN. That of course requires additional preprocessing. For traffic signs you need to use RGB pixels, not just pixel intensity.

    The whole process of automated driving is very critical. You don't just use CNNs, but also many other algorithms to extract the necessary information. All the information needs to be gathered via a process called "sensor fusion". The whole process is very complex, and I don't have that much insight into every detail. CNNs, or artificial neural networks in general, are just one part of the whole process and not a universal remedy.
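A minimal sketch of the raster-scan idea described above: sliding a fixed 32×32 input window over a 720p frame. The 8-pixel stride and the absence of any scoring network are simplifications for illustration:

```python
import numpy as np

# Raster-scan a fixed-size CNN input window over a frame, collecting every
# crop the network would have to evaluate. Stride is a hypothetical choice.
def raster_scan(frame, win=32, stride=8):
    h, w = frame.shape[:2]
    windows = []
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            windows.append(frame[y:y + win, x:x + win])
    return windows

frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # dummy RGB 720p frame
patches = len(raster_scan(frame))
print(patches)  # 13659 crops for one frame at stride 8
```

Even before any network evaluation, a single frame yields over 13,000 candidate crops, which is why region-of-interest preprocessing is attractive.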

    -----

    About the FP64 and FP32 execution rates of big Pascal: maybe it will look like this:

    [​IMG]

    There are 4 execution blocks in an SM. Each pair of them shares one FP64 SIMD, which would give a ratio of 1:0.375, pretty close to 3:1. Maybe the boost clocks get lower for FP64 execution.
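For illustration, the quoted 1:0.375 figure can be reproduced with hypothetical lane counts. The widths below are assumptions chosen purely to match that number, not known GP100 specs:

```python
# Hypothetical per-SM lane counts for the sharing scheme sketched above.
fp32_blocks = 4
fp32_width = 32                  # FP32 lanes per execution block (assumed)
fp64_simds = fp32_blocks // 2    # one FP64 SIMD shared per pair of blocks
fp64_width = 24                  # assumed width that reproduces 0.375

fp32_lanes = fp32_blocks * fp32_width   # 128 FP32 lanes per SM
fp64_lanes = fp64_simds * fp64_width    # 48 FP64 lanes per SM
ratio = fp64_lanes / fp32_lanes
print(ratio)  # 0.375, i.e. roughly a 1:3 DP:SP rate
```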
     
    pharma and CarstenS like this.
  16. Josephwang

    Joined:
    Feb 17, 2016
    Messages:
    1
    Likes Received:
    0
  17. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,494
    Likes Received:
    405
    Location:
    Varna, Bulgaria
  18. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,261
    Likes Received:
    1,950
    Location:
    Finland
    I think there are a couple of other, more interesting notes in that slide deck:
    - NVIDIA considered HMC first, not HBM
    - NVIDIA claims they have integrated the memory to be part of the actual GPU die

    I would just like to ask: what the f#¤k, and how the f#%k did this happen? TSMC isn't doing HMC or HBM, and don't those actually require a different manufacturing process altogether?
     
  19. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,807
    Likes Received:
    2,073
    Location:
    Germany
    This is from a CUDA Fellow, i.e. an independent researcher outside of Nvidia. I wouldn't bet any vital parts on everything in there being officially sanctioned Nvidia material.
     
    Razor1 likes this.