Nvidia Pascal Speculation Thread

Discussion in 'Architecture and Products' started by DSC, Mar 25, 2014.

Thread Status:
Not open for further replies.
  1. huebie

    Newcomer

    Joined:
    Apr 10, 2012
    Messages:
    29
    Likes Received:
    5
    Okay, I didn't express myself precisely: what about the functions in the SFU? Are they still the same, and can they run ops without "closing" the affected pipe for the duration?

    As I said above, I didn't mean the numbers, but thx. :)
     
  2. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    2,930
    Likes Received:
    1,626
    It's a single GPU solution ....
     
    nnunn likes this.
  3. tunafish

    Regular

    Joined:
    Aug 19, 2011
    Messages:
    542
    Likes Received:
    171
    This slide:
    [Image: slide]
    seems to imply a 4:2:1 ratio for HP:SP:DP, optimized so that every precision level makes maximum use of on-chip throughput.
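Under the 4:2:1 reading, each precision's peak is a fixed multiple of the FP32 peak. A minimal sketch, assuming a hypothetical part with 8 TFLOPS FP32 (the figure floated in this thread); the helper name `precision_peaks` is illustrative and not from any real library:

```python
def precision_peaks(fp32_tflops: float,
                    hp_per_sp: float = 2.0,
                    dp_per_sp: float = 0.5) -> dict:
    """Scale an FP32 peak into FP16/FP64 peaks.

    The defaults encode the speculated 4:2:1 HP:SP:DP ratio
    (FP16 at 2x FP32, FP64 at 1/2 FP32); they are assumptions.
    """
    return {
        "fp16": fp32_tflops * hp_per_sp,
        "fp32": fp32_tflops,
        "fp64": fp32_tflops * dp_per_sp,
    }


# Hypothetical 8 TFLOPS FP32 part at the full 4:2:1 ratio.
print(precision_peaks(8.0))

# The same hypothetical part with a cut-down 1:32 FP64 rate.
print(precision_peaks(8.0, dp_per_sp=1 / 32))
```

Passing a different `dp_per_sp` is all it takes to model a consumer board with crippled FP64.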
     
    pharma, Razor1 and Grall like this.
  4. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,172
    Location:
    La-la land
    Well, one could theoretically interpret that slide as half precision having 4x the throughput of single precision (with double-precision throughput undefined), but that would be really really odd...! :p

    I'm sure DP:SP:HP won't REALLY be 1:2:4 on consumer boards though, because this is evil fucking nvidia we're talking about here. They'll find a way to knock it down to like, 1 op per 32 cycles or some shit like that in the GeForce line.
     
  5. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    332
    Likes Received:
    87
    Probably; it's entirely within the bounds of reason if it's 1:2 as well. Theoretically a Fury X would already get 4 teraflops DP if it had the hardware to run at 1:2. The trouble is that 8 teraflops SP is, theoretically, only a 30% increase over a Titan X. That would normally be plenty, but not for a node jump as big as the one expected from 28nm to 16nm. I'd have expected a jump of 50% or more... though maybe their 1:2 ratio really does take a lot of silicon.
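These figures follow from the usual peak-throughput formula: units × 2 FLOPs (one FMA) × clock. A rough check, assuming reference specs that are not in the post: 4096 shaders at 1050 MHz for the Fury X and 3072 shaders at a 1000 MHz base clock for the Titan X.

```python
def peak_tflops(units: int, clock_mhz: float) -> float:
    """Theoretical peak, assuming one FMA (2 FLOPs) per unit per clock."""
    return units * 2 * clock_mhz / 1e6


fury_x_sp = peak_tflops(4096, 1050)   # Fury X FP32 peak (assumed specs)
fury_x_dp_at_half = fury_x_sp / 2     # what a hypothetical 1:2 FP64 rate would give
titan_x_sp = peak_tflops(3072, 1000)  # Titan X FP32 peak at its base clock
uplift = 8.0 / titan_x_sp - 1         # gain of an 8 TFLOPS part over the Titan X

print(f"Fury X SP {fury_x_sp:.2f}, DP at 1:2 {fury_x_dp_at_half:.2f}")
print(f"Titan X SP {titan_x_sp:.2f}, 8 TF uplift {uplift:.0%}")
```

The 30% only holds against the Titan X's base clock; at its 1075 MHz boost clock the same arithmetic gives roughly 21%.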
     
  6. superjoeyprof

    Joined:
    Mar 11, 2015
    Messages:
    8
    Likes Received:
    2
    I didn't know about this PDF. I just asked SMCI at SC15, and the guy at their booth said they will build some servers for the next dual-GPU Tesla, and that it will provide 4 TFLOPS DP.
     
  7. RecessionCone

    Regular Subscriber

    Joined:
    Feb 27, 2010
    Messages:
    499
    Likes Received:
    177
    GM200 is a 600 mm^2 chip on a very mature and relatively inexpensive process. I wonder whether Nvidia is being more conservative with GP100, since it is their first 16nm product.
     
    pharma and iMacmatician like this.
  8. iMacmatician

    Regular

    Joined:
    Jul 24, 2010
    Messages:
    773
    Likes Received:
    200
    The GK104 SMX is 16 mm^2 and the GK110 SMX is 22-23 mm^2 (my estimates), which is a 40% increase in SMX area. To my understanding there are other changes between the two chips besides DP rate, but I still expect a decent increase in area from a GM20x SMM to a Pascal 1:2 DP SM, ignoring the process change.
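The ~40% figure is simple area arithmetic on the poster's own estimates. A minimal sketch; the 22-23 mm^2 range brackets it, and the 22.5 mm^2 midpoint gives about 40%:

```python
def area_growth(base_mm2: float, new_mm2: float) -> float:
    """Fractional area increase of new_mm2 over base_mm2."""
    return new_mm2 / base_mm2 - 1.0


# Poster's estimates: GK104 SMX ~16 mm^2, GK110 SMX ~22-23 mm^2.
low = area_growth(16.0, 22.0)
mid = area_growth(16.0, 22.5)
high = area_growth(16.0, 23.0)
print(f"GK110 SMX is {low:.1%} to {high:.1%} larger (midpoint {mid:.1%})")
```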
     
  9. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    332
    Likes Received:
    87
    Yet if we go just by DP and SP rates, a Fury X replacement could increase SP throughput by just 25%, to say 10 teraflops, have the same room as Nvidia to move DP to 1:2 on its high-end compute-focused cards (both the Fury X and Titan X are around the same size, with the same gaming-first focus and almost no emphasis on DP), and still come out 25% ahead of Nvidia in DP rate.
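The DP comparison reduces to one ratio. A sketch assuming the post's hypothetical numbers: 10 TFLOPS SP for the AMD part at a 1:2 DP rate, against the 4 TFLOPS DP reported for Nvidia's dual-GPU Tesla. `dp_lead` is an illustrative name.

```python
def dp_lead(amd_sp_tflops: float, nv_dp_tflops: float,
            dp_ratio: float = 0.5) -> float:
    """Fractional FP64 lead of the AMD part, assuming it runs DP at dp_ratio of SP."""
    amd_dp_tflops = amd_sp_tflops * dp_ratio
    return amd_dp_tflops / nv_dp_tflops - 1.0


print(f"AMD DP lead: {dp_lead(10.0, 4.0):.0%}")
```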

    Nvidia, with its CUDA environment and 12GB of RAM, did well in compute with the Titan X, and, thanks to some large design bottleneck in the Fury X, did better in gaming as well. But assuming any such bottleneck is removed in a new high-end AMD GPU design, AMD would have the clear performance advantage in gaming, and since both vendors are using HBM2, presumably with the same yields, they'd end up with the same RAM size on both chips. So 4 teraflops DP for Nvidia's highest-end compute part seems, at least in speculation, a comparatively low target, especially if AMD manages to hit the "2x perf per watt over previous generation" it has stated. At a 250 watt TDP, that would put AMD's top-end GPU at 67% higher performance than a Fury X. I'm discounting a huge share of this by assuming they mean gaming performance only, with a large part of it coming from having no design bottleneck choking performance. But even so, it would seem AMD might have a very large advantage.
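The 67% estimate scales performance by the claimed perf/W gain times the ratio of board powers. A minimal sketch; the Fury X baseline power is an assumption, since the post does not state it. A 300 W baseline reproduces the 67% figure, while AMD's quoted 275 W typical board power for the Fury X would give about 82%.

```python
def relative_perf(perf_per_watt_gain: float, new_power_w: float,
                  old_power_w: float) -> float:
    """New-over-old performance ratio: perf/W gain times the power ratio."""
    return perf_per_watt_gain * new_power_w / old_power_w


for baseline_w in (300, 275):  # assumed Fury X baselines
    ratio = relative_perf(2.0, 250, baseline_w)
    print(f"{baseline_w} W baseline: {ratio - 1:.0%} faster than a Fury X")
```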
     
  10. spworley

    Newcomer

    Joined:
    Apr 19, 2013
    Messages:
    146
    Likes Received:
    190
    NVidia's first 40nm chip was the tiny 57mm^2, 16-core GT218 in November 2009. The 529mm^2 GF100 didn't ship until March 2010, 6 months later.
    NVidia's first 28nm chip was GK104 in March 2012. GK110 didn't come until November 2012 (in the form of Tesla K20), 7 months later.

    So it would be reasonable to see NVidia launching 14nm with GP104, and GP100 following 6 months later.
    The introduction of HBM2 memory could disrupt this, though. We don't know if GP104 will have HBM2 or not, and if it doesn't, perhaps NVidia would launch the flagship first to boost the Pascal performance marketing halo.
     
  11. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,583
    Likes Received:
    703
    Location:
    Guess...
    It'll be really disappointing if it's only 8TF. With 1TB/s of memory bandwidth (3x Titan X), 8TF (the Fury X already has that beat) sounds pretty pathetic.
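One way to frame this complaint is machine balance: bytes of DRAM bandwidth per FLOP. A sketch assuming 336.5 GB/s and a 6.144 TFLOPS base-clock FP32 peak for the Titan X (reference specs, not from the post), and the post's 1 TB/s and 8 TF for the new part:

```python
def bytes_per_flop(bandwidth_gbs: float, tflops: float) -> float:
    """Machine balance: GB/s of bandwidth divided by GFLOPS of compute."""
    return bandwidth_gbs / (tflops * 1000.0)


titan_x = bytes_per_flop(336.5, 6.144)   # assumed Titan X specs
new_part = bytes_per_flop(1000.0, 8.0)   # post's 1 TB/s and 8 TF
print(f"Titan X {titan_x:.3f} B/FLOP, new part {new_part:.3f} B/FLOP")
print(f"bandwidth ratio {1000.0 / 336.5:.1f}x")
```

A balance more than twice the Titan X's is why 8TF looks compute-light next to 1TB/s.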
     
  12. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,490
    Likes Received:
    400
    Location:
    Varna, Bulgaria
    More like 12% difference. ;)
     
  13. A1xLLcqAgt0qc2RyMz0y

    Regular

    Joined:
    Feb 6, 2010
    Messages:
    988
    Likes Received:
    280
    No delay this time. The GP100 will come nearly at the same time as the GP104 because Nvidia needs it for HPC and also to compete against Intel.
     
    nnunn likes this.
  14. dbz

    dbz
    Newcomer

    Joined:
    Mar 21, 2012
    Messages:
    98
    Likes Received:
    41
    1. Aren't the figures for Tesla parts generally quoted at base clocks? The K80, for example, ships with a meager 560MHz base clock, for a theoretical FP64 throughput of 1.87 TF (close to the graph/slide shown earlier), yet it boosts to 875MHz (a theoretical max of 2.91 TF) depending on thermal/power limits (spec PDF). The K40 plot point in the graph/slide/PDF posted earlier also corresponds to its base clock (745MHz), not the 810/875MHz boost.
    2. I'd be wary of translating Tesla numbers to the GeForce product line. GK110B in Tesla form has pretty conservative clocks (the K40 has a 745MHz base and an 810 or 875MHz boost, as I just noted). GeForce GK110Bs are clocked somewhat higher: 889MHz base and 980MHz nominal boost for the Titan Black. Incidentally, even though boost is disabled on the Titan Black when executing FP64 at its native 1:3 ratio, its theoretical FP64 throughput (1.7TF) is still higher than the K40's (1.43TF). The FP32 figures, of course, are vastly different: the K40's specification is 4.29TF, while the Titan Black's nominal spec is 5.1TF, which would rise to 5.6-5.9TF with actually observed boost applied.
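The base-versus-boost numbers above can be reproduced from unit counts and clocks. A sketch assuming public Kepler unit counts that are not in the post: 64 FP64 and 192 FP32 units per SMX, 15 SMX for the K40 and Titan Black, and two GK210s with 13 SMX each for the K80. Note that 560MHz gives about 1.86 TF; the quoted 1.87 TF implies a base clock nearer 562 MHz.

```python
def peak_tflops(units: int, clock_mhz: float) -> float:
    """Theoretical peak, assuming one FMA (2 FLOPs) per unit per clock."""
    return units * 2 * clock_mhz / 1e6


# Assumed unit counts (public Kepler specs, not from the post).
K80_FP64 = 2 * 13 * 64   # two GK210 GPUs, 13 SMX each
GK110_FP64 = 15 * 64     # K40 / Titan Black, 15 SMX
GK110_FP32 = 15 * 192

print(f"K80 FP64: base {peak_tflops(K80_FP64, 560):.2f}, "
      f"boost {peak_tflops(K80_FP64, 875):.2f}")
print(f"K40 at 745MHz: FP64 {peak_tflops(GK110_FP64, 745):.2f}, "
      f"FP32 {peak_tflops(GK110_FP32, 745):.2f}")
print(f"Titan Black at 889MHz: FP64 {peak_tflops(GK110_FP64, 889):.2f}, "
      f"FP32 {peak_tflops(GK110_FP32, 889):.2f}")
```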
     
    Razor1, iMacmatician and pjbliverpool like this.
  15. lanek

    Veteran

    Joined:
    Mar 7, 2012
    Messages:
    2,469
    Likes Received:
    315
    Location:
    Switzerland
    Reading this makes me think that the GP100-based gaming GPU will still come 6-8 months after the GP104-based one (like GK110 back then)... but it will be available for the HPC and workstation markets before that.

    Or do you think they will release the GP100 gaming GPU at the same time as the GP104 one?
     
  16. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    332
    Likes Received:
    87
    Huh, different reporting methods are entirely possible. I would have assumed Nvidia would put its absolute best foot forward for PR purposes, but maybe they only consider base clock for compute-centric cards out of concern for perf/watt, etc.

    Regardless, GP100 will almost certainly come out as a compute/professional card first, with a slightly binned consumer version coming later. That's been the case for both the Titan and Titan X, so I don't see a reason it wouldn't continue. WHEN it comes out will be dictated far more by yield numbers than by competitive concerns. As the biggest chip, it'll need the most reliable yield, unless Nvidia wants to throw away a huge number of defective dies just for the sake of market share. I'm sure that, with their profits this year, they'll feel confident waiting a few months for yields to improve if that's what it takes. They've already scaled back their RAM promises over HBM2 yield concerns, after all.
     
    Razor1 likes this.
  17. gamervivek

    Regular Newcomer

    Joined:
    Sep 13, 2008
    Messages:
    715
    Likes Received:
    220
    Location:
    india
    I remember the 2.91 TFLOPS figure for the K80 being thrown around in the media, and that was decidedly at the boost clock.

    http://images.anandtech.com/doci/8729/TK80.jpg

    Also see the table here,

    http://www.anandtech.com/show/8729/nvidia-launches-tesla-k80-gk210-gpu

    :lol:

    But in those graphs I haven't seen a discrepancy like the one above. Here's one showing both the K80 and K40 at boost clocks:

    http://cdn.wccftech.com/wp-content/...d-computing-path-forward-page-004-635x357.jpg

    Maxwell-based Tesla cards don't seem to be that handicapped relative to their GeForce versions; in fact, the Titan X-based Tesla hits even higher numbers, 7TF vs. 6.7TF from TR's review. Some of that might be due to having the same amount of VRAM (besides having DP at the same rate), but then HBM would allow better clocks as well. The base clock for the M40 is 948MHz according to Nvidia's site (though the boost is lower than AT reported, it's still higher than the TX's):

    http://images.nvidia.com/content/tesla/pdf/tesla-m40-product-brief.pdf

    A 4096-shader part with a 1GHz base clock, giving 8TF SP and 4TF DP, seems a good bet. It'd also make for much better marketing to compare base clocks against the K80 in those graphs now.
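The 8TF/4TF guess follows directly from shader count and clock. A minimal check, assuming 2 FLOPs (one FMA) per shader per clock and a 1:2 DP rate. The M40 line uses the 948MHz base clock cited above and assumes 3072 shaders, which is not stated in the post.

```python
def peak_tflops(units: int, clock_mhz: float) -> float:
    """Theoretical peak, assuming one FMA (2 FLOPs) per unit per clock."""
    return units * 2 * clock_mhz / 1e6


pascal_sp = peak_tflops(4096, 1000)  # speculated 4096-shader part at 1GHz base
pascal_dp = pascal_sp / 2            # assumed 1:2 DP rate
m40_sp = peak_tflops(3072, 948)      # M40 at its 948MHz base (assumed 3072 shaders)

print(f"speculated part: SP {pascal_sp:.2f} TF, DP {pascal_dp:.2f} TF")
print(f"M40 base-clock SP: {m40_sp:.2f} TF")
```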
     
  18. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY

    HBM doesn't automatically allow higher clocks; higher clocks come from design and process node, not from the choice of VRAM. It might allow more silicon to be used for the shader array, though.
     
  19. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY

    Well, the other side of bringing out the professional series first is that they face competition from Knights Landing, so I think it's correct that we will see them before the consumer cards.
     
  20. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,172
    Location:
    La-la land
    Maybe it allows for lower clocks, in a way, since the RAM now sits right under the heatspreader with the GPU core, meaning it gets grilled to whatever temperature the core is running at... I'm no expert on these things, but hasn't it been said that DRAM performance characteristics are affected by running temperature?
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.