Nvidia Pascal Speculation Thread

Discussion in 'Architecture and Products' started by DSC, Mar 25, 2014.

Thread Status:
Not open for further replies.
  1. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    For the purposes of manufacturing the GPUs, the limit is an optical one related to the equipment. It's not going to change much.
    The interposer for Fiji did a few things to allow the whole assembly to exceed the reticle limit for the interposer. The interposer itself is larger, but the patterned area subject to optical limits does not cover the whole interposer. The GPU runs right up to the limit on one dimension, and the HBM stacks partially extend past the edge of the patterned region, taking some spacing pressure off of the GPU.
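    A minimal sketch of that geometry, with assumed, purely illustrative numbers (not Fiji's actual dimensions):

    ```python
    # Illustrative geometry for the reticle argument (all numbers assumed;
    # NOT Fiji's real dimensions). The stepper patterns at most one reticle
    # field, but the physical interposer can be larger, and an HBM stack only
    # needs patterned interposer where its microbumps land, so it can
    # overhang the patterned edge.

    RETICLE_W = 26.0   # mm; a common stepper field is roughly 26 x 33 mm

    gpu_w    = 20.0    # mm; hypothetical GPU die width
    hbm_w    = 5.5     # mm; approximate HBM1 stack width
    gap      = 0.5     # mm; GPU-to-stack spacing
    overhang = 3.0     # mm of each stack extending past the patterned edge

    fully_patterned = gpu_w + 2 * (gap + hbm_w)
    with_overhang   = gpu_w + 2 * (gap + hbm_w - overhang)

    print(f"fully patterned: {fully_patterned} mm, fits: {fully_patterned <= RETICLE_W}")
    print(f"with overhang:   {with_overhang} mm, fits: {with_overhang <= RETICLE_W}")
    ```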

    There are ways to expand the area of the interposer, with varying degrees of complication and risk.
    Some of TSMC's planned products for 2.5D integration might be expanding the limit, and others might be doing so as well as they get past the early implementations of the concept.

    What was called HBM2 is a separate revision of the spec, but its package is defined to be larger than that of the early version of the memory used by Fiji.
    http://www.anandtech.com/show/9969/jedec-publishes-hbm2-specification

     
    Razor1 and silent_guy like this.
  2. lanek

    Veteran

    Joined:
    Mar 7, 2012
    Messages:
    2,469
    Likes Received:
    315
    Location:
    Switzerland
    Yes, I remember reading something about the interposer size, that it could be made larger than the optical limit. It was mostly constrained by the fab used (remember that they were reusing old fab tools to reduce cost).
     
  3. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    324
    Likes Received:
    84
    The "leak" is hilariously wrong, just someone on the internet taking the old 28nm designs and doubling them because they heard that's what Pascal would be. Which would be a fantastically huge wast of Finfet designs when you can trade off a bit of die density for a huge boost in clockspeed.

    I mostly wonder how much space is going to be taken up by the ideal 4 stacks of HBM for high-end compute GPUs. The Fury X already had pretty big HBM stacks, and those were only first-generation HBM. Oh well, memory has to go somewhere. Still, I wonder if this size is part of the reason AMD has "next gen memory" listed for their next real architecture jump two years from now.
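    For rough scale, some footprint arithmetic using the per-stack package sizes reported in the HBM2 spec coverage linked earlier in the thread (treat these as approximate):

    ```python
    # Per-stack package footprints (approximate, from the HBM2 spec coverage).
    hbm1_area = 5.48 * 7.29    # ~40 mm^2 per first-gen stack (Fury X used four)
    hbm2_area = 7.75 * 11.87   # ~92 mm^2 per HBM2 stack

    print(f"4 x HBM1: {4 * hbm1_area:.0f} mm^2")  # ~160 mm^2 of interposer
    print(f"4 x HBM2: {4 * hbm2_area:.0f} mm^2")  # ~368 mm^2 before any GPU die
    ```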
     
  4. lanek

    Veteran

    Joined:
    Mar 7, 2012
    Messages:
    2,469
    Likes Received:
    315
    Location:
    Switzerland
    "Next Gen" memory could mean anything.. Including HBM evolutions ( lets call it HBM3 ). AMD is doing a lot of research since years on the memory side. One is the memory stacked on the CPU / GPU transistors directly.

    Its extremely important for AMD to make advance thoses research, for Exascale, for their CPUs, for their APU and GPU's.
     
  5. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,491
    Likes Received:
    909
    Aren't FinFETs (even) better at lower voltages? If so, it might make sense to keep clocks low and just increase the number of SMs. SMMs? SMPs? Whatever they're called now.
     
  6. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    324
    Likes Received:
    84
    FinFETs have a great, but sharp, Fmax-to-voltage curve. The curve actually peaks higher than 28nm at lower voltage, but it is so sharp that both Nvidia and AMD will have little choice but to end up at similar frequencies. Go for too low a frequency and you end up throwing far too much silicon at the problem for the performance you get, which is what AMD seems to have demonstrated anyway with Polaris 10 at 850MHz, though that may have just been a special low-power bin like the Nano was. Go the other way and try to clock too high, and you get exponential power draw.
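    A toy model of that tradeoff; the curve shape and constants below are assumptions, not measured data:

    ```python
    import math

    # Dynamic power goes roughly as C * V^2 * f, while achievable frequency
    # on FinFETs rises steeply with voltage and then flattens, funneling both
    # vendors toward a similar frequency band.

    def fmax_ghz(v):
        """Hypothetical achievable clock vs. supply voltage (assumed curve)."""
        return 2.2 * (1.0 - math.exp(-4.0 * (v - 0.55)))

    for v in (0.70, 0.85, 1.00, 1.15):
        f = fmax_ghz(v)
        power = v * v * f   # relative dynamic power, capacitance folded into units
        print(f"V={v:.2f}V  f={f:.2f}GHz  P~{power:.2f}")
        # past the knee, extra voltage buys little frequency for a lot of power
    ```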

    So the latitude the two companies had at 28nm, where Nvidia could flex its low-power-draw architecture to clock high for its lower-end cards, is far less present with FinFETs. Similarly, AMD's cards like the 380/380X, which had relatively low clock speeds but a lot of silicon, are also less useful. Most likely the Polaris 10 demo was specifically staged to hit similar performance at half the voltage of an Nvidia 950, which looks good for PR and at the same time leaves Nvidia guessing a bit as to how high Polaris 10 can actually scale and at what voltage, since Nvidia is using TSMC while AMD is using GloFo/Samsung.

    Regardless, the point is Nvidia's high-end cards will almost certainly be clocked higher than last time. But yield problems are widespread across all foundries, meaning a straight doubling of resources from last time to take up the die shrink is practically impossible for this first round of cards.
     
    Razor1 and Alexko like this.
  7. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,172
    Location:
    La-la land
    How can yield still be such an issue when Intel has been running production on this node class for a year now, and making FinFETs for three-ish years? When 14/16nm is partially based on year-plus-old 20nm tech, it seems weird that it's so hard to get up and running here...
     
  8. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    "Yield problems" is equivalent to "I really have no idea what I'm talking about but it sounds cool".
    Because that's exactly how it is: you just can't know unless it's mentioned in some company disclosure.
     
  9. Adored

    Newcomer

    Joined:
    Mar 1, 2016
    Messages:
    67
    Likes Received:
    4
    There are plenty of sources out there saying that yield continues to worsen as nodes get smaller. Given that TSMC just doubled 16FF+ production this month, they are probably over the worst of it, or at least at the stage where it almost starts to make sense.

    We know the clock and voltage of the Polaris part in the video vs the 950: 850MHz at 0.8375V (listed as "850E/.8375v").

    http://i.imgur.com/7q8kJkn.jpg

    AMD mentioned "frequency uplift" for Polaris in this video (around 35 seconds) -


    I expect 1.2GHz minimum on 14LPP for Polaris.
     
  10. Adored

    Newcomer

    Joined:
    Mar 1, 2016
    Messages:
    67
    Likes Received:
    4
    Can't edit, but a point I once made was that Maxwell already had this frequency uplift baked in. I feel that Pascal should be compared to Kepler, not Maxwell, and there are no guarantees that Nvidia will increase frequency by much, if at all, over Maxwell for the early Pascal GPUs.
     
  11. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,418
    Likes Received:
    178
    Location:
    Chania
  12. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    No, that's just as much of an empty statement as the one from Mr. Pony. There were times when yields on 180nm were much worse than today's yields on 16nm. I can make that statement for any process combination that has ever existed simply by choosing different sample points in their respective process maturity.

    Doom predictions have been around since the early nineties. They're often very fancy slides with curves that show impending doom. I fondly remember the time the jump to sub-1um was considered a pretty major hurdle. We were very concerned about 130nm but somehow got that to work just fine as well.

    And there's no question that the technology is insanely complex, with more process steps and mask layers than one could ever imagine back then. But problems get solved and yields eventually end up being fine. We will bump into some major physical limits within a decade, but 16nm is not that point.

    16nm is a close derivative of 20nm, which is quite old now. Apple has produced something like a hundred million dies on it. If you can mass-produce that many at 100mm2, you can produce 500mm2 and everything in between just as well, if you take the right redundancy precautions. And at some point the cost curve starts to go down fast, because of competition and because the process matures. But just yelling "yield problems" is meaningless and provides zero insight when you don't know the exact status of the process today.
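    The standard first-order yield model makes the scaling point concrete; the defect density here is assumed for illustration:

    ```python
    import math

    # Poisson yield: Y = exp(-A * D0). A process mature enough to ship ~100M
    # dies at 100mm^2 predicts workable, if lower, yield at 500mm^2, even
    # before on-die redundancy salvages partially defective dies.

    def poisson_yield(area_mm2, d0_per_cm2):
        return math.exp(-(area_mm2 / 100.0) * d0_per_cm2)

    d0 = 0.2  # defects per cm^2; assumed for a maturing process
    for area in (100, 300, 500):
        print(f"{area}mm^2 -> {poisson_yield(area, d0):.0%} defect-free")
    ```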

    When a company that is religious about high margins introduces a $400 'low cost' version with the same silicon as their high end, it's a safe bet that they crossed that point a long time ago.
     
    Ethatron, MDolenc, Lightman and 2 others like this.
  13. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    That makes no sense at all. The process is much faster. Even if Nvidia doesn't change one logic gate in their Maxwell design, the same design on 16nm will be much faster. A base clock of 1.4GHz should be the minimum we can expect.
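    The back-of-envelope version of that claim (the speedup factor below is an assumption; foundry marketing put 16FF+ well ahead of 28nm at the same power):

    ```python
    maxwell_base_ghz = 1.0   # ~GTX 980 Ti's 1000MHz base clock
    process_speedup  = 1.4   # assumed same-design speed gain, 28nm -> 16FF+

    print(f"{maxwell_base_ghz * process_speedup:.1f}GHz")  # 1.4GHz base minimum
    ```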

    It's up to AMD to do the work that Nvidia did for Maxwell and revise GCN to remove whatever critical paths they currently have. (If they still have a math pipeline that's only 4 deep, that's probably a good place to start.) I have no doubt that AMD has done exactly that. Their architecture group must have done something in the last 5 years.
     
    Grall likes this.
  14. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    The crux of the matter, at least with regard to a discrete graphics architecture, is whether there is sufficient volume to be produced. The break-even points for ICs have been described as increasing, and the numbers I've seen bandied about for 500mm2-class GPU volumes seem to be missing zeroes relative to Apple's.

    The following may not be fully applicable, but I figured I'd use it as a possible proxy in this discussion. Other reports and projections at least seem to agree that up-front costs are increasing.

    http://semico.com/content/soc-silic...sis-how-rising-costs-impact-soc-design-starts

    A possible interpretation of the graphics roadmap is that the demand and supportable pricing for the top segment may be insufficient to pay back the cost of designing implementations as complex as a top-end gaming/professional product at later nodes unless the silicon can be used across more strata.
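    A crude break-even sketch of that argument; every number below is assumed, since the real NRE and margins aren't public:

    ```python
    # The linked Semico analysis concerns rising up-front SoC design costs.
    design_cost_usd = 300e6   # hypothetical NRE for a top-end implementation
    margin_per_unit = 250.0   # hypothetical contribution margin per GPU sold

    breakeven = design_cost_usd / margin_per_unit
    print(f"{breakeven:,.0f} units to break even")  # 1,200,000 -- small next
    # to Apple's ~100M dies, large for a 500mm^2-class discrete GPU segment
    ```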

    Whether that is necessarily relevant to the Pascal thread may be less certain. One discrete graphics vendor is decidedly less successful in these terms, and could be crossing the threshold more quickly.

    (edit: And then there is the ongoing cost of maintaining the software and infrastructure for unique implementations.)
     
    #914 3dilettante, Mar 23, 2016
    Last edited: Mar 23, 2016
  15. Adored

    Newcomer

    Joined:
    Mar 1, 2016
    Messages:
    67
    Likes Received:
    4
    So why didn't Nvidia do the transistor level optimizations that you talked about earlier with Kepler on 28nm? They were only able to do so with Maxwell due to the 2-year maturity of the node.
     
  16. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    If I ever said transistor-level optimizations, I'd love to know the context, because I've always stated that Maxwell had major known architectural power improvements. And they must have done a lot of low-level optimizations as well (such as clock gating), but none of those are transistor-level.

    As for why they didn't do it for Kepler: they were already blasted in some corners for Kepler being a whopping 2 months late compared to GCN. Did you want them to be 2 years late?
     
  17. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    I think Nvidia's financials speak for themselves. AMD's would too if they'd just up their volumes by a little bit.

    It's inevitable that Nvidia has a better business case to make for silicon versions than AMD. But AMD should still be fine. Their problem may be more one of manpower than of market size.
     
  18. Adored

    Newcomer

    Joined:
    Mar 1, 2016
    Messages:
    67
    Likes Received:
    4
    I thought you mentioned transistor level optimizations but perhaps I was mistaken.

    Anyway, Ryan Smith at Anandtech did.

    http://www.anandtech.com/show/7764/the-nvidia-geforce-gtx-750-ti-and-gtx-750-review-maxwell

    http://www.anandtech.com/show/7764/the-nvidia-geforce-gtx-750-ti-and-gtx-750-review-maxwell/3

    Now the real question is just how much of that can be attributed to the clock speed increase from Kepler to Maxwell? Other questions might be as Frenetic Pony suggested: will FinFETs top out at similar frequencies to these anyway? There are certainly reasons to believe that Nvidia may not increase on Maxwell's clock speeds, or not by much. 1.2GHz is the magic number I believe. ;)
     
  19. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    I was trying to divine the meaning behind the statement, in the PC Perspective interview with Raja Koduri, concerning the economics of the smaller die being better.

    Not opting to pay for expanded staffing for the additional implementations could be an indirectly economic decision, I suppose.
    There could be other constraints more relevant to AMD and less so for this thread, such as the HPC APU that appears to be using an MCM that forces silicon targeting HPC to fight for socket/package space with a 16-core server CPU, which would carve part of the space out from under a reticle-consuming discrete.
     
  20. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY
    Looking at it on the same node for both architectures, NV was able to optimize for Maxwell. But we do know DP takes up quite a bit of space, which was reclaimed for the SP shader array in Maxwell.

    Architecture has a bit to do with clock speed, but there are ceilings for the process itself, and there is an optimal range of frequency vs. voltage vs. power usage too, though that too is based on design.

    Why could we not expect NV to improve on their design at 16nm? What is stopping them from doing so? They seem to have gotten a lot more out of 28nm than AMD, so their design seems better suited to higher frequencies. But if we look at power consumption scaling at those higher frequencies, we see quite a jump after a certain point, and the same goes for AMD's GPUs.
     