Nvidia Pascal Speculation Thread

3dilettante · Mar 20, 2016

lanek said:
Fury is 596mm2 ( taken on Anandtech review )... But at least at 28nm, both nvidia and AMD have hit the reticule limit of TSMC.. I dont know what is the limit on 14-16nm. ( But will they really want to hit the bigger sized chip on the first series of this new process ? ... I dont think... )

Im not sure the interposer size is a limiter factor there.

For the purposes of manufacturing the GPUs, the limit is an optical one related to the equipment. It's not going to change much.
The interposer for Fiji did a few things to allow the whole assembly to exceed the reticle limit for the interposer. The interposer itself is larger, but the patterned area subject to optical limits does not cover the whole interposer. The GPU runs right up to the limit on one dimension, and the HBM stacks partially extend past the edge of the patterned region, taking some spacing pressure off of the GPU.

There are ways to expand the area of the interposer, with varying degrees of complication and risk.
Some of TSMC's planned products for 2.5D integration might be expanding the limit, and others might be doing so as well as they get past the early implementations of the concept.

CarstenS said:
Reticle size should stay the same for all intends and purposes. But I haven't read anywhere, why HBM2 should be significantly bigger, quite the contrary, they're within the same JEDEC spec, not even officially named HBM2.

What was called HBM2 is a separate revision of the spec, but it the package is defined to be larger than the early version of the memory used by Fiji.
http://www.anandtech.com/show/9969/jedec-publishes-hbm2-specification

HBM2 memory stacks are not only faster and more capacious than HBM1 KGSDs, but they are also larger. SK Hynix’s HBM1 package has dimensions of 5.48 mm × 7.29 mm (39.94 mm2). The company’s HBM2 chip will have dimensions of 7.75 mm × 11.87 mm (91.99 mm2). Besides, HBM2 stacks will also be higher (0.695 mm/0.72 mm/0.745 mm vs. 0.49 mm) than HBM1 KGSDs, which may require developers of ASICs (e.g., GPUs) to install a heat-spreader on their SiPs to compensate for any differences in height between the memory stacks and GPU die, to protect the DRAM, and to guarantee sufficient cooling for high bandwidth memory.

lanek · Mar 21, 2016

Yes i remember have read something on the interposer size, that it could be larger than the optical limit. It was mostly constrained by the fab used ( remember that they was reuse old fab tools for reduct cost )

Frenetic Pony · Mar 22, 2016

The "leak" is hilariously wrong, just someone on the internet taking the old 28nm designs and doubling them because they heard that's what Pascal would be. Which would be a fantastically huge wast of Finfet designs when you can trade off a bit of die density for a huge boost in clockspeed.

I mostly wonder how much space is going to be taken up by the ideal 4 stacks of HBM for high end compute GPUs. The Fury X already had pretty big HBM stacks, and that was for 1. Ohwell, memory has to go somewhere, still I wonder if this size is part of the reason AMD has "next gen memory" listed for their next real architecture jump two years from now.

lanek · Mar 23, 2016

"Next Gen" memory could mean anything.. Including HBM evolutions ( lets call it HBM3 ). AMD is doing a lot of research since years on the memory side. One is the memory stacked on the CPU / GPU transistors directly.

Its extremely important for AMD to make advance thoses research, for Exascale, for their CPUs, for their APU and GPU's.

Alexko · Mar 23, 2016

Aren't FinFETs (even) better at lower voltages? If so, it might make sense to keep clocks low and just increase the number of SM. SMM? SMP? Whatever they're called now.

Frenetic Pony · Mar 23, 2016

Alexko said:
Aren't FinFETs (even) better at lower voltages? If so, it might make sense to keep clocks low and just increase the number of SM. SMM? SMP? Whatever they're called now.

Finfet's have a great, but sharp, Fmax to voltage curve. The curve actually peaks higher than 28nm at less voltage, but is so sharp that both Nvidia and AMD will have little choice but to end up with similar frequencies. Go for too low of a frequency and you end up throwing far too much silicon at the problem for the performance you get. Which is what AMD seems to have demonstrated anyway with Polaris 10 at 850mhz, but that may have just been a special low power bin like the Nano was. Go the other way and try to clock too high, and you get exponential power draw.

So the latitude the two companies had in 28nm, where Nvidia could flex its low power draw architecture to clock high for its lower end cards, is far less present for Finfets. Similarly AMDs cards like the 380/x that had relatively low clockspeeds but a lot of silicon are also less useful. Most likely the Polaris 10 demo was specifically shown just to hit similar performance and half the voltage of a Nvidia 950, which looks good for PR and at the same time leaves Nvidia guessing a bit as too how high Polaris 10 can actually scale and at what voltage, as Nvidia is using TSMC while AMD is using Glo-Flo/Samsung.

Regardless, point is Nvidia's high end cards will almost certainly be clocked higher than last time. But yield problems are widespread for all foundries, meaning a straight doubling of resources from last time to take up the die shrink is practically impossible for this first round of cards.

Grall · Mar 23, 2016

How can yield still be such an issue when Intel has been running production on this node for a year now, and making finfets for 3ish years? When 14/16nm is partially based on a year+ old 20nm tech, it seems weird it's so hard to get up and running here...

silent_guy · Mar 23, 2016

Grall said:
How can yield still be such an issue when Intel has been running production on this node for a year now, and making finfets for 3ish years? When 14/16nm is partially based on a year+ old 20nm tech, it seems weird it's so hard to get up and running here...

"Yield problems" is equivalent to "I really have no idea what I'm talking about but it sounds cool".
Because that's exactly how it is: you just can't know unless it's mentioned in some company disclosure.

Adored · Mar 23, 2016

There are plenty of sources out there saying that yield continues to worsen as nodes get smaller. Given that TSMC just doubled 16FF+ production this month, they are probably over the worst of it, or at least at the stage where it almost starts to make sense.

We know the voltage of the Polaris part in the video vs the 950 was @850E/.8375v

http://i.imgur.com/7q8kJkn.jpg

AMD mentioned "frequency uplift" for Polaris in this video (around 35 seconds) -

I expect 1.2GHz minimum on 14LPP for Polaris.

Adored · Mar 23, 2016

Can't edit, but a point I once made was that Maxwell already had this frequency uplift baked in. I feel that Pascal should be compared to Kepler, not Maxwell, and there are no guarantees that Nvidia will increase frequency by much, if at all, over Maxwell for the early Pascal GPUs.

Ailuros · Mar 23, 2016

Grall said:
How can yield still be such an issue when Intel has been running production on this node for a year now, and making finfets for 3ish years? When 14/16nm is partially based on a year+ old 20nm tech, it seems weird it's so hard to get up and running here...

Say hello to PAO:

http://www.anandtech.com/show/10183...ead-becomes-process-architecture-optimization

silent_guy · Mar 23, 2016

Adored said:
There are plenty of sources out there saying that yield continues to worsen as nodes get smaller.

No, that's just as much of a empty statement as the one from Mr. Pony. There were times were yields on 180nm were much worse than today's yield on 16nm. I can make that statement for any process combination that has ever existed by simply choosing different sample points of their respective process maturity.

Doom predictions have been around since the early nineties. They're often very fancy slides with curves that show impeding doom. I fondly remember the time the jump to sub-1um was considered a pretty major hurdle. We were very concerned with 130nm but somehow got that to work just fine as well.

And there's no question that the technology is insanely complex with more process steps and mask layers than one could ever imagine back then. But problems get solved and yields eventually end up being fine. We will bump into some major physical limits within a decade, but 16nm is not where that is.

16nm is a close derivative of 20nm, which is quite old now. Apple has been producing something like a hundred million dies using it. If you can mass produce that many at 100mm2, you can produce 500mm2 and everything in between just as well if you take the right redundancy precautions. And at some point, the cost curve starts to go down fast because of competition and because process matures. But just yelling "yield problems" is meaningless and provides zero insight when you don't know the exact status of the process today.

Given that TSMC just doubled 16FF+ production this month, they are probably over the worst of it, or at least at the stage where it almost starts to make sense.

When a company that is religious about high margins introduces a $400 'low cost' version with the same silicon as their high end, it's a safe bet that have crossed that point a long time ago.

silent_guy · Mar 23, 2016

Adored said:
Can't edit, but a point I once made was that Maxwell already had this frequency uplift baked in. I feel that Pascal should be compared to Kepler, not Maxwell, and there are no guarantees that Nvidia will increase frequency by much, if at all, over Maxwell for the early Pascal GPUs.

That makes no sense at all. The process is much faster. Even if Nvidia doesn't change one logic gate on their Maxwell design, that means the same design on 16nm will be much faster. A base clock of 1.4GHz should be the minimum we can expect.

It's up to AMD to do the work that Nvidia did for Maxwell and revise GCN to remove whatever critical paths they currently have. (If they still have a math pipeline that's only 4 deep, that's probably a good place to start.) I have no doubt that AMD has done exactly that. Their architecture group must have done something in last 5 years.

3dilettante · Mar 23, 2016

silent_guy said:
16nm is a close derivative of 20nm, which is quite old now. Apple has been producing something like a hundred million dies using it. If you can mass produce that many at 100mm2, you can produce 500mm2 and everything in between just as well if you take the right redundancy precautions. And at some point, the cost curve starts to go down fast because of competition and because process matures. But just yelling "yield problems" is meaningless and provides zero insight when you don't know the exact status of the process today.

The crux of the matter, at least with regards to a discrete graphics architecture, is whether there is sufficient mass to be produced. The breakeven points for ICs have been described as increasing, and the numbers I've seen bandied about for 500mm2 class GPU volumes seem to be missing zeroes relative to Apple.

The following may not be fully applicable, but I figured I'd use it as a possible proxy in this discussion. Other reports and projections at least seem to agree that up-front costs are increasing.

http://semico.com/content/soc-silic...sis-how-rising-costs-impact-soc-design-starts

14nm silicon with a $20.00 ASP is required to ship 9.954M units to reach the breakeven point.

The average design cost for all 1st effort SoC types for all effort levels and all process geometries is forecast to be $5.7M in 2014 with a 10.8% CAGR through 2018. Even though design costs are rising quickly at the advanced nodes, the number of low cost designs holds the average cost down.

A possible interpretation of the graphics roadmap is that the demand and supportable pricing for the top segment may be insufficient to pay back the cost of designing implementations as complex as a top-end gaming/professional product at later nodes unless the silicon can be used across more strata.

Whether that is necessarily relevant to the Pascal thread may be less certain. One discrete graphics vendor is decidedly less successful in these terms, and could be crossing the threshold more quickly.

(edit: And then there is the ongoing cost of maintaining the software and infrastructure for unique implementations.)

Adored · Mar 23, 2016

silent_guy said:
That makes no sense at all. The process is much faster. Even if Nvidia doesn't change one logic gate on their Maxwell design, that means the same design on 16nm will be much faster. A base clock of 1.4GHz should be the minimum we can expect.

So why didn't Nvidia do the transistor level optimizations that you talked about earlier with Kepler on 28nm? They were only able to do so with Maxwell due to the 2-year maturity of the node.

silent_guy · Mar 23, 2016

Adored said:
So why didn't Nvidia do the transistor level optimizations that you talked about earlier with Kepler on 28nm? They were only able to do so with Maxwell due to the 2-year maturity of the node.

If I ever said transistor level optimizations, I'd love to know the context. Because I've always stated that Maxwell had major known architectural power improvements. And they must have done a lot of low level optimizations as well (such a clock gating) but none of those are transistor level.

As for why they didn't do it for Kepler: they were already blasted in some corners for Kepler being a whopping 2 months late compared to GCN. Did you want them to be 2 years late?

silent_guy · Mar 23, 2016

3dilettante said:
The crux of the matter, at least with regards to a discrete graphics architecture, is whether there is sufficient mass to be produced. The breakeven points for ICs have been described as increasing, and the numbers I've seen bandied about for 500mm2 class GPU volumes seem to be missing zeroes relative to Apple.

I think Nvidia's financials speak for themselves. AMD's would too if they'd just up their volumes by a little bit.

Whether that is necessarily relevant to the Pascal thread may be less certain. One discrete graphics vendor is decidedly less successful in these terms, and could be crossing the threshold more quickly.

It's inevitable that Nvidia has a better business case to make for silicon versions than AMD. But AMD should still be fine. Their problem may be more one of manpower than of market size.

Adored · Mar 23, 2016

silent_guy said:
If I ever said transistor level optimizations, I'd love to know the context. Because I've always stated that Maxwell had major known architectural power improvements. And they must have done a lot of low level optimizations as well (such a clock gating) but none of those are transistor level.

As for why they didn't do it for Kepler: they were already blasted in some corners for Kepler being a whopping 2 months late compared to GCN. Did you want them to be 2 years late?

I thought you mentioned transistor level optimizations but perhaps I was mistaken.

Anyway, Ryan Smith at Anandtech did.

http://www.anandtech.com/show/7764/the-nvidia-geforce-gtx-750-ti-and-gtx-750-review-maxwell

But there is still room for maneuvering within the 28nm node and to improve power and density within a design without changing the node itself. Maxwell in turn is just such a design, further optimizing the efficiency of NVIDIA’s designs within the confines of the 28nm node.

http://www.anandtech.com/show/7764/the-nvidia-geforce-gtx-750-ti-and-gtx-750-review-maxwell/3

Finally there’s the lowest of low level optimizations, which is transistor level optimizations. Again NVIDIA hasn’t provided a ton of details here, but they tell us they’ve gone through at the transistor level to squeeze out additional energy efficiency as they could find it. Given that TSMC 28nm is now a very mature process with well understood abilities and quirks, NVIDIA should be able to design and build their circuits to a tighter tolerance now than they would have been able to when working on GK107 over 2 years ago.

Now the real question is just how much of that can be attributed to the clock speed increase from Kepler to Maxwell? Other questions might be as FreneticPony suggested - Will FinFet's top-out at similar frequencies to these anyway? There are for sure reasons to believe that Nvidia may not increase on Maxwell's clock speeds, or not by much. 1.2GHz is the magic number I believe.

3dilettante · Mar 23, 2016

silent_guy said:
It's inevitable that Nvidia has a better business case to make for silicon versions than AMD. But AMD should still be fine. Their problem may be more one of manpower than of market size.

I was trying to divine the meaning behind the statement concerning the economics of the smaller die being better in the PC Perspective interview with Raja Koduri.

Not opting to pay for expanded staffing for the additional implementations could be an indirectly economic decision, I suppose.
There could be other constraints more relevant to AMD and less so for this thread, such as the HPC APU that appears to be using an MCM that forces silicon targeting HPC to fight for socket/package space with a 16-core server CPU, which would carve part of the space out from under a reticle-consuming discrete.

Razor1 · Mar 23, 2016

Adored said:
I thought you mentioned transistor level optimizations but perhaps I was mistaken.

Anyway, Ryan Smith at Anandtech did.

http://www.anandtech.com/show/7764/the-nvidia-geforce-gtx-750-ti-and-gtx-750-review-maxwell

http://www.anandtech.com/show/7764/the-nvidia-geforce-gtx-750-ti-and-gtx-750-review-maxwell/3

Now the real question is just how much of that can be attributed to the clock speed increase from Kepler to Maxwell? Other questions might be as FreneticPony suggested - Will FinFet's top-out at similar frequencies to these anyway? There are for sure reasons to believe that Nvidia may not increase on Maxwell's clock speeds, or not by much. 1.2GHz is the magic number I believe.

looking at it from the same node for both those architectures, nV was able to optimize for Maxwell. but we do know DP takes up quite a bit of space which was utilized for SP shader array for Maxwell.

Architecture has a bit to do with clock speed, but there are ceilings for the process itself, and there is an optimal range of fequency vs. voltage vs power usage too but that to is based on design.

Why could we not expect nV to improve on their design on 16 nm? What is stopping them from doing so? It seems they have been able to get a lot of 28nm much more than AMD, so their design seems to be better suited for higher frequencies, but if we start looking at power consumption scaling at those higher frequencies, we see that there is a quite a bit of jump after a certain amount, the same thing goes from AMD's GPU's.

Nvidia Pascal Speculation Thread

3dilettante

lanek

Frenetic Pony

lanek

Alexko

Frenetic Pony

Grall

Invisible Member

silent_guy

Adored

Adored

Ailuros

Epsilon plus three

silent_guy

silent_guy

3dilettante

Adored

silent_guy

silent_guy

Adored

3dilettante

Razor1

Similar threads