Nvidia Pascal Speculation Thread

Isn't the news that NVidia will have HBM2 up and running before the end of this year?

We already knew that Samsung was planning to make HBM2:

https://forum.beyond3d.com/posts/1867707/
How does "Samsung starts mass producing early 2016" turn into "up and running before the end of this year"? Of course there will be samples earlier, hell, they have to have them already to do any tests on the chips, but that's a far cry from anything relevant.
 
Up and running before the end of the year could imply Pascal with HBM2 before summer, e.g. May. Without that tidbit, one might be biased towards thinking it's coming in autumn 2016.

People seem to be assuming that GP100 comes first, but I'm dubious that the largest Pascal with HBM2 will come first.

Fiji was up and running in about September/October 2014 (Sisoft website benchmark results) and released in June 2015...
 
The past two generations both debuted with a smaller part, with the Big chip coming 10-12 months later.

Kepler launched April 2012 with GK104. GK110 wasn't until February 2013 with GTX Titan. (Or November 2012 if you count Tesla K20)

Maxwell launched February 2014 with GM107. GM204 was September 2014. GM200 wasn't until March 2015 with Titan X.

Fermi did launch with GF100 though.
 
I think HBM(2) makes the most sense on bigger chips, just as AMD has opted for too. And supply probably won't be unlimited initially, so a company betting its bread-and-butter GPU on it is taking on quite a risk.
 
Two HBM2 stacks for 8GB of memory on GP104, with a die size of roughly 300mm² at about 150W power consumption, seems like an achievable target for NVidia's first HBM implementation, before summer 2016, delivering Titan X+ performance.
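Back-of-envelope, assuming the HBM2 figures that have been floated (4 GB per 4-Hi stack, a 1024-bit interface per stack, up to ~2 Gb/s per pin):

\[
2 \times 4\,\mathrm{GB} = 8\,\mathrm{GB}, \qquad 2 \times \frac{1024\,\mathrm{bit} \times 2\,\mathrm{Gb/s}}{8\,\mathrm{bit/byte}} = 512\,\mathrm{GB/s}
\]

That would be comfortably above the 336 GB/s of the GDDR5 Titan X even if the per-pin rate ends up somewhat lower in practice.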

The supercomputing crowd will wait for as long as it takes to get GP100; they aren't going to buy anything else. So whether it's autumn 2016 or spring 2017 is kinda immaterial - NVidia will demonstrate it at SC2016.

NVidia's external risk should be pretty minimal 1 year after HBM introduction, since both HBM2 (TSV stacking is the trickiest bit, the logic is trivial) and interposer (mating to other devices is the trickiest bit) tech will have matured by then. The unknown for NVidia is getting its own chips working, since its suppliers will have solved all the other problems.
 
The supercomputing crowd will wait for as long as it takes to get GP100; they aren't going to buy anything else. So whether it's autumn 2016 or spring 2017 is kinda immaterial - NVidia will demonstrate it at SC2016.
That sounds like a reason to ship GP100 as early as possible, because the wait also delays their own revenue and gives Intel a chance with Xeon Phi. Not many people are going to commit to a large-scale Kepler installation if they believe Pascal is around the corner.

I'm still betting that this is the generation where NVIDIA stops making "HPC+GPU" ultra-high-end chips and makes truly dedicated HPC chips without a rasteriser - GK210 is already a big step in that direction and it makes sense to go all the way (IMO). I'm half-expecting something closer to Fermi, where the big GPU tapes out first, although given the longer certification times the smaller desktop ones might still be publicly available first.
 
People seem to be assuming that GP100 comes first, but I'm dubious that the largest Pascal with HBM2 will come first.

There are solid business reasons to lead with the top-end chip with good FP64, both in the professional market and elsewhere. Titan won't sell as well if there is already a solid midrange card below it.

The supercomputing crowd will wait for as long as it takes to get GP100; they aren't going to buy anything else.

In the past this was absolutely true. However, now there is real competition in the market in the form of AMD and Intel products, and OpenCL no longer sucks enough to be a total non-starter. Most customers still want to buy nV because they are so heavily invested in CUDA, but if they have to wait too long they might be tempted to go the other way. And each client that rewrites its stuff in OpenCL permanently erases the CUDA advantage nV enjoys, so even if nV comes back with a new, better product, it might not be able to ask as much money for it.
 
That sounds like a reason to ship GP100 as early as possible, because the wait also delays their own revenue and gives Intel a chance with Xeon Phi. Not many people are going to commit to a large-scale Kepler installation if they believe Pascal is around the corner.

I'm still betting that this is the generation where NVIDIA stops making "HPC+GPU" ultra-high-end chips and makes truly dedicated HPC chips without a rasteriser - GK210 is already a big step in that direction and it makes sense to go all the way (IMO). I'm half-expecting something closer to Fermi, where the big GPU tapes out first, although given the longer certification times the smaller desktop ones might still be publicly available first.

No one cares about Xeon Phi that I know of, and bigger chips take longer to make, so they come out later. Not to mention they need a mature process unless you want to waste a lot of money on chips with a lot of defects.

So unless Nvidia is planning on eating its own profits in order to gain market share (possible, but probably not), GP100 or Big Pascal or whatever will all but certainly take some time to actually be released. They could always do a paper launch and then ramp up to actual production, but that doesn't seem particularly better.
 
No one cares about Xeon Phi that I know of, and bigger chips take longer to make, so they come out later. Not to mention they need a mature process unless you want to waste a lot of money on chips with a lot of defects.
I don't know where the 'longer to make' comes from, but I see no reason why they couldn't do a big-die Pascal first:
- Apple is already mass-producing 16nm chips in volumes that dwarf those of these GPUs.
- Even if defect densities are still a bit high, that can easily be solved by disabling some units, which works miracles on yields (rough numbers in the sketch after this post). And with the number of shader units going up steadily, the performance cost of disabling one, two, or more of them becomes progressively smaller.
- The asking price for a big-die, FP64-enabled Pascal with HBM2 can be very high. A Tesla K80 goes for $4200 on Amazon right now; a Tesla Pxxx with 32GB should easily go higher than that. At that point, it matters little whether the die itself costs $200, $400, or $600 due to lower yields. GK110 arrived as a GeForce GPU quite a bit later than the Tesla part; Nvidia can do the same here as well.

With Maxwell skipping FP64, there should be some pressure to do a big die before the smaller ones. Kepler for Tesla is really getting long in the tooth...
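To put rough numbers on the unit-disabling point (a back-of-envelope sketch with a simple Poisson yield model and an assumed defect density, not anything official):

\[
Y_{\text{perfect}} = e^{-AD} \approx e^{-6 \times 0.2} \approx 30\%
\]

for a ~600 mm² (6 cm²) die at an assumed 0.2 defects/cm². Dies with exactly one defect occur with probability \(AD\,e^{-AD} \approx 36\%\); if, say, 70% of the area is shader units that can be fused off, roughly \(0.7 \times 36\% \approx 25\%\) of all dies come back as salvageable cut-down parts, nearly doubling the sellable output.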
 
The past two generations both debuted with a smaller part, with the Big chip coming 10-12 months later.
Except Intel is eating small (and now also mid-sized) GPUs' lunch with its recent integrated graphics, especially the Iris parts with the eDRAM chip. Profits from these GPUs thus ought to be shrinking, while big chips drive excitement and are good PR (unless you're AMD and fuck up).
 
No one cares about Xeon Phi that I know of, and bigger chips take longer to make, so they come out later. Not to mention they need a mature process unless you want to waste a lot of money on chips with a lot of defects.

So unless Nvidia is planning on eating its own profits in order to gain market share (possible, but probably not), GP100 or Big Pascal or whatever will all but certainly take some time to actually be released. They could always do a paper launch and then ramp up to actual production, but that doesn't seem particularly better.


Larger chips don't take longer to produce; they might have lower yields compared to smaller chips, but that doesn't mean a longer time to produce. Top-end cards usually have enough margin to cover the lower yields of larger chips; that's why they are priced higher.
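As an illustration (all numbers assumed, just to show why margins absorb yield): a ~600 mm² die on a 300 mm wafer gives roughly

\[
\frac{\pi \times 150^2}{600} - \frac{\pi \times 300}{\sqrt{2 \times 600}} \approx 118 - 27 \approx 90
\]

die candidates per wafer. Even at a pessimistic 30% yield that's ~27 good dies; at a guessed $5,000-6,000 per 16nm wafer, the silicon works out to roughly $200 per good die, which is noise against a Tesla selling for several thousand dollars.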

How would they be eating their profits if they stop production of older chips in time for the newer chips to arrive?

nV's marketshare "growth" was due more to AMD being absent from the marketplace for a substantial amount of time after nV launched its new chips. AMD needs to rectify this; the last two gens have had noticeable delays getting to market.
 
I agree with silent_guy that the price of Teslas makes the low-yield argument against starting with a big GPU less compelling. But aren't validation/testing requirements for these HPC-targeted chips much more stringent than for consumer GPUs? I guess they can get around some of that by leading with single-chip products like Intel does with Xeons, so that nvlink and multi-socket cache coherency (if that's supported) doesn't need to work right off the bat. There were also rumors about the Pascal shader architecture being the same as Maxwell, so that presumably helps.
 
I agree with silent_guy that the price of Teslas makes the low-yield argument against starting with a big GPU less compelling. But aren't validation/testing requirements for these HPC-targeted chips much more stringent than for consumer GPUs? I guess they can get around some of that by leading with single-chip products like Intel does with Xeons, so that nvlink and multi-socket cache coherency (if that's supported) doesn't need to work right off the bat. There were also rumors about the Pascal shader architecture being the same as Maxwell, so that presumably helps.
I don't know where these rumors come from (like Pascal being a 16nm version of Maxwell), but from the information we have, I see Pascal as a big jump: a nearly two-node jump, a different memory interface, different ALUs with optimized FP16/32/64 mixed-precision support, NVLink, and the necessary secret features that will be disclosed at product introduction. It doesn't sound like a walk in the park...
 
I don't know where these rumors come from (like Pascal being a 16nm version of Maxwell), but from the information we have, I see Pascal as a big jump: a nearly two-node jump, a different memory interface, different ALUs with optimized FP16/32/64 mixed-precision support, NVLink, and the necessary secret features that will be disclosed at product introduction. It doesn't sound like a walk in the park...

And the exact same architecture, pretty much. It's not a "rumor"; Nvidia themselves say as much. It's not like Maxwell is a bad architecture, and dropping from 28nm to 16nm at TSMC and bringing in HBM should provide plenty of incentive for even non-compute customers to purchase Pascal chips. I simply don't understand the reasoning behind shouting that Pascal is a new GPU architecture and then providing, yourself, proof that it isn't.
 
Mixed-precision support can't be done with the current shader array structure on Maxwell. It's not something that is simple to just tack on, either. The ALU structure might be similar, but everything feeding the array and the storage needed for the array to function optimally (cache) will have to be different.
 
Mixed-precision support can't be done with the current shader array structure on Maxwell. It's not something that is simple to just tack on, either. The ALU structure might be similar, but everything feeding the array and the storage needed for the array to function optimally (cache) will have to be different.
One of the great gifts of hardware.fr is this page, where all recent SMs and their internal architecture are placed next to each other, and you can immediately see the changes as you mouse over the different links. Compare a GK110 against a GK104 and it's obvious that, in broad strokes, there isn't much architectural difference. There is no reason to believe that something similar can't be done for the Maxwell SM. In other words, I do think that it's mostly a matter of tacking on more FP64 units.
I don't see why the cache should be in any way different, nor why the register file and the way the array is fed would have to change significantly. At best, FP64 will have half the performance of FP32, so the logical way to go about it is to simply use two adjacent 32-bit registers and fetch them sequentially. And if Pascal implements FP16 the way it's done in Tegra X1 (see this Anandtech article), it doesn't require major architectural plumbing either.
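For what it's worth, here is a minimal sketch of what that packed-FP16 scheme looks like through the CUDA 7.5 half-precision intrinsics added for Tegra X1 (sm_53): two FP16 values share one 32-bit register (__half2) and a single vec2 instruction works on both lanes. The kernel and variable names are just illustrative, not from any NVIDIA sample.

#include <cuda_fp16.h>

// y = a*x + y over packed __half2 elements; n2 is the number of __half2
// elements (i.e. half the number of FP16 values). Requires sm_53 or later.
__global__ void axpy_fp16x2(int n2, __half2 a, const __half2 *x, __half2 *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n2) {
        // One fused multiply-add issues for both packed halves, which is
        // where the 2x FP16 rate per ALU on Tegra X1 comes from.
        y[i] = __hfma2(a, x[i], y[i]);
    }
}

The FP64-over-register-pairs idea mentioned above is essentially the same trick in the opposite direction: one wide operation consuming two adjacent 32-bit registers.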
 
Given that die area is, compared to power, an increasingly less important issue, I would think that with the jump to 16/14nm process tech, companies can afford to invest a few more transistors in their multipliers and adders in order to unify the execution units (again).
 