Nvidia Pascal Speculation Thread

Good to know, thanks.

That's some weird stuff tho.

Also, when you say some AB chips had A1 on them and that some Titan review cards were AB chips, are you referring to the same cards/chips, or did some Titans actually have AB printed on them?

http://imgur.com/a/NKzaH#4
Bunch of pictures of the different silicon designations of GK110.

There is the actual silicon designation that is, I think, etched onto the die. That has the date code and silicon revision.
The chip designation and bin is the silk-screened portion that is added afterwards. This is done for consumer chips, I believe, and usually not found on early samples.

The chip I was talking about is the one in the picture with the ruler measuring it: it's marked 1236AB but has GK110-400-A1 silk-screened on it.
 
I was under the impression that cuDNN updates aren't always drop-in updates, and that they were tailoring cuDNN per architecture.
I believe there are some minor API backward incompatibilities between cuDNN v3 and v4, and later cuDNN versions have dropped Fermi support, and some functionality requires compute capability 3.5 (GK110 and later, so not GK104), but that's all I'm aware of.

There's no question that some newer functionality will eventually only work on later architectures, but it's not the case at all that developers need to concern themselves with low-level architecture details.
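For what it's worth, the usual way to handle that from application code is to query the device's compute capability at runtime and gate the optional paths on it, rather than special-casing architectures. A minimal sketch using the CUDA runtime API (the 3.5 threshold mirrors the GK110-and-later requirement above; the names and messages are just illustrative):

```
// Minimal sketch (illustrative only): query the device's compute capability
// through the CUDA runtime API and gate the compute-3.5-and-up code path
// (e.g. the newer cuDNN functionality mentioned above) on it.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int device = 0;
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        std::printf("No usable CUDA device found.\n");
        return 1;
    }
    std::printf("Device %d: %s, compute capability %d.%d\n",
                device, prop.name, prop.major, prop.minor);

    // GK110 and later report 3.5+; GK104 reports 3.0, Fermi reports 2.x.
    const bool has_sm35 = (prop.major > 3) || (prop.major == 3 && prop.minor >= 5);
    if (has_sm35) {
        std::printf("Compute 3.5+ path available.\n");
    } else {
        std::printf("Falling back to the pre-3.5 path.\n");
    }
    return 0;
}
```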
 
Not in every situation. Definitely not for the specific Maxwell->Pascal transition, as you would probably agree.
On the contrary: if GP104 retains a 256-bit bus, why would you not try to keep things as much the same as possible?

Their pinout can be pretty much identical, because the number of memory pins and PCIe pins (which account for the vast majority of the functional pins) stays the same. Moving pins around would add work, not reduce it.

Okay, same PCB, different chip with different substrate: possible (never said it wasn't, just that it would be a hassle).
IMO it'd be the opposite of a hassle. It'd save time.

Doesn't alter the fact that Pascal is supposed to be much more energy efficient, so a GM204 PCB for a GP104 die would most probably be overkill for power delivery.
Of course it will be more power efficient, but what makes you think it will draw less power?

Not to mention that if you're using MOSFETs to deliver much less power than their typical ratings, you're probably moving away from their best efficiency curves. So not only are you paying more for power delivery you don't need, you're also burning more power than you need to.
You can find pin-compatible FETs and power regulators for every sweet spot.

All this just for reusing a two-year-old PCB design? Again, it's unthinkable.
I don't expect that they will make an exact copy of the previous PCB, BTW. There are always things that are dropped due to lack of time, power technologies that improve etc. Moving around small discrete components is easy. Even between the GTX 980 Ti and the Titan X there are PCB differences.

But the hard part is the stuff related to high-speed signals, memory, etc.: optimizing ball assignments just so that you have matched trace lengths for all bits of a bus, and assigning them just so that you can get away with the smallest number of PCB vias or PCB layers. This is exactly where you'd try to stick to what's known to work.

You make it sound as if keeping things largely pin compatible is a terrible chore or an architectural straitjacket. It's not. Changing a pinout is much harder than keeping something that already works.
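To make the trace-length point concrete, here's a toy sketch (not any real tool flow, and the numbers are invented) of the kind of check a layout has to pass for every byte lane of the memory interface. This is the work you avoid by keeping a ball-out that is already known to route cleanly:

```
// Toy illustration only: a length-matching check for one byte lane of a
// memory interface. The routed lengths and the skew budget are invented;
// real budgets depend on data rate, PCB stackup and the memory spec.
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    // Hypothetical routed lengths (mm) for DQ0..DQ7 of one byte lane.
    const std::vector<double> lengths_mm = {41.2, 41.5, 40.9, 41.8,
                                            41.3, 41.1, 41.6, 41.4};
    const double skew_budget_mm = 1.0;  // invented tolerance

    const double longest = *std::max_element(lengths_mm.begin(), lengths_mm.end());
    bool ok = true;
    for (std::size_t i = 0; i < lengths_mm.size(); ++i) {
        const double skew = longest - lengths_mm[i];
        if (skew > skew_budget_mm) {
            std::printf("DQ%zu is %.2f mm short of the longest trace (budget %.2f mm)\n",
                        i, skew, skew_budget_mm);
            ok = false;
        }
    }
    std::printf(ok ? "Byte lane is length-matched.\n"
                   : "Byte lane needs serpentine tuning or a different ball-out.\n");
    return 0;
}
```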
 
At 3-3.5 TFlops each (4 TFlops if they're excluding the Tegra GPUs from the equation), if those are GM204-sized Pascals built on 16FF+, I'd argue that the lack of HBM would be the least of Nvidia's problems.
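Back-of-the-envelope version of that figure, taking Nvidia's quoted 8 TFlops total for Drive PX 2 and splitting it across the two discrete GPUs and two Tegra SoCs; the per-Tegra contribution is just my assumption:

```
// Rough arithmetic behind the 3-4 TFLOPS-per-GPU figure. The 8 TFLOPS total
// is Nvidia's quoted FP32 number for Drive PX 2; the per-Tegra contribution
// is an assumption for illustration.
#include <cstdio>

int main() {
    const double total_tflops      = 8.0;   // quoted Drive PX 2 total
    const double tegra_tflops_each = 0.75;  // assumed
    const int num_tegras = 2;
    const int num_gpus   = 2;

    const double per_gpu_excl_tegra = total_tflops / num_gpus;
    const double per_gpu_incl_tegra =
        (total_tflops - num_tegras * tegra_tflops_each) / num_gpus;

    std::printf("Per discrete GPU, Tegras excluded: %.1f TFLOPS\n", per_gpu_excl_tegra);  // 4.0
    std::printf("Per discrete GPU, Tegras counted:  %.2f TFLOPS\n", per_gpu_incl_tegra);  // 3.25
    return 0;
}
```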
Weren't you arguing earlier that those indeed are Pascal test vehicles?

On the contrary: if GP104 retains a 256-bit bus, why would you not try to keep things as much the same as possible?
Their pinout can be pretty much identical, because the number of memory pins and PCIe pins (which account for the vast majority of the functional pins) stays the same. Moving pins around would add work, not reduce it.
Probably* that would also mean GP104 would not even support GDDR5X memory.

*I have not yet checked the necessary pin-out from the memory controllers, but GDDR5X memory chips have 190 balls instead of GDDR5's 170. So they just might be drivable by the same pins from a GPU, but I doubt it.
 
The manufacturing date should tell clearly enough that it's Maxwell, not Pascal; there's really no way NVIDIA had Pascals produced in January 2015.
 
Weren't you arguing earlier that those indeed are Pascal test vehicles?

Test vehicles made on 28nm a year earlier so that the software ecosystem, based on Pascal's special deep learning features, is mature by the time it launches. And they wouldn't be GP104 but GP106, which would explain a lot of things:

1) Performance. It could end up really close to GM204, because I'd think the most obvious option is to basically double a GM206 in terms of execution units. That would end up with a chip with specs very close to GM204 (rough numbers in the sketch after this list).

2) Use of GDDR5. Why use more expensive memory solutions with supply constraints if a 256-bit GDDR5 interface is enough for the job? It works for GM204. And 256-bit isn't a problem for either a 400mm2 or a 200-240mm2 chip.

3) Die size, TDP... Use of an almost identical MXM module. Remember, it'd be a test vehicle for software development, not hardware testing. This version of the chip is not meant to compete in the market. If the chip can work "as is" on the same PCB, and reaching competitive performance per watt per dollar is not an issue for a chip that will never see the consumer market, why redesign the PCB at all? The final GP106 module could be entirely different.

4) Reason for the existence of such a chip to begin with. Didn't Huang say that production of Drive PX 2 would start at the end of the year? That would make it available in Q1 2017? By contrast, the original PX and CX were announced in March 2015 and made available a quarter later, maybe two. This timeframe strengthens the idea that it is a chip that hasn't taped out yet, or did so very recently, while GP100 and GP104 reportedly taped out several months ago. And if it is GP106 that is going to be used, the release dates mostly match: assuming GP104 launches late Q2 or Q3 and GP100 maybe in Q4, it makes a lot of sense that GP106 would be ready a quarter later. Historically, that's how Nvidia has released their chips.
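On point 1, the doubling arithmetic sketched with published Maxwell specs (GM206 as in the GTX 960, GM204 as in the GTX 980):

```
// Sanity check of the "double a GM206" idea using published Maxwell specs:
// GM206 (GTX 960): 1024 CUDA cores, 128-bit bus; GM204 (GTX 980): 2048 CUDA
// cores, 256-bit bus.
#include <cstdio>

int main() {
    const int gm206_cores = 1024, gm206_bus_bits = 128;
    const int gm204_cores = 2048, gm204_bus_bits = 256;

    std::printf("2 x GM206 cores: %d (GM204 has %d)\n", 2 * gm206_cores, gm204_cores);
    std::printf("2 x GM206 bus:   %d-bit (GM204 has %d-bit)\n",
                2 * gm206_bus_bits, gm204_bus_bits);
    return 0;
}
```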


And lastly, I've never said that I strongly believe, or believe at all, that it is anything but a 980M. What I've been arguing is that people were jumping to conclusions (i.e. Pascal is in trouble) with excessive certainty, based on evidence that is not as certain as it was made out to be. Namely:

1) Almost identical PCB leaves no other option but for it to be a 980M. I argued that a 400mm2-ish 28nm Pascal with similar TDP could probably use the same PCB, especially if it's a test vehicle that doesn't need to be competitive in any of the usual metrics. Discussion ensued because I was initially told that's literally impossible, and I "knew" it wasn't; from silent_guy's convo with ToTTenTranz, we can see that it's actually possible, maybe even a cheaper solution. And I don't think he's even thinking about the kind of test chip I've mentioned above, because he doesn't even believe it's a possibility. I'd think that the kind of non-competitive test chip I'm talking about would be much more likely to be able to reuse GM204's PCB than a chip that needs to be competitive.

2) Impossible to be 16FF+ and hence Pascal, due to the dates etched into the die. From the convo with Rys we can see that it could be 16FF+, even if it's highly unlikely. Again, I challenged the idea that it is impossible for it to be 16FF+; I'm not arguing that it actually is. If those were 16FF+ Pascals, who says those chips are functional at all? Did Huang say "I'm holding fully functional, production-ready Pascal GPUs"? They could be risk-produced, totally non-functional A0s (ABs, whatever) for all we know; they would still be Pascal, no? They could be anything, actually. They could be GP104 even if those were not going to be the chips used in the Drive PX2.
 
I believe there are some minor API backward incompatibilities between cuDNN v3 and v4, and later cuDNN versions have dropped Fermi support, and some functionality requires compute capability 3.5 (GK110 and later, so not GK104), but that's all I'm aware of.

There's no question that some newer functionality will eventually only work on later architectures, but it's not the case at all that developers need to concern themselves with low-level architecture details.

Cool, thx for the link. Looks like there's quite a significant increase in performance just from updating the libraries; I wasn't aware of that.
 
From the convo with Rys we can see that it could be a 16FF+, even if it's highly unlikely.
That's not what I said. I don't believe A1 is first silicon for NV, so it's not 16FF+ even if it is Pascal. Let's just put that part of the discussion to bed now until NV themselves prove it one way or another.
 
Test vehicles made on 28nm a year earlier so that the software ecosystem, based on Pascal's special deep learning features, is mature by the time it launches. And they wouldn't be GP104 but GP106, which would explain a lot of things:

1) Performance. It could end up really close to GM204, because I'd think the most obvious option is to basically double a GM206 in terms of execution units. That would end up with a chip with specs very close to GM204.

The cost for such a chip would be very high: they would be looking at ~$30 million or so just for a prototype chip that would never make ROI, and then double that for the chip they would actually release. Since there aren't any other companies doing this type of work for self-driving vehicles that I can think of, would it be prudent for them to do it that way? It would just be a waste of money.

2) Use of GDDR5. Why use more expensive memory solutions with supply constraints if a 256-bit GDDR5 interface is enough for the job? It works for GM204. And 256-bit isn't a problem for either a 400mm2 or a 200-240mm2 chip.

Everything they have said about Drive PX2 and deep learning doesn't seem to point to this. Higher bandwidth is something neural nets need; it isn't an extra.

3) Die size, TDP... Use of an almost identical MXM module. Remember, it'd be a test vehicle for software development, not hardware testing. This version of the chip is not meant to compete in the market. If the chip can work "as is" on the same PCB, and reaching competitive performance per watt per dollar is not an issue for a chip that will never see the consumer market, why redesign the PCB at all? The final GP106 module could be entirely different.

As others have said, all of this can be done through simulations.
 
Probably* that would also mean GP104 would not even support GDDR5X memory.

*I have not yet checked the necessary pin-out from the memory controllers, but GDDR5X memory chips have 190 balls instead of GDDR5's 170. So they just might be drivable by the same pins from a GPU, but I doubt it.
The slides that I've seen talk about a command protocol that's very similar to GDDR5, so I don't expect more than a pin or two of difference. There's no reason to increase the number of address or data pins.

The additional balls on the chips could just be extra power supply pairs, which is something you'd add when you increase data rates.

But I haven't seen more than one or two slides on GDDR5X. Do you have a link to share?
 
That's not what I said. I don't believe A1 is first silicon for NV, so it's not 16FF+ even if it is Pascal. Let's just put that part of the discussion to bed now until NV themselves prove it one way or another.

For the record, I wasn't claiming you said it is possible, but from what you said there is time for one and only one revision of the chip to be made.

Now, you don't believe A1 to be first silicon, and based on LordEC911's contributions neither do I. Still, there's the issue of GK104 and GK107 being A2s, and CharlieD's claim that Nvidia calls first silicon A1, so that is far from resolved, and it appears that at least at some point that was in fact the case.

The direct question would be: IF A1 were what Nvidia uses for first silicon and GK110 AB was just an anomaly (maybe driven by Nvidia's need to supply Oak Ridge), would it be possible for it to be 16FF+?
 
You seem to be fixated on the PCB being exactly the same and then argue about the terrible ordeal it would be to make a chip pin compatible. I just want to dispel the latter. I don't give a f about the PCB being an exact copy. (Even though it could easily be done.)
 
The slides that I've seen talk about a command protocol that's very similar to GDDR5, so I don't expect more than a pin or two of difference. There's no reason to increase the number of address or data pins.

The additional balls on the chips could just be extra power supply pairs, which is something you'd add when you increase data rates.

But I haven't seen more than one or two slides on GDDR5X. Do you have a link to share?
It's available on the JEDEC site - you just have to register for free.
http://www.jedec.org/standards-documents/results/GDDR5X
 
The cost for such a chip would be very high: they would be looking at ~$30 million or so just for a prototype chip that would never make ROI, and then double that for the chip they would actually release. Since there aren't any other companies doing this type of work for self-driving vehicles that I can think of, would it be prudent for them to do it that way? It would just be a waste of money.

IMO it wouldn't be a waste. We are talking about a market where they could make billions in the long run. The Drive PX2 will not be available for over a year; many competitors could show up by then, and if those competitors happen to be the likes of Qualcomm, Intel, etc., it is probably game over for Nvidia in yet another emerging market.

We're talking about a company that spent $10M creating the Shield portable, and who knows how much on subsequent Tegra-based gaming devices, and what's the ROI on those? They didn't do it for the ROI; they did it in a bid to create and take hold of a market with great potential.

A company that spent $367M on Icera, when it was obvious it would never make any ROI. They did it for the potential of taking hold of a lucrative market.

A company which was supposed to have been developing a phone, etc., etc. $30M against potentially billions in return over the coming years is a bet I would take in the blink of an eye, especially with plenty of cash in hand right now versus what I might have in the future. The automotive market is anything but secured for Nvidia, and considering it's kind of the last haven for one of their divisions, I'd say no price is too high to try to secure it. Lacking commitment when you should be committing could be the worst mistake you ever make.

And remember, this 28nm chip never stopped being just a speculation exercise on my part. Nothing has changed there; I'm just justifying why that hypothetical move would actually make sense.

As others have said, all of this can be done through simulations.

If everything can be done through simulations, then why are development kits created at all? See, I have a hard time believing that having actual hardware with all the capabilities would not be beneficial to third-party library creators and software developers, when development kits are created and released for mobile platforms, consoles and Jetsons alike. Yes, those are cheaper to make, but so are the potential gains they produce.
 
We're talking about a company that spent $10M creating the Shield portable, and who knows how much on subsequent Tegra-based gaming devices, and what's the ROI on those? They didn't do it for the ROI; they did it in a bid to create and take hold of a market with great potential.
Shield is an example of where even when a design has a rather speculative upside (aside from not making the CEO a liar when he told investors Nvidia had a console using their IP), there's still an attempt to productize it.
If the belief is that they can at least generate some revenue, it helps keep the pathfinder silicon from eating into the budget at the expense of the "real" one.
That's not to say that there aren't examples of projects that never really come to market, but if money can be made it's usually attempted.

Is there something about Pascal that would make it uncompetitive with extant 28nm products? If it's an improved architecture ready for 28nm deployment, why couldn't it be made into a mildly better product to replace some of the SKUs out there?

A company that spent $367M on Icera, when it was obvious it would never make any ROI. They did it for the potential of taking hold of a lucrative market.
Is getting a hold of a new market that makes money not a return on investment?
 
It's available on the JEDEC site - you just have to register for free.
http://www.jedec.org/standards-documents/results/GDDR5X
Awesome!!! I didn't know JEDEC specs were downloadable for free.

I compared the signal description table of GDDR5X (Table 57) with the signal description table of GDDR5 (Table 65). The pins are identical, except that the former allows for a few more address pins (supporting larger memories) and GDDR5X drops the CS_n pin. So it's trivial to make GPU pinouts backward compatible.

The difference between the 190- and 170-ball packages is indeed due to additional VDD/VSS pairs.

Also: GDDR5X is QDR instead of DDR.
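For a sense of what that buys on a 256-bit bus, the bandwidth arithmetic is straightforward; 7 Gbps is what GM204 ships with, and 10-14 Gbps is the commonly quoted GDDR5X target range:

```
// Bandwidth for a GDDR5/GDDR5X-style interface:
//   bandwidth (GB/s) = bus_width_bits * per_pin_rate_Gbps / 8
// 7 Gbps is GM204's shipping GDDR5 speed; 10-14 Gbps is the commonly quoted
// GDDR5X target range.
#include <cstdio>

static double bandwidth_gb_per_s(int bus_width_bits, double per_pin_gbps) {
    return bus_width_bits * per_pin_gbps / 8.0;  // bits per transfer -> bytes
}

int main() {
    const int bus_bits = 256;
    std::printf("GDDR5  @  7 Gbps: %.0f GB/s\n", bandwidth_gb_per_s(bus_bits, 7.0));   // 224
    std::printf("GDDR5X @ 10 Gbps: %.0f GB/s\n", bandwidth_gb_per_s(bus_bits, 10.0));  // 320
    std::printf("GDDR5X @ 14 Gbps: %.0f GB/s\n", bandwidth_gb_per_s(bus_bits, 14.0));  // 448
    return 0;
}
```

Same pin count, roughly 1.4-2x the bandwidth, which is the whole point of the higher per-pin data rate.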
 
Shield is an example of where even when a design has a rather speculative upside (aside from not making the CEO a liar when he told investors Nvidia had a console using their IP), there's still an attempt to productize it.
If the belief is that they can at least generate some revenue, it helps keep the pathfinder silicon from eating into the budget at the expense of the "real" one.

I looked it up; the Phoenix platform was the phone I was talking about. What other function did it serve other than the one I'm claiming? Did they think about actually releasing a phone, or was it, as was said, just something to make life easier for potential customers? I believe it was the latter. It shows, at the very least, commitment, which in my mind could help snatch partners such as Volvo.

That's not to say that there aren't examples of projects that never really come to market, but if money can be made it's usually attempted.

Is there something about Pascal that would make it uncompetitive with extant 28nm products? If it's an improved architecture ready for 28nm deployment, why couldn't it be made into a mildly better product to replace some of the SKUs out there?

That was one of my crazy ideas, which was rapidly discarded as idiotic: the possibility of a 750 Ti sort of card being released earlier this year, based on the hypothetical chip I've proposed. 16FF+ midrange cards (GP106) are probably a year away. I don't know why it was so idiotic to think that maybe they made it on 28nm initially and shrank it to 16FF+ later, but apparently it was, and it was discarded rapidly. Of course, that left me with this hypothetical test chip as my only option in the discussion...

Speaking of the 750 Ti: wasn't that clearly a backup plan Nvidia had up their sleeve in case 20nm didn't deliver or was delayed? I don't think there was time to create a chip out of the blue between when it became clear 20nm would fail and when it was released. AMD certainly didn't have anything, or it took them much longer. Is it so unthinkable that they had another chip with the same purpose this time around?

Is getting a hold of a new market that makes money not a return on investment?

Well, that's what I'm being told...
 
IMO it wouldn't be a waste. We are talking about a market where they could make billions in the long run. The Drive PX2 will not be available for over a year; many competitors could show up by then, and if those competitors happen to be the likes of Qualcomm, Intel, etc., it is probably game over for Nvidia in yet another emerging market.

We're talking about a company that spent $10M creating the Shield portable, and who knows how much on subsequent Tegra-based gaming devices, and what's the ROI on those? They didn't do it for the ROI; they did it in a bid to create and take hold of a market with great potential.

Shield was made once nV noticed that cell phones and tablets would not take up Tegra because of the delays. Shield is about ROI and the viability of the Tegra line in the consumer market.

A company that spent $367M on Icera, when it was obvious it would never make any ROI. They did it for the potential of taking hold of a lucrative market.

The IP that Icera has/had was and is very valuable to Intel, and that is why they spent so much money on acquiring it. This applies to current products and future products; making something to create a new market is different from entering a market that is already there and can be integrated into an existing business model.

A company which was supposed to have been developing a phone, etc., etc. $30M against potentially billions in return over the coming years is a bet I would take in the blink of an eye, especially with plenty of cash in hand right now versus what I might have in the future. The automotive market is anything but secured for Nvidia, and considering it's kind of the last haven for one of their divisions, I'd say no price is too high to try to secure it. Lacking commitment when you should be committing could be the worst mistake you ever make.

The players in the automotive industry are a bit slower than the general consumer side of things. This is because of regulations and the lifespan of vehicles.

Also, contracts with automotive companies are likely done for a much longer period of time than system or OEM contracts for computers; automotive companies will be much more thorough in picking the right product for their cars.

And remember, this 28nm chip never stopped being just a speculation exercise on my part. Nothing has changed there; I'm just justifying why that hypothetical move would actually make sense.

If everything can be done through simulations, then why are development kits created at all? See, I have a hard time believing that having actual hardware with all the capabilities would not be beneficial to third-party library creators and software developers, when development kits are created and released for mobile platforms, consoles and Jetsons alike. Yes, those are cheaper to make, but so are the potential gains they produce.

nV has both areas covered, hardware and software. The only thing the automotive companies need to worry about is the interface with the hardware and software. nV will not be making custom parts for different manufacturers. This is what makes simulations work well in a situation like this.

And this is what makes it hard for others to come into this market: they don't have the software yet. Look at what has happened with the compute market... has Intel or AMD or Qualcomm or any other company made as much of an impact as nV? By the time these other companies saw what nV did, it was too late, and nV still has an upper hand because of its software.
 
I looked it up; the Phoenix platform was the phone I was talking about. What other function did it serve other than the one I'm claiming? Did they think about actually releasing a phone, or was it, as was said, just something to make life easier for potential customers? I believe it was the latter. It shows, at the very least, commitment, which in my mind could help snatch partners such as Volvo.
It appears there was a desire to use that platform to help facilitate sales of the silicon to partners.
The test chip scenario involves silicon that is never sold, even though, should it exist, there doesn't seem to be a reason why it couldn't be an improved product for sale.


That was one of my crazy ideas, which was rapidly discarded as idiotic: the possibility of a 750 Ti sort of card being released earlier this year, based on the hypothetical chip I've proposed. 16FF+ midrange cards (GP106) are probably a year away. I don't know why it was so idiotic to think that maybe they made it on 28nm initially and shrank it to 16FF+ later, but apparently it was, and it was discarded rapidly.
There are examples of architectures at one node being shifted to a new node. It happened with AMD's Bulldozer, but that wasn't something they intended to do and usually happens due to problems--as that CPU demonstrated.
 