Nvidia Pascal Speculation Thread

When silicon comes back from the fab, everything has already been validated in simulation and emulation. Compilers have already been written. Programs have already been executed and tested for correctness.

Whatever these DL related capabilities may be, they're probably minor additions in the grand scheme of things. Nobody would ever consider spending $20M on a piece of silicon that will never see mass production to make sure that a minor feature really, really works.

Ah, no, I am not talking about Nvidia having something to test; I'm a noob, but I know that would serve no purpose. I'm talking about select third parties having something to actually work on. Correct me if I'm wrong, but I don't think third parties have access to Nvidia's simulator so that each of them can test every single piece of software they develop. It's expensive equipment that I'd guess is already very busy simulating future GPUs and SoCs. That's what dev kits are for.

I'm talking about all or most of the deep learning libraries out there being fully updated to take advantage of the new capabilities, and a software ecosystem having been developed around them, before the final product is ready early next year (they said production starts late this year, IIRC), instead of that work happening in the months after release, with Nvidia hoping it would help make their solution and ecosystem the de facto "standard" of sorts, or at least a bigger player than they are. IMO $20M wouldn't be too high a price to pay considering the potential medium- and long-term gains.

As I said, they've been spending much more than that on the Shield family of products (what was it, $10M for the original Shield portable alone? For a few thousand units ever being produced?), and its only purpose is to demonstrate, and hopefully build, a gaming ecosystem around their Tegra, so that maybe that allows them to create high-volume devices. Other than Nvidia, who would consider spending tens of millions on consoles they know are not going to sell?
 
Another point on that one: if, with capital I and F, the PX2 shown really featured GM204-sized Pascals, it also means that the chances of HBM on that chip would be all but gone, leaving HBM only for the high-end stuff that's probably not even due in 2016.
 

At 3-3.5 TFlops each (4 TFlops if they are excluding the Tegra GPUs from the equation), if they are GM204-sized Pascals built on 16FF+, I'd argue that the lack of HBM would be the smallest of Nvidia's problems.
 
I don't think third parties have access to Nvidia's simulator so that each of them can test every single piece of software they develop. It's expensive equipment that I'd guess is already very busy simulating future GPUs and SoCs. That's what dev kits are for.
Developers don't write assembly code. They'll use a library like cuDNN or Neon. They can write all their algorithms today and enjoy a speed bump a bit later.
 


Well, you won't get all the performance boost just from using a new architecture; you still have to write new code for the new architecture to get the most out of it.

But I don't think cars are going to be looking for absolute performance; they are going to be looking for reliability, which is a bit more important to them.
 
Well, you won't get all the performance boost just from using a new architecture; you still have to write new code for the new architecture to get the most out of it.
Why? cuDNN has a high level of abstraction with little or no architectural specifics exposed. So just replace the library. Done.
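
Just to illustrate (a minimal sketch on my part, assuming a machine with the CUDA toolkit and cuDNN installed; error handling mostly trimmed): nothing in application code ever names a GPU architecture, so dropping in a newer libcudnn build is what picks up new hardware paths, not changes to the code itself.

#include <cstdio>
#include <cudnn.h>

int main() {
    cudnnHandle_t handle;
    if (cudnnCreate(&handle) != CUDNN_STATUS_SUCCESS) {
        std::printf("could not create cuDNN handle\n");
        return 1;
    }
    // The installed library version is about the only "hardware" detail the
    // application ever sees; high-level calls such as cudnnConvolutionForward()
    // are dispatched inside the library to whatever GPU-specific kernels that
    // particular build ships with.
    std::printf("cuDNN version: %zu\n", cudnnGetVersion());
    cudnnDestroy(handle);
    return 0;
}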


If the DL improvements are related to lower precision operations, and you really wanted to have that on the devkit as well, you could even emulate those lower precision instructions, at a larger performance penalty.
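
Something along these lines, for instance (a rough sketch of my own, not anything Nvidia has shown; the kernel name and the axpy-style operation are just for illustration, and it assumes a CUDA version that ships cuda_fp16.h): keep the data in FP16 storage, do the math in FP32, and the numerics match what FP16-capable silicon would produce while running, more slowly, on a Maxwell-class devkit.

#include <cuda_fp16.h>

// Emulated half-precision a*x + y: storage is FP16, arithmetic is FP32.
__global__ void axpy_fp16_emulated(int n, __half a, const __half* x, __half* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // The convert/compute/convert round trip is the "larger performance
        // penalty" mentioned above: correct FP16 results, no FP16 speedup.
        float af = __half2float(a);
        float xf = __half2float(x[i]);
        float yf = __half2float(y[i]);
        y[i] = __float2half(af * xf + yf);
    }
}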
 
I was under the impression that cuDNN updates aren't always drop-in replacements, and that they were tailoring cuDNN per architecture.
 
Developers don't write assembly code. They'll use a library like cuDNN or Neon. They can write all their algorithms today and enjoy a speed bump a bit later.

I'm not so sure about that. If the DL-oriented instructions are something like SSE or AVX, I don't think you could take advantage of them in all cases unless you specifically took them into account. But anyway, Neon is more towards what I was talking about. cuDNN is Nvidia's, so they would be able to take care of it regardless. But Neon is developed and maintained by a third party, am I correct? How are they supposed to have access to the new instructions before the PX2 is produced and handed to them? Even assuming that devs using Neon would see an immediate speed bump without any work on their part, which I have a somewhat hard time believing, updating Neon itself would certainly take time. And it would take even longer for devs to start updating the libraries on their running projects. I don't think everyone just upgrades every project to a new library the moment it becomes available.

With every move that Nvidia has made with all things Tegra, the pattern that I've been seeing is that Nvidia is playing "the long shot": https://www.cardschat.com/odds-for-dummies.php. They are "calling bets" where they have a low probability of succeeding, because the odds are on their side, meaning that a winning hand would recoup every "bet" they lost and then some. If they manage just a few high-volume car contracts in the future (or a military contract), $20M, $50M or even $200M would become just a drop in the bucket.
 
Making chips pin compatible is not that difficult, and the payoff of being able to reuse most of an already existing PCB design considerable.

Not in every situation. Definitely not for the specific Maxwell->Pascal transition, as you would probably agree.


In terms of location of the pins, you have 2 levels of indirection: the RDL layer on which the silicon is mounted to form the final die, and the substrate on which the die is mounted to form the package. Given that a GP104 with GDDR5X would have the same IO pins as GM204, give or take a few, making those pin compatible wouldn't be very hard.

The substrate is also what would adjust for the differences in die size. The size of the substrate is determined in large part by the ball pitch: a larger pitch allows for cheaper PCBs.

And in terms of voltage supplies, it wouldn't be very complicated either. Even if the actual voltages are different, it'd just be a matter of swapping regulators or even just reprogramming them to a different voltage. After all, that's already what happens in DVFS control loops.
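
As a loose illustration of that last point (purely a sketch; pmic_write_mv() and read_gpu_load_pct() are hypothetical stand-ins for whatever I2C/PMBus interface a given board's regulator actually exposes, and the V/F numbers are made up), the same regulator on the same PCB just gets programmed to different setpoints:

#include <cstdint>
#include <cstdio>

struct VfPoint { uint32_t mhz; uint32_t mv; };

// Hypothetical V/F table for the newer, lower-power chip: same board,
// same regulator, just lower setpoints than the older die would use.
static const VfPoint vf_table[] = {
    {  600,  750 },
    { 1000,  850 },
    { 1400, 1000 },
};

// Placeholder for programming the regulator output; a real board would do
// an I2C/PMBus transaction here.
static void pmic_write_mv(uint32_t mv) { std::printf("regulator -> %u mV\n", mv); }

// Placeholder load sensor, 0-100 percent utilization.
static uint32_t read_gpu_load_pct() { return 42; }

// One iteration of a DVFS-style control loop.
static void dvfs_step() {
    uint32_t load = read_gpu_load_pct();
    // Pick the lowest V/F point that covers the current load.
    const VfPoint& p = (load > 66) ? vf_table[2]
                     : (load > 33) ? vf_table[1]
                                   : vf_table[0];
    pmic_write_mv(p.mv);   // reprogram, don't replace, the regulator
    // (clock reprogramming to p.mhz would go here)
}

int main() { dvfs_step(); return 0; }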

Okay, same PCB, different chip with different substrate: possible (I never said it wasn't, just that it would be a hassle). It doesn't alter the fact that Pascal is supposed to be much more energy efficient, so a GM204 PCB for a GP104 die would most probably be overkill for power delivery.

Not to mention that if you're using MOSFETs to deliver power at levels much lower than their typical ratings, you're probably moving away from their best efficiency curves.
So not only are you paying more for unneeded power delivery, you're also burning more power than you need to.
All this just to reuse a 2-year-old PCB design? Again, it's unthinkable.


Maybe if you were saying to use the GM204 PCB for the GP100...
 
Not in every situation. Definitely not for the specific Maxwell->Pascal transition, as you would probably agree.

Pin compatibility doesn't affect the design choices of the architecture. And it shouldn't be looked at from that point of view. It should be looked at as: given the design choices, can it still be pin compatible, and if so, how can we keep it pin compatible?

PS: I think we have seen pin compatibility between different generations of GPUs in the past.

Okay, same PCB, different chip with different substrate: possible (I never said it wasn't, just that it would be a hassle). It doesn't alter the fact that Pascal is supposed to be much more energy efficient, so a GM204 PCB for a GP104 die would most probably be overkill for power delivery.

Not saying this is Pascal on those boards, but no: if the silicon is around the same size from node to node, the new architecture can be more power efficient, but since it's using more transistors, power usage can still stay similar to the older chip on the larger node.

Efficiency is a ratio, so more transistors using similar power is still more efficient. That means power delivery still has to handle a similar amount.
 
Pin compatibility doesn't affect the design choices of the architecture.

Not true.

Not saying this is Pascal on those boards, but no: if the silicon is around the same size from node to node, the new architecture can be more power efficient, but since it's using more transistors, power usage can still stay similar to the older chip on the larger node.

I'm assuming GP104 will not be the same size as GM204 because GM204 became a non-traditionally large chip for its segment (because nVidia had to do 3 families / 2 whole generations using the same process).
GP104 will probably go down to ~300mm^2, like GK104, GF114 and GF104 before it.
The higher price-per-area will likely be compensated through smaller chips with similar or higher performance.
 
Pin compatibility is always a question of whether it's possible or not; it's never something they will change an architecture for. No chip designer will hamper themselves with those types of constraints.

With nV we have seen different generations that are pin compatible, not to mention on different nodes.

I'm not sure about it being "a large" chip for that tier any more. Well, you have to use more transistors for more performance, unless they can increase frequency more, so I would expect these chips to stay around the same size. And from what TSMC, AMD, GF and Samsung have said about the process, I don't see that happening: increasing frequency seems to really negate the power savings advantage. This was that chart with the flatter line for 14nm FinFET.
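
Just to put rough numbers on that (purely illustrative figures on my part, using the standard dynamic-power relation): pushing frequency usually means pushing voltage too, and power goes with the square of voltage,

P_{dyn} \approx \alpha\,C\,V^2 f, \qquad \frac{P_{new}}{P_{old}} \approx \frac{f_{new}}{f_{old}}\Big(\frac{V_{new}}{V_{old}}\Big)^2 \approx 1.3 \times \Big(\frac{0.95}{0.80}\Big)^2 \approx 1.8

so a 30% clock bump at a 150 mV higher supply hands back most of whatever the node saved at iso-clocks.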
 
AMD's chart is more of an illustration. We have seen that FinFET has a less massive improvement at higher frequencies, and at some point can end up being little better or increasingly worse.
Maybe I missed the context, but a curve discussing a process's range may not be helpful unless we know where an architecture's target range lies within it. It's not clear how much of the curve applies when the process can handle sub-GHz silicon and 2+ GHz CPUs. Mobile chips seem to have benefited with lower power and better clocks.
 
I haven't been paying much attention... besides process technology and HBM, are there supposed to be architectural advancements in Pascal? If so any rumors on what they are or the performance advantage over Maxwell?
 
Outside of more performance per watt, more compute capabilities and performance, and other generalities, that's about it; nothing too specific has been given out yet.
 
From what it seems, INT8 and FP16 (to boost Nvidia's new deep learning focus), NVLink (to remain competitive in HPC) and HBM2 are a given for Pascal. FP64 performance is still unclear across the range (dedicated FP64 units or universal FP16/32/64 ones). On a personal note, I think Nvidia has some surprise up its sleeve and we don't know everything yet...
 
I didn't know it could be so fast. I've definitely heard that it takes 2-3 months to get silicon back, as a minimum. Shouldn't believe the internet.

After a bit of searching, I'm leaning more towards Nvidia using a different nomenclature, though. GK110 can be found in two revisions, A1 and B1. Is there any compelling reason why they would have messed up a B0? B1, which I believe appeared with the 780 Ti, came pretty late in the game too.

There were many revisions for GK110. I gathered a lot of info about the GK100/110 timeline due to so many people saying Nvidia sat on it because AMD couldn't compete.

LordEC911 said:
First silicon from fab for GK110 was AA(or A0, depending on what you believe). Respun into AB in Aug '12 and the latest manufacture date I can find is end of Sept '12. The first A1 silicon is Oct '12 (but was an AB chip), most initial retail cards had GK110 chips made in Dec '12.
There were certain AB chips that had screen printed GK110-400-A1 on it.
Some of the GTX Titan review cards were AB chips.

I personally believe Nvidia changed their early silicon designations so they can keep launching with "A1" silicon. As we saw with GK110, that wasn't the first try.

LordEC911 said:
GK100 | Q3 '11 - rough originally scheduled production start for Oakridge which was Cancelled/Scrapped
GK110-AB | Aug '12 - production start of Oakridge contract
GK110-A1 | Dec '12 - production start for Titan/GTX780 and ongoing Tesla supply
GK110-B1 | Jul/Aug '13 - production start for increased 15SMX yield aka K40, K6000 and GTX780Ti

Edit- Late to the convo, but wanted to include the info that I already dug around and found from years back.
 
Good to know, thanks.

That's some weird stuff though.

Also, when you say some AB chips had A1 on them and that some Titan review cards were AB chips, are you referencing the same cards/chips, or did some Titans actually have AB printed on them?
 
To add to the weirdness: review cards for GK104 and GK107 were A2, from what I saw yesterday when digging through a few reviews. At the time I thought it made sense that the first chips released on the new node were A2, while it was some of the following chips that were A1. Even if it was strange to think they'd get it right on the first try, the fact that the first few chips required more spins than later ones made some sense. And the lack of competing products at the time of GK110, plus the fact that they were all partially disabled chips, could have meant they did in fact go with first silicon even if it wasn't really good.

Now I'm just confused. Why didn't they just keep doing AB, AC, or whatever they'd need, with GK104 and GK107? Maybe the new naming started with GK110? Maybe the AB chips were an anomaly due to the Oak Ridge contract?
 