Nvidia Pascal Announcement

3dilettante · Apr 5, 2016

DP would give a 25x generational improvement in peak throughput, and FP16 would also scale above the transistor increase, specifically for the workload Nvidia wants to target with it.

There are various complexity adders that either expand the applicability of the GPU or help massage glass jaws that could hinder sustained performance, and then there is a significant IO adder whose area increase would be disproportionate to its transistor count.

Even setting those aside, 66% more performance on an 88% transistor increase may need comparing with other transitions to see how good or bad it should be considered. Transistor count generally hasn't given a 1:1 improvement.

Adored · Apr 5, 2016

It looks a lot like a card built for everything except gaming tbh. I'm not even sure we can glean any information from a gaming perspective from it.

At the end he held up a PX2 board showing the GPUs - I also thought they looked quite large for the specs, i.e. the specs look like GP107 but the die size looked more like the 250mm2+ range. Did anyone catch that?

wccf got a shot of it - http://cdn.wccftech.com/wp-content/uploads/2016/04/GTC-2016-PX-2-Board-Pascal.jpg

That looks like a large mid-range sized GPU to me.

CSI PC · Apr 6, 2016

silent_guy said:
I frankly don't see them releasing a full SM version any time soon, if ever.

The beauty of having tons of identical cores is that disabling a few is almost invisible. See GTX 980 Ti vs Titan X.

The cool thing about redundancy and random faults is that the benefits of redundancy are largely uncorrelated with the size of the redundant block: if you have 30 blocks and you disable 1 of them, your benefit won't be much better than having 60 and disabling 1.

However, your benefits go up significantly with the number of redundant blocks. 1 redundant block out of 30 is much less effective than 2 out of 60, even though it makes no difference in terms of redundant area. My theory is that Nvidia split the SMs in half for this reason. With a smaller granularity, they can exploit this benefit.

Fingers crossed, then, that they have learnt from the mess they created doing that for the 980-to-970.

Although the 970 sold so well, I am wary of how that range of cards is going to pan out this time, but I have my fingers crossed that neither will be as crippled.
Cheers
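
The redundancy argument quoted above (1 spare block out of 30 vs 1 or 2 out of 60) can be checked with a toy yield model. This is only a rough sketch, not anything Nvidia has described: it assumes random defects following a Poisson model, a fixed total area for the redundant blocks, an arbitrary average of 1.0 defects across that area, and that any single defect kills exactly one block. The function name `chip_yield` and its parameters are made up for illustration.

```python
import math


def chip_yield(num_blocks, spare_blocks, expected_defects=1.0):
    """Probability that at most `spare_blocks` blocks are defective.

    Assumption: defects are Poisson-distributed over a fixed total area,
    so each of the `num_blocks` equal blocks is defect-free with
    probability exp(-expected_defects / num_blocks).
    """
    p_good = math.exp(-expected_defects / num_blocks)
    p_bad = 1.0 - p_good
    # Binomial sum over 0..spare_blocks defective blocks.
    return sum(
        math.comb(num_blocks, bad) * p_bad**bad * p_good**(num_blocks - bad)
        for bad in range(spare_blocks + 1)
    )


for blocks, spares in [(30, 0), (30, 1), (60, 1), (60, 2)]:
    print(f"{blocks} blocks, {spares} spare: {chip_yield(blocks, spares):.3f}")
```

Under these assumptions, 1 spare out of 30 and 1 spare out of 60 land at roughly the same yield (about 0.74), while 2 spares out of 60 does much better (about 0.92) for the same redundant area as 1 out of 30. The exact numbers depend entirely on the assumed defect rate.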

ieldra · Apr 6, 2016

First I've heard of GP102 is in this thread. It would make sense to have a variant with no HBM and possibly a die size reduction from significantly reduced FP64. What are the odds of GP102 with more FP32 units per SM? Pretty minuscule, eh?

fellix · Apr 6, 2016

Frenetic Pony said:
Is HBM2 really that big? I mean, geez. It's not like transistors-to-performance even scaled linearly; hell, it dropped. We have a 66% (approximate) performance improvement from an 87.5% increase in transistors. With HBM2 and FinFET, Pascal manages to be worse, transistor for transistor, than Maxwell. A 12% drop in efficiency is not what you want out of a node jump and "architecture jump" at all.

Don't forget the additional DP logic that skews the SP perf/watt/die-size ratio, together with the extra space occupied by the quad NVLink interface and controllers. Also, with all the DNN focus, Nvidia will put more emphasis on mixed-precision FLOPs.
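
For reference, the "12% drop" in the quote follows from dividing the performance ratio by the transistor ratio. A minimal sketch of that arithmetic, using only the approximate 66% and 87.5% figures quoted in this thread (the function name is made up for illustration):

```python
def perf_per_transistor_change(perf_gain, transistor_gain):
    """Relative change in performance per transistor.

    Both arguments are fractional gains, e.g. 0.66 for +66%.
    """
    return (1.0 + perf_gain) / (1.0 + transistor_gain) - 1.0


# Figures as quoted in the thread: +66% performance, +87.5% transistors.
change = perf_per_transistor_change(0.66, 0.875)
print(f"{change:+.1%}")  # prints "-11.5%"
```

This only measures peak FP32 throughput per transistor, so it ignores the DP, NVLink and mixed-precision additions mentioned above.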

CSI PC · Apr 6, 2016

Just thought,
I wonder how many big Pascals are being used by the two supercomputers NVIDIA are building with IBM.
That will probably take a fair chunk of these poor-yield-for-now chips; I assume they would be using these, as the milestones would be too tight for the next generation of cards *shrug*
Cheers

ieldra · Apr 6, 2016

And Jen mentioned improvements over the Maxwell scheduler, but didn't specify what exactly, right? Speculation is rampant, with people claiming hardware queues/distribution are coming back.

Ext3h · Apr 6, 2016

ieldra said:
And Jen mentioned improvements over the Maxwell scheduler, but didn't specify what exactly, right? Speculation is rampant, with people claiming hardware queues/distribution are coming back.

Hardware queues were never absent. Just not being made use of consistently.

ieldra · Apr 6, 2016

Ext3h said:
Hardware queues were never absent. Just not being made use of consistently.

Well yes, what I meant was, I've seen many people claiming hardware scheduling is back in full, with no more static scheduling, essentially undoing the 'gutting' that Fermi's scheduling subsystem underwent in its transformation into Kepler. Is there any solid indication warp scheduling is back in hardware?

CarstenS · Apr 6, 2016

It just doesn't make sense for known-latency-functions.

ieldra · Apr 6, 2016

CarstenS said:
It just doesn't make sense for known-latency-functions.

Yeah, that's why they removed it in the first place. Where the hell are people getting this information from?

silent_guy · Apr 6, 2016

ieldra said:
Is there any solid indication warp scheduling is back in hardware?

Was it ever not in hardware?

ieldra · Apr 6, 2016

silent_guy said:
Was it ever not in hardware?

Scheduling instructions within a warp has been on the software side since Kepler.

3dilettante · Apr 6, 2016

What's the full wording for the TPC initialism? That documented somewhere?

Razor1 · Apr 6, 2016

ieldra said:
And Jen mentioned improvements over the Maxwell scheduler, but didn't specify what exactly, right? Speculation is rampant, with people claiming hardware queues/distribution are coming back.

Bit too early to speculate on that lol from what they have shown. They did talk about preemption, well mentioned it quickly so there are probably some changes for that.

silent_guy · Apr 6, 2016

ieldra said:
Scheduling instructions within a warp has been on the software side since Kepler.

Ok. For me 'warp scheduling' meant scheduling between different warps, not scheduling within a warp.

ieldra · Apr 6, 2016

Razor1 said:
Bit too early to speculate on that lol from what they have shown. They did talk about preemption, well mentioned it quickly so there are probably some changes for that.

Yeah, but I imagine it's going to be an incremental improvement. They mentioned finer-grained preemption being in the works a while back, but said it was still far away.

ieldra · Apr 6, 2016

silent_guy said:
Ok. For me 'warp scheduling' meant scheduling between different warps, not scheduling within a warp.

Apologies, been drinking! Intra-warp scheduling vs inter-warp scheduling.

LordEC911 · Apr 6, 2016

Adored said:
It looks a lot like a card built for everything except gaming tbh. I'm not even sure we can glean any information from a gaming perspective from it.

At the end he held up a PX2 board showing the GPUs - I also thought they looked quite large for the specs, i.e. the specs look like GP107 but the die size looked more like the 250mm2+ range. Did anyone catch that?

wccf got a shot of it - http://cdn.wccftech.com/wp-content/uploads/2016/04/GTC-2016-PX-2-Board-Pascal.jpg

That looks like a large mid-range sized GPU to me.

128-bit Pascal. Should be a direct competitor to baby Polaris.

The 250W for Drive PX makes no sense.
GTX 980M is already ~100W for the same amount of FP32 TFLOPS as this new 128-bit dGPU Pascal.
Combining Denver with 4 ARM cores and having ~512 shaders on the new Tegra w/ iPascal shouldn't be 20-25W.
There doesn't seem to be any power saving from 16/14nm FinFET. You would expect 60-80W for the 128-bit dGPU Pascal and <20W for the new Tegra w/ iPascal.

mczak · Apr 6, 2016

3dilettante said:
What's the full wording for the TPC initialism? That documented somewhere?

Thread Processing Cluster? I'm just guessing it's essentially the same thing as GT200 had (3 "multiprocessors" per TPC), so quite old-school ;-).
