Nvidia Pascal Announcement

The specs look unimpressive for a 610mm2 chip. Perhaps they will get another 600mm2 chip for gaming purposes.

The clockspeed is pretty impressive. 2GHz air-cooled overclocked cards incoming?
 
DP would give a 25x generational improvement in peak throughput, and FP16 would also scale above the transistor increase for the workloads Nvidia wants to target it at in particular.

There are various complexity adders that either expand the applicability of the GPU or help massage glass jaws that could hinder sustained performance, and then there is a significant IO adder whose area increase is disproportionate to its transistor count.

Even setting those terms aside, 66% more performance on an 88% transistor increase may need some comparison with other transitions to see how good or bad that might be considered. Transistor count hasn't generally given a 1:1 improvement.
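As a quick sanity check on that framing, the ratio is easy to compute. The two percentages are the figures quoted above; the perf-per-transistor metric itself is just my own way of comparing:

```python
# Rough perf-per-transistor comparison for the claimed generational jump.
perf_gain = 1.66        # ~66% more peak performance (from the discussion above)
transistor_gain = 1.88  # ~88% more transistors
perf_per_transistor = perf_gain / transistor_gain
print(f"perf per transistor vs. prior gen: {perf_per_transistor:.2f}x")
```

So by that (admittedly crude) metric, performance per transistor actually dips slightly below 1:1, which is consistent with past transitions.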

Well, that at least explains where the extra silicon went. The chip then hits just about perfectly the expectation of little change in architectural terms while maxing out what TSMC's node can do. In other words, all as expected, and any real further evaluation will probably have to be reserved for release date/yield numbers/price/etc.
 
The specs look unimpressive for a 610mm2 chip. Perhaps they will get another 600mm2 chip for gaming purposes.

The clockspeed is pretty impressive. 2GHz air-cooled overclocked cards incoming?

600mm2 GPU at 2GHz? How does that work when it's already 300W at 1.4GHz.
 
What I took from his post is that he's saying there will be two different chips: one for gaming that doesn't have all the compute features, and one for compute only.
 
600mm2 GPU at 2GHz? How does that work when it's already 300W at 1.4GHz.
One explanation could be that Nvidia is more conservative with HPC and automotive systems.
I don't think we'll see 2GHz air cooled, and it'll be a big stretch with forced cooling. But it's not completely outrageous.
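One way to see why 2GHz on the big chip is a stretch is a back-of-envelope dynamic power estimate, P ≈ C·f·V². The voltage bump used below is a pure assumption for illustration, not a measured figure:

```python
# Back-of-envelope dynamic power scaling: P ~ C * f * V^2.
# Baseline is the 300W / 1.4GHz figure from the thread; the ~15%
# voltage increase needed to reach 2GHz is an assumption.
p0, f0, v0 = 300.0, 1.4, 1.00   # watts, GHz, normalized voltage at stock
f1, v1 = 2.0, 1.15              # target clock, assumed voltage bump
p1 = p0 * (f1 / f0) * (v1 / v0) ** 2
print(f"estimated power at {f1} GHz: {p1:.0f} W")
```

Under those assumptions the big die lands well north of 500W, which is why 2GHz talk only really makes sense for the smaller chips.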
 
I am a little bit disappointed by the raw computational power of this chip, and judging by the specs, I am a little bit worried that the instruction latency of Pascal could be very high.
 
I am a little bit worried that the instruction latency of Pascal could be very high.
Might be true. But in return there are also fewer cores per SM, so the increased backlog might hide the penalty of a high latency. I'm expecting an overall increase in latency, but a more stable throughput.
 
600mm2 GPU at 2GHz? How does that work when it's already 300W at 1.4GHz.

Not this big one, but smaller chips. Some 970s were hitting 1.6GHz overclocked, so with Pascal Teslas having boost clocks of ~1.5GHz, I could see some samples touch 2GHz on air on consumer cards if they are like Maxwell in the overclocking department.

The 1/2 DP rate seems pretty expensive for Nvidia; a 300-400mm2 Hawaii-like Vega chip should overcome this card's numbers easily depending on its clockspeed.
 
Not this big one, but smaller chips. Some 970s were hitting 1.6GHz overclocked, so with Pascal Teslas having boost clocks of ~1.5GHz, I could see some samples touch 2GHz on air on consumer cards if they are like Maxwell in the overclocking department.

The 1/2 DP rate seems pretty expensive for Nvidia; a 300-400mm2 Hawaii-like Vega chip should overcome this card's numbers easily depending on its clockspeed.

But FinFET has a sharper voltage/frequency curve (a higher curve point, but sharper) on either foundry's process. The big chip gets a higher clock versus 28nm while the smaller chips get a lower one; that's simply the math of it. And considering it's a new process, the first "big" GPU off it already has disabled modules to increase yields, and the process is already more expensive (per mm^2) than the previous one, this is going to be a very expensive and possibly quite limited card.

What consumer, or otherwise, cards there will be will have to wait until June. And since there's no mention of price/availability/etc. in the announcement, only limited guesses can be made.
 
Why would clocks vary so much with die size? They could keep clocks fairly similar and just vary core counts; they did that with Maxwell 2.

Also, this GPU might never be used as a general consumer GPU, but we haven't seen nV diverge their lines like this before. At least not to the extent of the compute GPU not also being the top end of their consumer line, but their focus on compute needs against Intel might have forced them to do just that.

They already stated June for their new deep learning cluster in their press release. All P100 cards are going to OEMs like IBM, HP etc. for the Q1 launch of their servers. So I suspect they will have their high-end chips for consumers well before then (GP104); June seems likely. HBM2 has been in volume production since January of this year and select customers are getting it from Samsung; I suspect that is nV for the P100 cards.

I am a little bit disappointed by the raw computation power of this chip, and judging by the spec, I am a little bit worried about the instruction latency of Pascal could be very high.


The raw computational power of this chip is 75% over Maxwell 2 in SP, so that's not bad for one generation's difference, and if we expect the consumer card to be clocked higher, then it's going to be more than 75%. I think it's safe to assume it will perform pretty well with the architectural and scheduling improvements on top. Nothing tangible at the moment, just inferences from the presentation.
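For reference, that ~75% figure falls out of the usual peak-FLOPS formula (cores x clock x 2 for FMA). The core counts and clocks below are the publicly announced figures as I recall them, so treat the exact numbers as assumptions; the ratio shifts a bit depending on which Maxwell clock you pick:

```python
def peak_tflops(cores, clock_ghz, flops_per_core_per_clock=2):
    # A fused multiply-add counts as 2 FLOPs per ALU per clock.
    return cores * clock_ghz * flops_per_core_per_clock / 1000

gm200 = peak_tflops(3072, 1.00)   # Titan X (GM200) at ~1.0 GHz base
gp100 = peak_tflops(3584, 1.48)   # Tesla P100 at its ~1.48 GHz boost
print(f"GM200 ~{gm200:.1f} TF SP, GP100 ~{gp100:.1f} TF SP, "
      f"gain {gp100 / gm200 - 1:.0%}")
```

Comparing P100's boost against GM200's base gives roughly the quoted generational gain; against GM200's typical boost clocks the gap narrows.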
 
The 1/2 DP rate seems pretty expensive for Nvidia; a 300-400mm2 Hawaii-like Vega chip should overcome this card's numbers easily depending on its clockspeed.
Expensive is very relative when you're really talking about the difference of a few percentage points more or less on what's probably a gross margin of 80%.
 
With high $/mm² maybe it makes sense to target this market first. Target the lower-margin consumer gaming market afterwards and hope that costs have gone down a bit more and yields at TSMC have gone up.
 
first real pictures of Pascal inside the DGX-1 (not like the render from the keynote):
3-1080.4210393052.jpg

6-1080.1429466315.jpg

source: http://www.computerbase.de/2016-04/...it-8-tesla-p100-fuer-kuenstliche-intelligenz/

at least now we are sure the silicon exists. But why JHH did not show it yesterday is a mystery...
 
Those are to be built using Voltas, not Pascals.

http://www.anandtech.com/show/8727/nvidia-ibm-supercomputers
I thought the same, but that timeframe is getting very tight for when the mainframe needs to be implemented (due to end-user software support that also needs to be implemented in 2nd half 2017); also, I would expect IBM wants to see how well NVLink scales.
TBH I would not expect Pascal to have NVLink as it does unless they intend to use it (which will not happen with Intel and the consumer market).
Did they change their original schedule/model release in some way last year?
I cannot remember, but I thought something changed since that plan that, as you saw in the Anandtech report, goes back to 2014; maybe my memory is playing tricks.
Cheers
 
source: http://www.computerbase.de/2016-04/...it-8-tesla-p100-fuer-kuenstliche-intelligenz/

at least now we are sure the silicon exists. But why JHH did not show it yesterday is a mystery...
The weird thing is that on some P100 modules the HBM stacks are obviously smaller than on the others, and the smaller ones don't reflect the light:

qGatNWS.jpg

zHLQiMk.jpg


We know that HBM2 is supposed to have larger die area than HBM1, so what's the deal with the disparity here? Is Nvidia using mechanical samples with dummy filler for the missing memory stacks to cover all the eight sockets for this demo unit?
The yields must truly be in the drain, if this is the case.
 
I wonder how those 5 TFLOPS of DP will turn out in practice. After all, compared to GM200, P100 has eight times as many ALUs to feed from its same-sized register files. Compared to GK110 it's only half though.
I just remembered GK210, compared to which the DPFP/Register ratio stays the same with GP100.
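Those ratios are easy to tabulate from the per-SM figures in the public whitepapers. The register-file sizes and FP64 ALU counts below are from memory, so verify them against the whitepapers before relying on the exact values:

```python
# Register-file capacity available per FP64 ALU, per SM, across generations.
# Tuples are (register file KB per SM, FP64 ALUs per SM); figures assumed
# from the architecture whitepapers.
sms = {
    "GK110": (256, 64),   # Kepler SMX
    "GK210": (512, 64),   # Kepler refresh, doubled register file
    "GM200": (256, 4),    # Maxwell SMM, 1/32 DP rate
    "GP100": (256, 32),   # Pascal SM
}
kb_per_dp_alu = {name: kb / alus for name, (kb, alus) in sms.items()}
for name, ratio in kb_per_dp_alu.items():
    print(f"{name}: {ratio:g} KB of registers per FP64 ALU")
```

Under those figures the observations above check out: GP100 has 8x GM200's DP ALUs on a same-sized register file, half of GK110's count, and exactly GK210's KB-per-DP-ALU ratio.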
 
We know that HBM2 is supposed to have larger die area than HBM1, so what's the deal with the disparity here? Is Nvidia using mechanical samples with dummy filler for the missing memory stacks to cover all the eight sockets for this demo unit?
The yields must truly be in the drain, if this is the case.

With some perspective tweaking on the images and pixel counting, I arrive at roughly (+/- 0.2 mm, from a less than optimal image) the specced dimensions of HBM gen2 stacks for the obviously populated modules. So I'd guess the other ones are using dummies, at least for the memory. Or maybe they're substitute bring-up units equipped with HBM gen1 that got (ab)used in the lab since November.
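For anyone wanting to repeat the measurement, the pixel-counting method is just scaling against a feature of known physical size in the same image plane. Every number below (reference dimension and pixel counts alike) is hypothetical, chosen only to show the arithmetic:

```python
# Estimate a die/stack dimension from a photo by scaling against a
# feature of known size in the same plane. All values are made up.
KNOWN_WIDTH_MM = 55.0    # assumed physical size of a reference feature
known_width_px = 880.0   # that feature's measured length in the photo

px_per_mm = known_width_px / KNOWN_WIDTH_MM

stack_px = 190.0         # hypothetical measured HBM stack edge, in pixels
stack_mm = stack_px / px_per_mm
print(f"estimated stack edge: {stack_mm:.2f} mm (+/- lens/perspective error)")
```

With these made-up inputs the result lands near the ~11.9 mm long edge commonly quoted for HBM gen2 stacks, but real photos need perspective correction first, hence the error bars above.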
 