Nvidia Pascal Announcement

I think the GP107 could be an interesting GPU when also considering Vlow numbers and efficiency.
Surprisingly, an undervolted 1060 draws only about 60W in a game and still hits its 3.8 TFLOPS at 1500MHz.

It is scary to think just how efficient these GPUs could be at their lowest optimal voltage, and I am amazed Apple did not show more interest in them.
Here is Tom's Hardware's analysis (great work IMO) comparing the gaming performance of the 1060 and the 480 in terms of a performance envelope and fps, watts and frequency.
The figure of interest would be the 1080p one IMO.
Performance-vs.-Power-Consumption.png



Power-Consumption-vs.-Clock-Rate.png


So it needs only 61W to hit roughly the base clock (slightly under, as the base clock is 1506MHz) and the 3.8 TFLOPS. This shows clearly in the 1st chart: 130fps at 61W versus 152fps at 110W.
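To put rough numbers on that, here is a minimal sketch of the arithmetic. The fps and wattage values are the ones quoted above from the charts; the 1280-core count for the 1060 and the helper names are my own additions, not something stated in the post.

```python
# Rough efficiency arithmetic for the GTX 1060 figures quoted above.
# Assumption: the GTX 1060 has 1280 FP32 cores (not stated in the post).
CUDA_CORES = 1280
CLOCK_MHZ = 1500  # roughly the base clock reached at 61W


def peak_tflops(cores: int, clock_mhz: float) -> float:
    """Peak FP32 throughput, counting one FMA (2 FLOPs) per core per clock."""
    return cores * 2 * clock_mhz * 1e6 / 1e12


def fps_per_watt(fps: float, watts: float) -> float:
    """Frames per second delivered per watt of board power."""
    return fps / watts


# (fps, watts) pairs read off the first chart.
undervolted = (130, 61)
stock = (152, 110)

if __name__ == "__main__":
    print(f"peak throughput: {peak_tflops(CUDA_CORES, CLOCK_MHZ):.2f} TFLOPS")
    print(f"undervolted: {fps_per_watt(*undervolted):.2f} fps/W")
    print(f"stock:       {fps_per_watt(*stock):.2f} fps/W")
    extra_power = stock[1] / undervolted[1] - 1
    extra_fps = stock[0] / undervolted[0] - 1
    print(f"{extra_power:.0%} more power buys {extra_fps:.0%} more fps")
```

The point of the last line is simply that the final stretch of clock speed is expensive in power terms, which is why the Vlow operating point looks so attractive.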
Makes me wonder whether it is also stable at lower clocks and voltage, although that would need a modified BIOS. It also raises the question of how efficient the GP107 will be at its optimal Vlow figure, or at least at its base clock with ideal voltage and power demand.
You would think IHVs would be lining up for Pascal GPUs wherever they need an efficient product with good performance, especially in lower-end laptops (the upper end is a given for Nvidia).
Cheers
 
So I take it you are NOT confused with the AMD RX 480 8GB and the RX 480 4GB.
Nope, I get the full core with both. The smaller 4 GB size is clearly marked on the box, and you can even select memory size as a filter in most price search engines.

What confused me was that, after a simple BIOS flash, my 4 GB RX 480 turned into a full-blown 8 GB model, higher memory clocks included. And all for the steal of a price of 219 EUR. :D
 
Ah, what is funny is how many have cancelled their 480 4GB pre-orders now that they have been told the next batch will be a true 4GB :)
Cheers
 
The first Pascal Series die shot fresh from the Hot Chips conference:

CqevXGJXYAAVRw7.jpg

A few quick eyeballed estimates: 1 SM (assuming the dark green squares are a whole SM) takes ~7.3 mm² (Polaris CU ~3 mm²), 1 HBM2 PHY ~4.4 mm² (1 Fiji PHY ~10 mm²).
Interestingly, the SM they mention here corresponds with the SM on the other GP10x chips: 128 FMA units per SM. However, each GP100 SM is actually exposed as two independent SMs to software (i.e. CUDA). This is because a thread block can only fill 1/2 an SM on GP100. So from a software point of view, GP100 has 60 SMs with 64 FMAs per SM, but looking at the die photo, you can see the SM from a hardware perspective is 2x as large.

This will likely cause some confusion on these forums (does GP100 have 60 or 30 SMs? The correct answer is: it depends).
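The two ways of counting can be sketched as below. This is only an illustration using the numbers from this post (60 x 64 as software sees it, 30 x 128 as the die photo suggests); the `SMView` class and its field names are made up for the example and are not part of any Nvidia tool or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMView:
    """One way of counting SMs on GP100 (hypothetical helper for illustration)."""
    name: str
    sm_count: int
    fma_per_sm: int

    @property
    def total_fma(self) -> int:
        # Total FP32 FMA units on the full die under this counting scheme.
        return self.sm_count * self.fma_per_sm


# Figures taken from the post above.
software_view = SMView("software (CUDA) view", sm_count=60, fma_per_sm=64)
hardware_view = SMView("hardware (die photo) view", sm_count=30, fma_per_sm=128)

if __name__ == "__main__":
    for view in (software_view, hardware_view):
        print(f"{view.name}: {view.sm_count} SMs x {view.fma_per_sm} FMAs "
              f"= {view.total_fma} FMA units")
```

Either way the chip ends up with the same number of FMA units, so the disagreement is purely about where you draw the box around an "SM".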
 
It looks like each of those rectangles is actually 2 SMs in a mirrored configuration, the same way dual-core CPUs used to be laid out.
 
Was just about to say the same: they look like mirrored pairs, not SMs twice the size.
 
OYcznNU.png


The SMs are really small compared to the rest of the functional blocks. Quite a bit of overhead, looking at the rest of the chip. The red-colored blocks are still suspiciously large; I think they do a lot more besides (supposedly) prim setup and thread dispatch, but the count (6) matches the GPC-to-SM ratio anyway. The central column is too dense to distinguish the separate blocks -- I guess the top and the bottom are where the HBM controllers are, close to the PHYs.
 
All Pascal SMs have 2 partitions of 64 FMAs. The question is how to market the capabilities of the chip. If you look at the Hot Chips slide, they say GP100 has 30 SMs. But at GTC, they said 60.

All depends on how you count. I'm just pointing out that it could be confusing if you use the definition of SM from this talk, because all the other documentation on GP100 says otherwise. And I do think it will cause confusion on these forums.
 
One of the SMs in the top-left of that die shot appears to have a diagonal line across it. Is this a by-product of the way they are disabled?
 
Looks like a lasercut, so probably yes.
Repeat after me: "Laser cutting went out of fashion decades ago. And even then it was only used to trim hyper-accurate opamps and the like. And even then, it would only destroy a metal wire of a few microns at best."

Fuses are blown not with a laser but by applying a voltage to some special cell with a delicate gate oxide or something. With some FPGAs, it's something you can do at your desk with a simple JTAG programmer.
 
Then what is that diagonal cut across two SMs?
 
It's an extremely dirty picture of the die. I think they chemically decapped a previously mounted die that was used for failure analysis rather than take a picture of a pre-assembled die. Unlike the German die photographer, they didn't spend hours making it a piece of art.

In other words: the die had been subjected to a lot of abuse before they took the picture. Those diagonal cuts could be anything, but they're definitely not something you'd do to disable a functional block. That's something you want to do after the die is mounted on the interposer and mounted on the substrate: all those steps can result in silicon failure. You want to disable stuff as late as practical.

Maybe they didn't want to use a clean picture to prevent detailed competitive analysis.
 