Nvidia Pascal Announcement

fellix · Apr 13, 2016

NVIDIA partners halt GeForce GTX 970, 980, 980Ti production

The GTX 980 Ti equivalent will use a GP104-400 GPU

The GTX 980 equivalent will use a GP104-200 GPU

The GTX 970 equivalent will use a GP104-150 GPU

Yep, a mid-range GPU is displacing a high-end one from the previous generation.

CarstenS · Apr 13, 2016

Not surprising, given almost twice the transistor budget per mm².

Remember GK104 <- GF110? There also was a process change involved.

fellix · Apr 13, 2016

CarstenS said:
Not surprising, given almost twice the transistor budget per mm². Remember GK104 <- GF110? There also was a process change involved.

Indeed, this is what I thought earlier, that Nvidia will continue the tradition from the GTX680 launch. Well, I personally wouldn't mind a 195W GTX 980 Ti replacement.

CarstenS · Apr 13, 2016

Let's hope AMD can pull the same stunt, and then we're in for a pretty interesting round of products.

DuckThor Evil · Apr 13, 2016

fellix said:
Indeed, this is what I thought earlier, that Nvidia will continue the tradition from the GTX680 launch.

Adored still makes a valid point: GP104, a rumoured ~300 mm² chip (similar to GK104), will have to beat GM200, a heavily gaming-optimized ~600 mm² chip. GF110 was 520 mm² and carried some FP64 "baggage".
So, to me at least, it would be surprising if GP104 beat GM200 by the same margin Kepler beat Fermi. I also hope this victory doesn't come from clocking the chip much closer to its limits than Maxwell, leaving it to struggle against the third-party Maxwell models, which deliver 20% more sustained performance than the stock-clocked model.

Ailuros · Apr 13, 2016

Just for the record, at the GTX 680 launch the difference to the former GTX 580 was ~25% at 1080p and above. Beyond that, assuming a density of 25M transistors/mm², a 300 mm² die gives exactly 7.5B transistors. Given that there have been architectural changes for Pascal (relatively minor ones) and that they've most likely also increased frequency (which, for the record, is also an architectural change and not a garden-variety overclock), there's little to no indication yet that reaching GM200 performance, or even exceeding it by a small margin, is impossible.
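As a quick check of the density arithmetic, here is a minimal Python sketch. The 25M/mm² and 300 mm² inputs are the speculative figures from the post; the GM200 reference (about 8.0B transistors on 601 mm²) is an outside figure added only for comparison. The resulting ratio of roughly 1.9x lines up with the "almost twice the transistor budget per mm²" remark earlier in the thread.

```python
# Back-of-envelope transistor budget for a ~300 mm² 16 nm FinFET die.
# Assumption: GM200 reference of ~8.0B transistors on ~601 mm² (not from this thread).

def transistor_budget(density_per_mm2: float, area_mm2: float) -> float:
    """Return the transistor count for a given density and die area."""
    return density_per_mm2 * area_mm2

gp104_estimate = transistor_budget(25e6, 300)  # speculative Pascal die
gm200_density = 8.0e9 / 601                    # 28 nm reference point

print(f"GP104 estimate: {gp104_estimate / 1e9:.1f}B transistors")
print(f"GM200 density:  {gm200_density / 1e6:.1f}M/mm²")
print(f"Density ratio:  {25e6 / gm200_density:.2f}x")
```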

Shall we dig through the archives, here or on any other forum, to count how MANY called BS back when it was claimed that GK104 would be somewhat faster than GF110?

Voxilla · Apr 13, 2016

Ailuros said:
Given that there have been architectural changes for Pascal (relatively minor ones) and that they've most likely also increased frequency (which, for the record, is also an architectural change and not a garden-variety overclock), there's little to no indication yet that reaching GM200 performance, or even exceeding it by a small margin, is impossible.

4 GPCs x 10 SMs, i.e. 40 x 64 = 2560 FP32 cores, plus 160 TMUs, 64 ROPs and a 1.5 GHz+ clock would be enough to outrun GM200.

Ailuros · Apr 13, 2016

That depends on where they draw the line for maximum power. But for the sake of speculative math, with 20 clusters clocked at 1.4 GHz you're already at almost 7.2 TFLOPs, or nearly 20% above a 980 Ti. The biggest question mark, and the most interesting one IMHO, is still how much more a Pascal FLOP "counts" than a Maxwell FLOP in relative terms.

CarstenS · Apr 13, 2016

Speaking of clocks: I just wanted to reiterate that AMD said one of the nicer things about 14/16 nm FinFET is that it has much less variance than 28 nm. So higher clocks should be entirely possible, but overclocking headroom will be somewhat diminished, since much of the potential is already used at the factory.

Voxilla · Apr 13, 2016

Ailuros said:
That depends on where they draw the line for maximum power. But for the sake of speculative math, with 20 clusters clocked at 1.4 GHz you're already at almost 7.2 TFLOPs, or nearly 20% above a 980 Ti. The biggest question mark, and the most interesting one IMHO, is still how much more a Pascal FLOP "counts" than a Maxwell FLOP in relative terms.

You must be counting FP16 FLOPs, as P100 has 60 clusters (unless by "cluster" you mean something other than an SM).

CarstenS · Apr 13, 2016

I think Ailuros meant 40, not 20 clusters as per context of the things he quoted.
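The 20-versus-40 confusion comes down to SM width: Maxwell SMs carry 128 FP32 lanes, while the GP100 SMs announced for Pascal carry 64, so both counts describe the same 2560 lanes. A small Python sketch of the usual peak-FLOPs formula (lanes x 2 FLOPs per FMA x clock) follows; the GTX 980 Ti reference (22 SMs x 128 = 2816 lanes at a 1075 MHz rated boost) is an outside figure, not from the thread.

```python
# Peak FP32 throughput = SMs x FP32 lanes per SM x 2 FLOPs per FMA x clock.
def fp32_tflops(sms: int, lanes_per_sm: int, clock_ghz: float) -> float:
    return sms * lanes_per_sm * 2 * clock_ghz / 1000.0

# Same 2560 lanes either way: 20 Maxwell-style SMs of 128, or 40 GP100-style SMs of 64.
maxwell_style = fp32_tflops(20, 128, 1.4)
pascal_style = fp32_tflops(40, 64, 1.4)

# Assumed GTX 980 Ti reference: 22 SMs x 128 lanes at a 1.075 GHz rated boost.
gtx_980_ti = fp32_tflops(22, 128, 1.075)

print(f"20 x 128 @ 1.4 GHz: {maxwell_style:.2f} TFLOPs")
print(f"40 x 64  @ 1.4 GHz: {pascal_style:.2f} TFLOPs")
print(f"GTX 980 Ti:         {gtx_980_ti:.2f} TFLOPs")
print(f"Uplift:             {pascal_style / gtx_980_ti - 1:.1%}")
```

With these inputs the uplift comes out at about 18%, consistent with "nearly 20%".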

Pixel · Apr 13, 2016

So price per mm² is about the same, so we'll get about the same number of transistors, which means no big leap in performance at any price point. Probably a 15-30% jump in performance at any particular price point. Nvidia is milking its consumer base and stretching out its roadmap as much as possible.

fellix · Apr 13, 2016

I've compiled some plausible numbers here:

Code:

                    GM200-400       GP104-400
----------------------------------------------
Turbo Clock (MHz)      1075            1600
FP32 FLOPs/clk         6144            5120
Total GPR Size (MB)     6.1            10.2
Total LDS Size (MB)     2.2             2.5
L2 Size (MB)              3               2
FP32 TFLOPs             6.6             8.2
FP16 TFLOPs             6.6            16.4
GTexels/s               206             256
MTris/s                6450            6400
GPixels/s               103             102
Total LDS BW (TB/s)     3.3             8.2
L2 BW (TB/s)           1.65            1.63

Pascal Jr. looks to be a compute champ and just fine for graphics.
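Most throughput rows in the table above are simply per-unit counts multiplied by the clock. The Python sketch below shows that derivation under stated assumptions: the GM200-400 unit counts are the known Titan X configuration (24 SMs, 192 TMUs, 6 GPCs, 96 ROPs), while the GP104-400 counts (40 SMs of 64 lanes, 160 TMUs, 4 GPCs, 64 ROPs) are the thread's speculation. One triangle per clock per GPC, 128 bytes/clock of shared-memory (LDS) bandwidth per SM, and double-rate FP16 on GP104 are likewise assumptions. It reproduces the table's throughput rows to within rounding; capacities and L2 bandwidth are left out because they depend on further guesses.

```python
from dataclasses import dataclass

@dataclass
class GpuConfig:
    name: str
    clock_ghz: float               # turbo clock
    sms: int                       # streaming multiprocessors
    fp32_lanes_per_sm: int
    fp16_rate: int                 # FP16 ops per FP32 op (2 = packed half2, assumed for GP104)
    tmus: int
    gpcs: int                      # assumed 1 triangle/clock per GPC
    rops: int
    lds_bytes_per_clk_per_sm: int  # assumed shared-memory bandwidth per SM

    def rates(self) -> dict:
        ghz = self.clock_ghz
        fp32 = self.sms * self.fp32_lanes_per_sm * 2 * ghz / 1000  # TFLOPs, FMA = 2 FLOPs
        return {
            "FP32 TFLOPs": fp32,
            "FP16 TFLOPs": fp32 * self.fp16_rate,
            "GTexels/s": self.tmus * ghz,
            "MTris/s": self.gpcs * ghz * 1000,
            "GPixels/s": self.rops * ghz,
            "Total LDS BW (TB/s)": self.sms * self.lds_bytes_per_clk_per_sm * ghz / 1000,
        }

gm200 = GpuConfig("GM200-400", 1.075, 24, 128, 1, 192, 6, 96, 128)
gp104 = GpuConfig("GP104-400", 1.6, 40, 64, 2, 160, 4, 64, 128)

for cfg in (gm200, gp104):
    print(cfg.name)
    for metric, value in cfg.rates().items():
        print(f"  {metric:<20} {value:8.1f}")
```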

Ext3h · Apr 13, 2016

pharma said:
Since the graphics memory is on-die HBM2, the VRAM amount is fixed. That means that ALL GP100 products will get 16GB of memory. HBM2 will run a wide 4096-bit HBM2 (1024 bit per IC stack) memory interface running an effective bandwidth anywhere up-to a full 1 TB/s.


http://www.guru3d.com/news-story/nv...ecap-full-gpu-has-3840-shader-processors.html

I'm pretty sure that part is plain wrong, both the claim that the memory is "on-die" (there is still an interposer) and the fixed 16GB limit. If I remember right, one of the presentations even said that there is currently a spacer on top of the 4GB stacks (rather than the heat spreader being fitted directly!), so the upcoming 8GB HBM2 stacks will be physically compatible. That means a 32GB variant is planned once the larger HBM2 stacks ship.
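The capacity and bandwidth figures in this exchange are simple products. A minimal Python sketch, assuming four stacks on the interposer with a 1024-bit interface each (as in the quoted article) and a top HBM2 data rate of 2 Gb/s per pin (the per-pin rate is an assumption, not stated in the thread), shows how 4GB stacks give 16GB, 8GB stacks would give 32GB, and the 4096-bit bus reaches about 1 TB/s only at that top data rate.

```python
# HBM2 back-of-envelope for a 4-stack package.
STACKS = 4
BUS_BITS_PER_STACK = 1024   # per the quoted article
PIN_RATE_GBPS = 2.0         # assumed top HBM2 data rate per pin

def capacity_gb(gb_per_stack: int, stacks: int = STACKS) -> int:
    """Total VRAM for a given stack size."""
    return gb_per_stack * stacks

def bandwidth_gbs(pin_rate_gbps: float, stacks: int = STACKS) -> float:
    """Peak bandwidth in GB/s: total bus width x per-pin rate / 8 bits per byte."""
    return stacks * BUS_BITS_PER_STACK * pin_rate_gbps / 8

print(f"4GB stacks: {capacity_gb(4)} GB")
print(f"8GB stacks: {capacity_gb(8)} GB")
print(f"Bus width:  {STACKS * BUS_BITS_PER_STACK} bits")
print(f"Peak BW:    {bandwidth_gbs(PIN_RATE_GBPS):.0f} GB/s")
```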

silent_guy · Apr 13, 2016

Pixel said:
So price per mm² is about the same, so we'll get about the same number of transistors, which means no big leap in performance at any price point. Probably a 15-30% jump in performance at any particular price point. Nvidia is milking its consumer base and stretching out its roadmap as much as possible.

Why not wait a bit for actual prices before starting the round of complaints (about a product that you probably aren't going to buy anyway)?

I remember outrageous price predictions being thrown around for GTX970 and we know how that turned out.

If a 1070 is on par with or faster than a 980 Ti for, say, $450, then we have a pretty nice price reduction for the same performance.
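To put a number on that hypothetical, here is a small Python sketch of the performance-per-dollar change. It assumes equal performance and uses the GTX 980 Ti's $649 launch price, an outside figure not stated in the thread; the $450 is the post's own "say" figure.

```python
# Performance per dollar of a new card relative to an old one.
GTX_980_TI_MSRP = 649.0          # assumed launch price in USD (outside figure)
HYPOTHETICAL_1070_PRICE = 450.0  # the post's "say, $450"

def perf_per_dollar_gain(old_price: float, new_price: float, relative_perf: float = 1.0) -> float:
    """Fractional gain in performance per dollar of the new card over the old one."""
    return relative_perf * old_price / new_price - 1.0

print(f"Price cut:   {1 - HYPOTHETICAL_1070_PRICE / GTX_980_TI_MSRP:.0%}")
print(f"Perf/$ gain: {perf_per_dollar_gain(GTX_980_TI_MSRP, HYPOTHETICAL_1070_PRICE):.0%}")
```

At equal performance that is roughly a 31% lower price, or about 44% more performance per dollar.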

CarstenS · Apr 13, 2016

Ext3h said:
If I remember right, one of the presentations even said that there is currently a spacer on top of the 4GB stacks (rather than the heat spreader being fitted directly!), so the upcoming 8GB HBM2 stacks will be physically compatible. That means a 32GB variant is planned once the larger HBM2 stacks ship.

You do. HBM gen2 stacks will physically be larger than HBM gen1 stacks, but within a generation the stacks are identically sized, ranging from 3-die to 9-die KGSDs (known good stacked dies).

CarstenS · Apr 13, 2016

Since my search for the string "partitioned register file" turned up no results, and the application dates could fit the Pascal generation:
https://www.google.de/patents/US20150143061

A system includes a processing unit and a register file. The register file includes at least a first memory structure and a second memory structure. The first memory structure has a lower access energy than the second memory structure. The processing unit is configured to address the register file using a single logical namespace for both the first memory structure and the second memory structure.
[…]
What is claimed is: […] a register file coupled to the processing unit, the register file comprising at least a first memory structure and a second memory structure, the first memory structure having a lower access energy than the second memory structure.

Seems in line with the larger number of registers per ALU block.
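To make the patent's claim concrete, here is a toy Python model: one logical register namespace whose low indices map to a small, cheaper structure and whose remaining indices map to a larger, costlier one, with a running access-energy counter. The index-based mapping, the sizes, and the energy values are all hypothetical; the quoted patent text does not specify them.

```python
class PartitionedRegisterFile:
    """Toy model: a single logical register namespace backed by two structures.

    Registers below `small_size` live in a small, low-energy structure;
    the rest live in a larger, higher-energy one. All parameters are
    illustrative assumptions, not the patent's actual design.
    """

    def __init__(self, small_size=8, total_size=64, small_energy=1.0, large_energy=4.0):
        self.small_size = small_size
        self.total_size = total_size
        self.small_energy = small_energy  # arbitrary units per access
        self.large_energy = large_energy
        self.values = [0] * total_size
        self.energy = 0.0

    def _charge(self, reg):
        # Same namespace for both structures; only the access cost differs.
        if not 0 <= reg < self.total_size:
            raise IndexError(f"register r{reg} out of range")
        self.energy += self.small_energy if reg < self.small_size else self.large_energy

    def read(self, reg):
        self._charge(reg)
        return self.values[reg]

    def write(self, reg, value):
        self._charge(reg)
        self.values[reg] = value


rf = PartitionedRegisterFile()
rf.write(0, 5)    # lands in the small, low-energy structure
rf.write(40, 7)   # same namespace, lands in the large structure
total = rf.read(0) + rf.read(40)
print(total, rf.energy)  # prints: 12 10.0
```

The design point the sketch illustrates is that software sees one flat register index space, while placing frequently used values in the cheap structure is what saves energy.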

Ailuros · Apr 13, 2016

CarstenS said:
I think Ailuros meant 40, not 20 clusters as per context of the things he quoted.

I still have to get used to the fact that it's now 64 SPs/SM and not 128, mind you.

Jawed · Apr 13, 2016

CarstenS said:
You do. HBM gen2 stacks will physically be larger than HBM gen1 stacks, but within a generation the stacks are identically sized, ranging from 3-die to 9-die KGSDs (known good stacked dies).

There is no variant of HBM1 with 9 dies.

nnunn · Apr 13, 2016

CarstenS said:
... I'd fancy the idea of a dedicated GP102 for gaming, even though GP100 could have better thermal characteristics with all those power-gated or fused-off DPFP units acting as thermal spacers.

"Dark silicon" as part of the cooling solution? Cool idea!
