Nvidia Pascal Announcement

Ailuros · Apr 12, 2016

Infinisearch said:
Is Pascal supposed to be a refinement of maxwell or a more significant change to the architecture? or is it not known yet?

It sounds more like a refinement; I'd be very surprised, though, if it doesn't add at least 10% per cluster and per clock compared to Maxwell.

Deleted member 2197 · Apr 12, 2016

Nvidia GP100 GPU architecture recap - full GPU has 3840 Shader processors

As the block diagram now shows, the GP100 features six graphics processing clusters (GPCs). Just look at the diagram and count along with me - each GPC holds 10 streaming multiprocessors (SMs) and then each SM has 64 CUDA cores and four texture units. Do the math and you'll reach 640 shader processors per GPC and 3840 shader cores with 240 texture units in total.

6 (GPC) x (10x64) = 3840 Shader processor units in total.

Meaning the GP100 used on the Tesla P100 is not fully enabled. Nvidia is known to put out GPUs that have disabled segments; it helps them sell different SKUs. The Tesla P100 holds a shader count of 3584 and thus has 56 SMs enabled (out of the 60).
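The counts quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic in the recap; the constant and function names are made up for illustration.

```python
# Per-unit figures taken from the quoted recap; names are illustrative only.
GPCS = 6               # graphics processing clusters on a full GP100
SMS_PER_GPC = 10       # streaming multiprocessors per GPC
CORES_PER_SM = 64      # FP32 CUDA cores per SM
TEX_UNITS_PER_SM = 4   # texture units per SM


def gpu_totals(enabled_sms):
    """Return (shader cores, texture units) for a given number of enabled SMs."""
    return enabled_sms * CORES_PER_SM, enabled_sms * TEX_UNITS_PER_SM


full_sms = GPCS * SMS_PER_GPC
full_cores, full_tex = gpu_totals(full_sms)   # fully enabled die
p100_cores, p100_tex = gpu_totals(56)         # Tesla P100: 56 of 60 SMs enabled

print(f"Full GP100: {full_sms} SMs, {full_cores} cores, {full_tex} texture units")
print(f"Tesla P100: 56 SMs, {p100_cores} cores, {p100_tex} texture units")
```

With 56 SMs enabled, the same per-SM figures give the 3584 shaders mentioned for the Tesla P100.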

GP100’s SM incorporates 64 single-precision (FP32) CUDA Cores. In contrast, the Maxwell and Kepler SMs had 128 and 192 FP32 CUDA Cores, respectively. The GP100 SM is partitioned into two processing blocks, each having 32 single-precision CUDA Cores, an instruction buffer, a warp scheduler, and two dispatch units. While a GP100 SM has half the total number of CUDA Cores of a Maxwell SM, it maintains the same register file size and supports similar occupancy of warps and thread blocks.

GP100’s SM has the same number of registers as Maxwell GM200 and Kepler GK110 SMs, but the entire GP100 GPU has far more SMs, and thus many more registers overall. This means threads across the GPU have access to more registers, and GP100 supports more threads, warps, and thread blocks in flight compared to prior GPU generations.
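To put the "same register file, fewer cores" point into numbers, here is a rough sketch of the register share per CUDA core. The quote only says the register file size per SM is unchanged; the 65,536 32-bit registers per SM used below is an assumption that is not stated in the post, and the names are illustrative.

```python
# Assumption (not in the post): 65,536 32-bit registers per SM on all three parts.
REGISTERS_PER_SM = 65536

# FP32 CUDA cores per SM, as given in the quoted text.
CORES_PER_SM = {"Kepler": 192, "Maxwell": 128, "Pascal GP100": 64}


def registers_per_core(cores_per_sm, registers_per_sm=REGISTERS_PER_SM):
    """Registers available per CUDA core if the SM's file were split evenly."""
    return registers_per_sm / cores_per_sm


for arch, cores in CORES_PER_SM.items():
    print(f"{arch}: {registers_per_core(cores):.0f} registers per core")
```

Whatever the absolute size, the ratio is what matters: halving the cores per SM while keeping the register file doubles the per-core share relative to Maxwell.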

Since the graphics memory is on-package HBM2, the VRAM amount is fixed. That means that ALL GP100 products will get 16GB of memory. The memory runs on a wide 4096-bit interface (1024 bits per HBM2 stack) with an effective bandwidth of up to a full 1 TB/s.
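The "up to 1 TB/s" figure follows from the bus width once a per-pin data rate is chosen. The post does not give that rate; the 2 Gbit/s per pin below is an assumption, picked because it reproduces the quoted ceiling, and shipping parts may well run slower. The function name is illustrative.

```python
def peak_bandwidth_gb_s(bus_width_bits, gbit_per_pin):
    """Peak bandwidth in GB/s: one bit per pin per transfer, 8 bits per byte."""
    return bus_width_bits * gbit_per_pin / 8


ASSUMED_PIN_RATE = 2.0  # Gbit/s per pin; an assumption, not stated in the post

total = peak_bandwidth_gb_s(4096, ASSUMED_PIN_RATE)      # whole 4096-bit interface
per_stack = peak_bandwidth_gb_s(1024, ASSUMED_PIN_RATE)  # one 1024-bit stack

print(f"Per stack: {per_stack:.0f} GB/s, total: {total:.0f} GB/s")
```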

This is a big chip, very big at 600mm^2, so it is interesting to see that 16nm can offer a lot in terms of clock frequency. The Tesla P100 is an enterprise part that ends up in servers, yet it is already clocked at 1328 MHz with Boost up to 1480 MHz. Even so, the TDP still remains under 300W.
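For a sense of what those clocks mean, peak FP32 throughput can be estimated from the shader count and frequency. This sketch assumes each CUDA core retires one fused multiply-add per clock, counted as two floating-point operations; that convention is not spelled out in the post, and the names are illustrative.

```python
FLOPS_PER_CORE_PER_CLOCK = 2  # assumption: one FMA per clock, counted as 2 FLOPs


def peak_fp32_tflops(cuda_cores, clock_mhz):
    """Theoretical peak single-precision throughput in TFLOPS."""
    return cuda_cores * FLOPS_PER_CORE_PER_CLOCK * clock_mhz / 1e6


P100_CORES = 3584  # Tesla P100 shader count from the recap

base_tflops = peak_fp32_tflops(P100_CORES, 1328)   # at the base clock
boost_tflops = peak_fp32_tflops(P100_CORES, 1480)  # at the boost clock

print(f"Base: {base_tflops:.2f} TFLOPS, boost: {boost_tflops:.2f} TFLOPS")
```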

http://www.guru3d.com/news-story/nv...ecap-full-gpu-has-3840-shader-processors.html

Adored · Apr 12, 2016

silent_guy said:
Yes. And none of that has anything to do with pushing a process node.
If you'd port Fiji or gm200 to 16nm or 40nm and their predecessors as well, you'd see just the same improvement.

There are no indications whatsoever that they were process related.

What I said was the node was pushed further than ever before, the 600mm2 GPUs being proof of that. Previous node flagships would have stopped at the 780 Ti / 290X level, which would obviously be easier to beat on a new node.

Adored · Apr 12, 2016

Razor1 said:
Yes, the 680 does go past the 580 due to a major overhaul of the architecture and the process change from 40nm to 28nm, two full node jumps (what we have here for this gen is a node-and-a-half jump, 28nm to 16nm).

Suffice to say, everything else is still up in the air; there isn't enough information out to draw any conclusions, outside of people that have already signed an NDA or people that are working for nV and AMD.

40 to 28 is just one node, the old "half nodes" that used to exist. 28nm to 14nm is one full node (28 to 20) + FinFETs on the same 20nm BEOL rebranded to 14/16nm - basically more of a jump than 40 to 28 was but not two complete full node jumps like what Intel did from 32nm to 14nm.

Ailuros · Apr 12, 2016

I hope you find at least one to agree with all the above :runaway:

homerdog · Apr 12, 2016

Infinisearch said:
Is Pascal supposed to be a refinement of maxwell or a more significant change to the architecture? or is it not known yet?

Based on the Guru3D quote that pharma posted it seems they are continuing the Kepler -> Maxwell trend and going even lower on SPs/SM (64 vs 128) while keeping the register file size per SM the same. That alone should yield a nice efficiency bonus per SP (oh I'm sorry, CUDA Core™ :mrgreen:).

trinibwoy · Apr 12, 2016

Adored said:
Do you expect Nvidia to beat the Titan X with their fastest GP104 card?

I do (hope so).

Razor1 · Apr 12, 2016

Adored said:
40 to 28 is just one node, the old "half nodes" that used to exist. 28nm to 14nm is one full node (28 to 20) + FinFETs on the same 20nm BEOL rebranded to 14/16nm - basically more of a jump than 40 to 28 was but not two complete full node jumps like what Intel did from 32nm to 14nm.

Nope, 32nm from TSMC was cancelled, if I remember correctly, and they jumped to 28nm, which is another full node drop.

28nm to 20nm is a full node jump, but 20nm just wasn't good enough for performance chips because of leakage. This is why TSMC went to 16nm, which employs smaller FinFET transistors but keeps the layers similar to 20nm.

Adored · Apr 12, 2016

https://en.wikipedia.org/wiki/Die_shrink

The 2nd part is basically what I said, TSMC's 16nm is 20nm with FinFET. Samsung's feature size is smaller. Overall it'll be slightly ahead of the 40nm to 28nm transition.

Yes TSMC's 32nm was cancelled, it caused AMD a hell of a problem with Cayman. TSMC used to do all the nodes, half and full, but they stopped doing that at 28nm. After that the numbers themselves just became meaningless.

Jawed · Apr 12, 2016

55nm was the last node that followed the full/half cadence at TSMC.

40nm was smaller than the expected 45nm (a full node down from 65nm).

40nm became a new baseline, making 28nm a full node drop. It's not rocket science: the square root of 0.5*40*40 is 28.3.
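The rule of thumb behind that number is that a full node halves the area, so the linear feature size scales by the square root of 0.5. A small sketch of that idealised calculation follows; the function name is illustrative, and real marketing node names do not track this formula exactly.

```python
import math


def full_node_from(nm):
    """Idealised next full node: half the area, so length scales by sqrt(0.5)."""
    return math.sqrt(0.5 * nm * nm)


for start in (65, 40, 28):
    print(f"{start}nm -> {full_node_from(start):.1f}nm")

# Two idealised full-node steps from 28nm, for comparison with the 14/16nm naming.
two_steps = full_node_from(full_node_from(28))
print(f"28nm after two full nodes: {two_steps:.1f}nm")
```

On this idealised scale, 65nm leads to roughly 46nm and 40nm to roughly 28nm, which is the cadence described above.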

CarstenS · Apr 12, 2016

Anarchist4000 said:
Upon closer inspection that sticker appears to say "GTX TITAN #######" and the die is ~50% of the area of a fiji. Also shares roughly the same power requirements as a GTX950. Not sure how reliable that picture is...

It also features back-of-the-PCB portions 'shopped onto the front as well as provisions for GDDR5 routing, i.e. it is blatantly fake, as the subtext in the original also says.

fellix · Apr 12, 2016

Infinisearch said:
Is Pascal supposed to be a refinement of maxwell or a more significant change to the architecture? or is it not known yet?

Pascal was meant to have a less intrusive microarchitecture refinement, while the true goal was to quickly capitalise on the new manufacturing process (make the largest possible die and stuff it to the edges) and introduce new features "on the periphery" of the ISA. Volta is where Nvidia is supposed to rework the internals.

silent_guy · Apr 12, 2016

Adored said:
What I said was the node was pushed further than ever before, the 600mm2 GPUs being proof of that. Previous node flagships would have stopped at the 780 Ti / 290X level, which would obviously be easier to beat on a new node.

Ah, so now you (re?)define pushing a process as making 600mm2 chips, like GP100?
So why did you write then that this time it should be more difficult when they've already done it?

Sometimes it seems that Adored is just a neural net trained phrase generator. The phrases sound like real English, but there is little real meaning to be found.

Adored · Apr 12, 2016

silent_guy said:
Ah, so now you (re?)define pushing a process as making 600mm2 chips, like GP100?
So why did you write then that this time it should be more difficult when they've already done it?

Sometimes it seems that Adored is just a neural net trained phrase generator. The phrases sound like real English, but there is little real meaning to be found.

If you can't figure out that it's more difficult to beat Titan X on 16nm (with a midrange GPU) than it would be to beat the 780 Ti, it's your comprehension that's at fault.

Adored: This time around there are obvious differences, for example both companies really pushed the 28nm node to its limit so it should be more difficult.

What is so difficult about that? And FYI you're like a neural network sarcasm generator.

Razor1 · Apr 12, 2016

Adored said:
What I said was the node was pushed further than ever before, the 600mm2 GPUs being proof of that. Previous node flagships would have stopped at the 780 Ti / 290X level, which would obviously be easier to beat on a new node.

Hmm, you can't say that 28nm was pushed more when there is a major architectural change from the 780 Ti to the Titan X. Both chips are quite large. And when the 780 Ti was launched on 28nm it might have been hard to produce anything bigger with yields sufficient for mass production without cutting out parts. We don't know this. Both AMD and NV had to drop DP units to create the 9xx and Fury lines. What does that tell us? They couldn't push the process much further without redesigning their GPUs to accommodate the process.

nnunn · Apr 12, 2016

Razor1 said:
What does that tell us?

That both AMD and NV could only fit part of their cancelled 20nm designs into 28nm chips?

Razor1 · Apr 12, 2016

No, a 20nm chip wouldn't look anything like Maxwell 2 or Fiji; they would have had so much more opportunity with transistor budgets that dropping DP wouldn't have been their only option.

CSI PC · Apr 12, 2016

homerdog said:
Based on the Guru3D quote that pharma posted it seems they are continuing the Kepler -> Maxwell trend and going even lower on SPs/SM (64 vs 128) while keeping the register file size per SM the same. That alone should yield a nice efficiency bonus per SP (oh I'm sorry, CUDA Core™ ).

Along with the refinements to pre-emption, L2 cache and occupancy.
Could someone please explain whether the atomic memory operations may have any benefit for "async compute" on data in gaming?
Cheers

trinibwoy · Apr 13, 2016

fellix said:
Pascal was meant to have a less intrusive microarchitecture refinement, while the true goal was to quickly capitalise on the new manufacturing process (make the largest possible die and stuff it to the edges) and introduce new features "on the periphery" of the ISA. Volta is where Nvidia is supposed to rework the internals.

My plan for this generation is to pick up GP106 for the living room and hold on to the desktop 780 Ti for a while longer and clear the Steam backlog. It's going to be hard to resist though. Big Volta probably won't hit retail till mid 2018.

homerdog · Apr 13, 2016

trinibwoy said:
My plan for this generation is to pick up GP106 for the living room and hold on to the desktop 780 Ti for a while longer and clear the Steam backlog. It's going to be hard to resist though. Big Volta probably won't hit retail till mid 2018.

Yeah, and I'm betting even GP106 will beat a 780 Ti, let alone GP104.
