Nvidia Pascal Speculation Thread

Status
Not open for further replies.
It's the new version of OpenGL, which is closer to the metal, like DX12. Basically the DX12 version of OpenGL.
Not really. While it's maintained and developed by Khronos, just like OpenGL, it was built on AMD's Mantle rather than being built from scratch or based on older OpenGL.
 
Speculation from me, not backed by anything yet:

What's the chance that the first Pascal GPUs are going to be shipped as Quadro or Tesla, not GeForce series?

So far, all marketing material published by NV has revolved around the topic of "neural networks" and similar scientific applications, but nothing about where to place Pascal performance-wise in games.

There is also a good chance that 16nm FF+ turned out to have worse yields and therefore higher costs than initially expected, making it quite possible that Pascal will only be profitable in the middle four-digit price range in the beginning.

Unfortunately, this would also mean that Maxwell isn't going to phase out as fast as expected. Not as long as Pascal cards in the same performance range fail to reach the same margins.

And I'm not just talking about HBM2 being reserved to high end models for now, I have concerns that Pascal won't be cost efficient in general for quite some time.
 
NV has launched professional cards first in the past, so I wouldn't be too surprised if they do it again. But I think it's going to depend on the dynamics of the marketplace when they are ready to release: they have pressure from Intel's next-gen Phi and also from the gaming market, so if they need to get the gaming cards out, they might launch both at the same time.
 
So far, all marketing material published by NV has revolved around the topic of "neural networks" and similar scientific applications, but nothing about where to place Pascal performance-wise in games.
It is the same strategy they used for the Fermi reveal and presentation first in 2009 and for the big Kepler in 2012 -- both architectures with an emphasis on HPC, while consumer features trailed behind. And since Maxwell was clearly a consumer-first architecture (with some cloud-induced applications), it's only logical that Pascal would be primed for the next wave of HPC, which the previous generation took a rest from, mostly thanks to the extended life of the 28nm process.
 
There is also a good chance that 16nm FF+ turned out to have worse yields and therefore higher costs than initially expected, making it quite possible that Pascal will only be profitable in the middle four-digit price range in the beginning.
I don't agree. At Pascal's launch, I believe 16FF+ will have much better yields than 28nm had at its beginning. Apple has already been making millions of A9 chips on 16FF+ for months, and the process is derived from the 20nm node that has been in use for years.
On the other side, HBM2...
 
Apple has already been making millions of A9 chips on 16FF+ for months, and the process is derived from the 20nm node that has been in use for years.
The A9 isn't directly comparable. From what I understood, 16FF+ allows tuning for two different characteristics: either transistor density and switching time, or power consumption.

Apple obviously went for the latter with the A9, since the A9 is still reasonably small in terms of transistor count, and battery life matters.

Pascal, on the other hand, has once again doubled the transistor count over Maxwell, meaning they went for full density, as I'm not aware that they could push die size much further.
 
Apple is dual-sourcing manufacturing for the A9. If yields were that great, they could have avoided the trouble and cost of dual sourcing.

Apart from that, where the heck does the idea come from that 16FF+ is a "derivative" of 20SOC? In other news, Maxwell isn't just a coffee brand and Vulkan isn't just a hole in the ground that spits fire...
 
Speculation from me, not backed by anything yet:

What's the chance that the first Pascal GPUs are going to be shipped as Quadro or Tesla, not GeForce series?

...

And I'm not just talking about HBM2 being reserved to high end models for now, I have concerns that Pascal won't be cost efficient in general for quite some time.

The chance is pretty high, as 16FF and HBM2 are new technologies for NV. If GP100 features over 17 billion transistors, the density (transistors per mm²) will be higher than expected. Because HBM2 delivers smaller PHYs (and NV's memory-interface PHYs were always big), the overall density could be higher than expected. I also don't think that GP100 will be as big as GM200 (~600mm²), as a bigger die needs a bigger interposer. Although NV has a far stronger financial foothold than AMD, going with FinFET and HBM2 at once will be a huge task (AMD worked on it for ~10 years). I speculate the die is around 500~550mm², and with the 17 billion transistors (leaked) we get a density of around 30~34 million transistors/mm².
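That back-of-envelope density math can be spelled out; a minimal sketch, where the 17 billion transistor count is the leaked figure and the die sizes are speculation:

```python
# Transistor density for a hypothetical GP100, given the leaked
# 17 billion transistor count and a range of speculative die sizes.
transistors = 17e9

for die_mm2 in (500, 550, 600):
    density_mtr = transistors / die_mm2 / 1e6  # million transistors per mm^2
    print(f"{die_mm2} mm^2 -> {density_mtr:.1f} Mtr/mm^2")
```

At 500~550mm² this lands at roughly 34 and 31 Mtr/mm², matching the 30~34 range above; a GM200-sized 600mm² die would drop it to about 28.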
I also think that NV will increase the cache sizes and register files throughout the architecture. I wouldn't be surprised if each SM features a huge register file (at least twice as big as GM200's) and increased cache sizes (192KB+/48KB+), as well as a big L2 cache (4MB+) located at the memory interfaces. Bigger caches could have a positive impact on density.

Another problem is the DP throughput. If NV goes for mixed precision (half : single : double => 4:2:1), I would speculate ~6000 SPs at max. IMO, I would go for ~5000 SPs.
 
Another problem is the DP throughput. If NV goes for mixed precision (half : single : double => 4:2:1), I would speculate ~6000 SPs at max. IMO, I would go for ~5000 SPs.
2:1? No way.

They will most likely go with mixed-precision ALUs instead of dedicated DP and SP units this time. And for those, the rate is about 4:1 when doubling the data width for multiplication. So 16:4:1, best case. I think half precision is additionally going to be implemented as VLIW4 or at least VLIW2 (to keep the architecture 32 bits wide) at SP latency, and double precision as multi-cycle.
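To make the two schemes concrete, a small sketch comparing per-cycle FMA throughput under the quoted 4:2:1 ratio and the 16:4:1 best case; the 6144 SP count used here is purely hypothetical:

```python
# Per-cycle FMA throughput relative to single precision.
# half_mult / double_div are the FP16 multiplier and the FP64 divisor
# relative to the FP32 rate (one FMA per SP per cycle).
def fma_rates(sp_count, half_mult, double_div):
    return {
        "FP16": sp_count * half_mult,
        "FP32": sp_count,
        "FP64": sp_count // double_div,
    }

# 4:2:1 -> FP16 at 2x FP32, FP64 at 1/2 FP32
print(fma_rates(6144, 2, 2))
# 16:4:1 -> FP16 at 4x FP32, FP64 at 1/4 FP32
print(fma_rates(6144, 4, 4))
```

The difference between the schemes is a factor of two at both ends: the 16:4:1 case doubles FP16 throughput but halves FP64 throughput relative to 4:2:1.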
 
Apple is dual-sourcing manufacturing for the A9. If yields were that great, they could have avoided the trouble and cost of dual sourcing.
Apple's volumes are ginormous. If NV were to sell like 80 million GPUs every quarter, they would be shitting their pants out of sheer surprise and excitement. Alas, that's not the case.
 
Maybe they are prepping the code names for a Titan and then a gaming version, like they did with the 7x0 series.

Weird that GV100 is in there; seems a bit early for that, I think.
 
2:1? No way.

They will most likely go with mixed-precision ALUs instead of dedicated DP and SP units this time. And for those, the rate is about 4:1 when doubling the data width for multiplication. So 16:4:1, best case. I think half precision is additionally going to be implemented as VLIW4 or at least VLIW2 (to keep the architecture 32 bits wide) at SP latency, and double precision as multi-cycle.

Well, Nvidia already had FP32:FP64 2:1 with Fermi. AMD had it with Hawaii. NV had 3:1 with GK100/110. NV has FP16:FP32 2:1 with GM20B (the Tegra X1 GPU). The most interesting fact is that Fermi didn't have dedicated FP64 units.

For instance, if GP110 has 6144 SPs and only an FP32:FP64 ratio of 4:1, it achieves only 1536 FP64 FMAs per cycle. AMD's Hawaii is capable of 1408 FP64 FMAs per cycle. They will go for an FP32:FP64 ratio of 2:1, for sure...
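That comparison, spelled out (the GP110 SP count is the same hypothetical figure from above; Hawaii's 2816 SPs at half-rate DP are its known configuration):

```python
# FP64 FMAs per cycle: hypothetical GP110 at 4:1 vs Hawaii at 2:1.
gp110_fp64 = 6144 // 4   # speculative 6144 SPs, FP32:FP64 = 4:1
hawaii_fp64 = 2816 // 2  # Hawaii: 2816 SPs, FP32:FP64 = 2:1
print(gp110_fp64, hawaii_fp64)  # 1536 1408
```

So at 4:1 the hypothetical chip would lead a two-year-old Hawaii by less than 10% per clock, which is the poster's argument for expecting 2:1.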
 
Well, Nvidia already had FP32:FP64 2:1 with Fermi. AMD had it with Hawaii. NV had 3:1 with GK100/110. NV has FP16:FP32 2:1 with GM20B (the Tegra X1 GPU). The most interesting fact is that Fermi didn't have dedicated FP64 units.
Out of these, Kepler was actually the one closest to optimal. It's just that a 64-bit FMAD/FMUL costs 4x the hardware resources of the corresponding 32-bit operation; you can't cheat around that. There is some additional, mostly data-width-independent overhead for IEEE 754-specific edge-case handling.

If anything, SP to DP ratio is going to get worse than 4:1 in Pascal, not better. They might even be tempted to still handle DP with dedicated units, that would be the only way they could achieve a better ratio.

Going 2:1 on a mixed-mode FPU effectively wastes resources while in SP mode - about 50%, actually. Going 8:1 or worse indicates "software emulation" (not literally, just running FP in multiple rounds through the integer ALU).

And did you seriously just call on the 295X2 as a reference for DP performance?

That's a dual-GPU card, and if you really want to go that way:
The champion in terms of DP performance is still AMD's old Tahiti X2/New Zealand/Malta series, setting the bar at 2 SP or 1/2 DP FLOPs per SP and cycle, with the 2½-year-old 7990 only recently beaten by Intel's Knights Landing - and not even by much, just ~50%.
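A rough sanity check of that last claim, assuming commonly quoted peak figures (the 7990's clock and its 1/4-rate DP, and Knights Landing's announced ~3 TFLOPS, are approximations on my part):

```python
# 7990: two Tahiti GPUs, 4096 SPs total, ~0.95 GHz, DP at 1/4 the SP
# rate; an FMA counts as 2 FLOPs.
tahiti_x2_fp64 = 4096 * 2 * 0.95e9 / 4 / 1e12  # peak FP64 TFLOPS
knl_fp64 = 3.0                                  # ~3 TFLOPS DP, announced

print(f"7990: {tahiti_x2_fp64:.2f} TFLOPS DP")
print(f"KNL ahead by {(knl_fp64 / tahiti_x2_fp64 - 1) * 100:.0f}%")
```

That works out to roughly 1.95 TFLOPS for the 7990 and an advantage of a bit over 50% for Knights Landing, consistent with the "not even by much" remark.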
 
Maybe they are prepping the code names for a Titan and then a gaming version, like they did with the 7x0 series.

Weird that GV100 is in there; seems a bit early for that, I think.

Although it may never see the light of day in PCs, perhaps being reserved for their supercomputer contracts.
I am reminded somewhat of ATI's R400/R500 (Xenos) development lineage. Although it made its way into the Xbox 360, more conservative designs (R420, R520) developed in parallel were used in the PC space until its R600 descendant arrived.
 