Nvidia Pascal Speculation Thread

Status
Not open for further replies.

silent_guy

Veteran
Subscriber
Actually the 10x Maxwell is an estimate on his part (he said as much in the video), but I believe he is drawing this estimate based on the working Pascal and NVLink parts they currently have. Once you factor in HBM I really don't think the 10x is far from what we should expect, but time will tell.
I think the 10x is specifically for inter-GPU transfer. So PCIe against NVLink? The HBM is the 4x.
 

spworley

Newcomer
Jen-Hsun prefaced this section of his keynote with multiple caveats, saying even at the time he knew people would mischaracterize it.
He had just introduced the deep-learning focused DIGITS workstation with four Titan Xs. Pascal will have about 2X the FP32 flops of Maxwell, and 4X the FP16 flops. For neural net learning, FP16 is OK! Memory bandwidth is boosted by about 6X. So overall speedup is about 5X (4X FLOPS, 6X bandwidth). He again stressed how rough this rounding was. And then the Pascal version of the DIGITS workstation can practically connect not 4 GPUs (via PCIe), but 8 GPUs (with NVLink), giving twice the number of GPUs per workstation. So that's 2X. 5X times 2X is 10X. The next generation of the DIGITS workstation will be very roughly 10X faster at deep learning training. And he followed up again by saying how this is rough CEO round-off handwaving math. The slide itself shown earlier in this thread even says in the corner "Very rough estimates!" though it's comically blocked by someone's raised arm in the photo.
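For anyone who wants the handwaving spelled out, here is a minimal Python sketch of that arithmetic. The factors are the rounded keynote figures quoted above, not measurements. How the 4X and 6X get blended into "about 5X" was not stated, so the plain average below is purely my assumption, and the function names are hypothetical.

```python
# Back-of-envelope reconstruction of the "10X" figure.
# Inputs are the rounded keynote estimates quoted in the post, not benchmarks.

def per_gpu_speedup(flops_gain: float, bandwidth_gain: float) -> float:
    """Blend the FP16 FLOPS gain and the bandwidth gain.

    ASSUMPTION: a plain average. The keynote only said "about 5X".
    """
    return (flops_gain + bandwidth_gain) / 2


def workstation_speedup(per_gpu: float, gpus_old: int, gpus_new: int) -> float:
    """Scale the per-GPU gain by the GPU count ratio.

    ASSUMPTION: perfect multi-GPU scaling, which real training rarely achieves.
    """
    return per_gpu * gpus_new / gpus_old


per_gpu = per_gpu_speedup(flops_gain=4.0, bandwidth_gain=6.0)
total = workstation_speedup(per_gpu, gpus_old=4, gpus_new=8)
print(f"per GPU: {per_gpu}x, per workstation: {total}x")
```

Note that with FP32 instead of FP16 the FLOPS gain drops to 2X, so the same sketch gives a noticeably smaller number. The headline only holds for workloads that can live with FP16.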

IMHO, he still made a mistake with the inflammatory title on the slide. It doesn't matter that he carefully added so many caveats, it was still just overhype. But I did get the feeling he was honestly excited, and to be honest, if you can use FP16 (like neural nets can), Pascal really will be a 4x boost in just a year or two. That's pretty impressive.
 

eastmen

Legend
Subscriber
That's a good question for those green birds around here that were passionately evangelizing how "useless" it is. As already mentioned: the more power-sensitive things get, the more IHVs will look for solutions to increase efficiency for every possible use case.

I don't think the problem was 16-bit back with the FX. The problem was that Nvidia pushed 32-bit as an advantage vs 24-bit on AMD hardware and then failed to even match the performance with 16-bit enabled (behind the user's back) in the most popular game of that year.
 

Newguy

Regular
IMHO, he still made a mistake with the inflammatory title on the slide. It doesn't matter that he carefully added so many caveats, it was still just overhype. But I did get the feeling he was honestly excited, and to be honest, if you can use FP16 (like neural nets can), Pascal really will be a 4x boost in just a year or two. That's pretty impressive.

He didn't make a mistake, he knew exactly what he was doing, he knew it would get misinterpreted and "10x faster" would spread.
 

Ailuros

Epsilon plus three
Legend
Subscriber
I don't think the problem was 16-bit back with the FX. The problem was that Nvidia pushed 32-bit as an advantage vs 24-bit on AMD hardware and then failed to even match the performance with 16-bit enabled (behind the user's back) in the most popular game of that year.

You're barking up the wrong tree; I meant something completely different and far more recent, but it's not worth keeping the OT alive.
 

CarstenS

Legend
Subscriber
Now, this is how marketing works:
- Nvidia showing a slide with an order of magnitude of improvement
- Alongside, they offer a decent explanation of how they arrive at this number, including a cautionary statement that it's an estimate
- Devilishly engineered, the slide has a hidden ability to switch off the higher brain functions of roughly 98.73% of its audience (I suspect it has to do with the 10, rather than the ×)
- Press & other people all over the world repeat the headline of the slide just because their brain functions are shut down.
- Nvidia gets the headlines they want
- Some of the 1.27 % whose brain functions were not affected (through extensive counter-marketing training or inherent lack of said functions in the first place) start a debate on how much Nvidia is an evil company, with the devil as its CEO, who is running a suit company despite wearing a leather jacket (this must be the most devilish suit-in-disguise ever)
- Nvidia goes into denial, stating (formally correct) that they explained convincingly how they arrived at this estimate
- The "10×" sticks
- Nvidia celebrates
- 1.27-x % of the people reiterate this anecdote for decades to come, proving time and again how deceivingly devilish and abhorringly evil Nvidia is.

So, what about the GeForce 3 being 7× faster than the GeForce 2? That was a really gross lie, right? ;)
 
Funny thing about the GeForce FX: it was good at running Doom 3 but nothing else, and was able to get 60 fps in that one. But the pixel shading was done at 32-bit in the end. It worked if the shaders were very tight; otherwise "falling off a cliff" happened very quickly.
 

nnunn

Newcomer
Will Nvidia target Pascal at Samsung's 14nm process?
Thinking about recent discussions between these parties brings to mind Intel simply buying DEC's hardware division rather than argue about Alpha floating point pipes found in Pentium Pro. Not suggesting that Samsung would buy Nvidia, but maybe someone offered someone a good deal on 14nm, and everyone shook hands?
 

Erinyes

Regular
Time to revive this thread. My info says big Pascal has taped out, and is on TSMC 16nm (Unknown if this is FF or FF+, though I suspect it is FF+). Target release date is Q1'16. This is a change from Kepler and Maxwell where the smallest chip (GK107 and GM107 respectively) taped out first. Maybe the experience with 20nm was enough for NV to go back to their usual big die first strategy. Given the huge gains in performance compared to 28nm, and the fact that the 16nm process is both immature and quite expensive, I suspect the die size may be a bit smaller than what we've seen with GK110/GM200.

Still wondering what they're doing on Samsung 14nm though, maybe just test chips for now?
 
Time to revive this thread. My info says big Pascal has taped out, and is on TSMC 16nm (Unknown if this is FF or FF+, though I suspect it is FF+). Target release date is Q1'16. This is a change from Kepler and Maxwell where the smallest chip (GK107 and GM107 respectively) taped out first. Maybe the experience with 20nm was enough for NV to go back to their usual big die first strategy. Given the huge gains in performance compared to 28nm, and the fact that the 16nm process is both immature and quite expensive, I suspect the die size may be a bit smaller than what we've seen with GK110/GM200.

Still wondering what they're doing on Samsung 14nm though, maybe just test chips for now?
I hope you are right. A new PC with a GP100+Skylake in a mini ATX case for Q1'16 sounds really good and quite future proof.

Nothing about R490 also taping out? I hope AMD is not late this time. If Fury were Arctic Islands based they would already be halfway done.
 

McHuj

Veteran
Subscriber
I assume all Pascal GPUs will be HBM based, so maybe that's not mature enough to be economical for anything other than the high tier; hence, big Pascal coming first.
 

Alatar

Newcomer
Big die first might have something to do with NV having to drop meaningful FP64 capability from GM200?

Also isn't HBM2 (what pascal should use) still going to be pretty much unavailable in Q1 2016? I can't remember where I read that it was going to be available in mid 2016 but anyway Q1 sounds incredibly early.
 
Big die first might have something to do with NV having to drop meaningful FP64 capability from GM200?

Also isn't HBM2 (what pascal should use) still going to be pretty much unavailable in Q1 2016? I can't remember where I read that it was going to be available in mid 2016 but anyway Q1 sounds incredibly early.
Last I read, HBM2 was ready for later this year. What you say about getting the DP monster processor back after ditching it in GM200 makes sense.
Or they could be thinking that, with AMD's efficiency improvements in Arctic Islands (hopefully), a 7970-size chip on 14/16nm could be faster than a GP104-size one (if the rumors about Maxwell and Pascal being quite similar are true).
 

spworley

Newcomer
I assume all Pascal GPUs will be HBM based, so maybe that's not mature enough to be economical for anything other than the high tier; hence, big Pascal coming first.
Will HBM be economical for low tier parts in 2016 or even 2017? If manufacturing a low-margin, high quantity HBM GP107 isn't profitable because of the HBM expense, maybe GM206 will have a longer life to fill in the gap. Unless the low end Pascal parts stick with GDDR5.
 

McHuj

Veteran
Subscriber
Will HBM be economical for low tier parts in 2016 or even 2017? If manufacturing a low-margin, high quantity HBM GP107 isn't profitable because of the HBM expense, maybe GM206 will have a longer life to fill in the gap. Unless the low end Pascal parts stick with GDDR5.

No idea, but I imagine that long term HBM will replace all external memory. (It wouldn't surprise me if Intel started stacking memory next to their CPUs.)

Let's say Pascal does offer 2X the performance of Maxwell: you'll likely need to double the bandwidth even for lower tier parts, as well as the capacity. There's going to be a point in time when a single 4GB HBM stack that offers 256 GB/sec will be cheaper than the 4-8 GDDR5 chips needed for the same performance. In terms of final assembly, performance, and power savings, HBM seems to have everything going for it (except the interposer stacking costs, which should come down).
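To put rough numbers on the stack-versus-chips comparison, here is a small Python sketch of peak bandwidth as bus width times per-pin data rate. The per-pin rates are my assumptions (7 Gbps for GDDR5, as on current high-end cards, and 2 Gbps for an HBM2-class stack). The 32-bit GDDR5 chip interface and the 1024-bit HBM stack interface are the standard widths. This says nothing about cost, which is the open question.

```python
# Peak memory bandwidth = bus width (bits) * per-pin rate (Gbit/s) / 8 bits per byte.

def bandwidth_gb_per_s(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Theoretical peak bandwidth in GB/s for a given bus width and pin speed."""
    return bus_width_bits * gbps_per_pin / 8


GDDR5_CHIP_BITS = 32      # each GDDR5 chip has a 32-bit interface
GDDR5_GBPS = 7.0          # ASSUMPTION: 7 Gbps per pin
HBM_STACK_BITS = 1024     # each HBM stack has a 1024-bit interface
HBM2_GBPS = 2.0           # ASSUMPTION: 2 Gbps per pin (HBM2-class stack)

hbm_stack = bandwidth_gb_per_s(HBM_STACK_BITS, HBM2_GBPS)
gddr5_by_chips = {
    chips: bandwidth_gb_per_s(chips * GDDR5_CHIP_BITS, GDDR5_GBPS)
    for chips in (4, 8)
}

print(f"1 HBM stack: {hbm_stack} GB/s")
for chips, bw in gddr5_by_chips.items():
    print(f"{chips} GDDR5 chips ({chips * GDDR5_CHIP_BITS}-bit bus): {bw} GB/s")
```

Under these assumptions it takes a full 256-bit bus of eight GDDR5 chips to get near a single stack, which is the comparison the post is making.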
 

silent_guy

Veteran
Subscriber
Will HBM be economical for low tier parts in 2016 or even 2017? If manufacturing a low-margin, high quantity HBM GP107 isn't profitable because of the HBM expense, maybe GM206 will have a longer life to fill in the gap. Unless the low end Pascal parts stick with GDDR5.
I think the economical part is secondary to the question: does it make sense at all? Do we have conclusive evidence that Titan X performance is significantly hampered by memory BW?

If it's only marginal, then it will probably make sense to use HBM for the Big One on 16/14nm. But then the Titan X performance will move down to the 104 product, where it will be just as marginal as it is for the Titan X, so probably not worth doing.
And then when 10nm comes along, things will move down one more step, so the 100 and 104 part will be HBM worthy, but the rest still won't.

And price reasons will push against that trend.

It does mean that the smaller SKUs will inevitably trend towards larger GDDR5 busses...
 