Nvidia Pascal Speculation Thread

I assume all Pascal GPUs will be HBM based, so maybe that's not mature enough to be economical for anything other than the high tier; hence, big Pascal coming first.

I'm not sure it makes sense in the low-end parts... at least not yet. As silent_guy says, it seems more likely that we'll see it on the top-tier parts only for now.

Parker is also on TSMC 16nm.
Erinyes, does the big Pascal you mentioned actually have fast DP? (After GM200 I would just like to be sure.)

Still not sure on that one..will need to do some more digging.
 
I think the economical part is secondary to the question: does it make sense at all? Do we have conclusive evidence that Titan X performance is significantly hampered by memory BW?

If it's only marginal, then it will probably make sense to use HBM for the Big One on 16/14nm. But then the Titan X performance will move down to the 104 product, where it will be just as marginal as it is for the Titan X, so probably not worth doing.
And then when 10nm comes along, things will move down one more step, so the 100 and 104 parts will be HBM-worthy, but the rest still won't.

And price reasons will push against that trend.

It does mean that the smaller SKUs will inevitably trend towards larger GDDR5 buses...

Even if the bandwidth isn't needed, the power (and size) savings are always welcome. So it does boil down to a cost-power trade-off.
 
Even if the bandwidth isn't needed, the power (and size) savings are always welcome. So it does boil down to a cost-power trade-off.
Welcome, yes. Worth it? Probably not. The cost is going to be just too high.

An additional point: if you use only a single HBM stack on a 107-class device, that's still the equivalent of a 256-bit GDDR interface. That alone will require more MC-related area than is necessary on what is, without a doubt, a very cost-sensitive part.
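For a rough sense of what "equivalent" means in bandwidth terms, here's a back-of-the-envelope sketch. The per-pin rates are assumptions (HBM1-class ~1 Gbps, typical shipping GDDR5 speeds), not anything confirmed for a Pascal part:

```python
# Peak bandwidth from bus width and per-pin data rate (GB/s).
# Assumed rates: ~1 Gbps/pin for an HBM1 stack, 4-7 Gbps for GDDR5.
def bandwidth_gbs(bus_width_bits, gbps_per_pin):
    return bus_width_bits * gbps_per_pin / 8

print(bandwidth_gbs(1024, 1.0))  # one 1024-bit HBM1 stack -> 128.0 GB/s
print(bandwidth_gbs(256, 4.0))   # 256-bit GDDR5 @ 4 Gbps  -> 128.0 GB/s
print(bandwidth_gbs(256, 7.0))   # 256-bit GDDR5 @ 7 Gbps  -> 224.0 GB/s
```

So a single stack sits in the same bandwidth range as a 256-bit GDDR5 bus, which is why the MC/PHY budget looks oversized for a 107-class part.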
 
I'm not sure it makes sense in the low-end parts... at least not yet. As silent_guy says, it seems more likely that we'll see it on the top-tier parts only for now.

Right, I agree with that. It makes me wonder: would they design two Pascals, HBM and non-HBM, or will Pascal be only for the higher tier while 28nm Maxwell remains for the lower-tier parts? I think the latter is probably more likely.
 
Welcome, yes. Worth it? Probably not. The cost is going to be just too high.

An additional point: if you use only a single HBM stack on a 107-class device, that's still the equivalent of a 256-bit GDDR interface. That alone will require more MC-related area than is necessary on what is, without a doubt, a very cost-sensitive part.

GP107 with HBM, that's an interesting prospect (if they can get at least 2GB, 1024-bit on it).
Let's pretend it's a laptop GPU: then it's a premium part, in places where you need the highest performance per watt to get the highest performance.
 
GP107 with HBM, that's an interesting prospect (if they can get at least 2GB, 1024-bit on it).
Let's pretend it's a laptop GPU: then it's a premium part, in places where you need the highest performance per watt to get the highest performance.
Why? The BW would just sit there waiting for SMs that have no way of using it.
 
I think the economical part is secondary to the question: does it make sense at all? Do we have conclusive evidence that Titan X performance is significantly hampered by memory BW?

If it's only marginal, then it will probably make sense to use HBM for the Big One on 16/14nm. But then the Titan X performance will move down to the 104 product, where it will be just as marginal as it is for the Titan X, so probably not worth doing.
And then when 10nm comes along, things will move down one more step, so the 100 and 104 parts will be HBM-worthy, but the rest still won't.

And price reasons will push against that trend.

It does mean that the smaller SKUs will inevitably trend towards larger GDDR5 buses...

This makes little sense to me. It's not like the chip makers can't adjust bandwidth by varying the number of HBM stacks on the lower end models, and using HBM at bandwidth equal to a large GDDR5 bus allows for either higher perf or lower power, or something in between. The only downsides I can see are economic (HBM/interposer costs/supply constraints, design costs, etc...), which you claim are secondary. What am I missing?

There would seem to be a couple of factors that might swing the other way (economically), but I'm not sure if/how much they are significant. From what little I understand, the area required for a high-speed GDDR5 PHY is non-trivial. Is it possible that an HBM-only GPU could have a die-size advantage vs a GDDR5 version? Also, the PCBs seem like they'd become smaller and simpler, except for maybe the cooling solution.
 
With a lowball 512GB/s for 4096-bit HBM (conservative or deliberately low clock speed) I'm getting a figure of 128GB/s just using one stack of HBM near the GPU instead of four.
That is 114% of the bandwidth of a GTX 960, or 148% that of a GTX 750 Ti, so it seems about right.
Or call the GPU a GP106 for the sake of my argument :)

Sure, maybe it won't get done this gen, or the gen after that.
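For what it's worth, the arithmetic above checks out against the published specs of those two cards. A quick sketch; the 512 GB/s four-stack figure is the lowball assumption from the post, the GTX numbers are the stock 128-bit configurations:

```python
# One stack out of a lowballed 512 GB/s four-stack HBM config,
# compared with the stock GTX 960 and GTX 750 Ti memory subsystems.
one_stack  = 512 / 4          # 128.0 GB/s
gtx_960    = 128 * 7.0 / 8    # 112.0 GB/s (128-bit @ 7.0 Gbps)
gtx_750_ti = 128 * 5.4 / 8    #  86.4 GB/s (128-bit @ 5.4 Gbps)

print(one_stack / gtx_960)     # ~1.14 -> "114% of a GTX 960"
print(one_stack / gtx_750_ti)  # ~1.48 -> "148% of a GTX 750 Ti"
```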
 
Why? The BW would just sit there waiting for SMs that have no way of using it.

It matters how much HBM can be supplied and at what price. HBM is lower power and smaller area for larger bandwidth, so the only reason to not use it would be if it's too expensive versus GDDR5 and/or supply constrained. Which it well could be of course, especially after HBM2 is first introduced and starts ramping up.
 
It matters how much HBM can be supplied and at what price. HBM is lower power and smaller area for larger bandwidth, so the only reason to not use it would be if it's too expensive versus GDDR5 and/or supply constrained. Which it well could be of course, especially after HBM2 is first introduced and starts ramping up.
Right now, we see quite a few low-end GPUs with DDR3. The only justifiable reason for that can be cost. Why would DDR3 be much cheaper than GDDR5? The core technology is the same. Because it's produced in huge volumes. And there's a lot of competition.

Let's now turn to HBM: not only is the volume essentially zero, it's more complicated in any possible way you can imagine. TSVs, thinned wafers, 5 dies stacked, interposer. And there is no second source either.

Is there anybody here who doesn't think that the cost difference between HBM and GDDR5 is going to make the difference between GDDR5 and DDR3 look trivial?

Over time, the cost will go down, of course. But how many years before the volume will start to approach the GDDR5 volume? And the complexity difference will never go away.
 
Right now, we see quite a few low-end GPUs with DDR3. The only justifiable reason for that can be cost. Why would DDR3 be much cheaper than GDDR5? The core technology is the same. Because it's produced in huge volumes. And there's a lot of competition.

Let's now turn to HBM: not only is the volume essentially zero, it's more complicated in any possible way you can imagine. TSVs, thinned wafers, 5 dies stacked, interposer. And there is no second source either.

Is there anybody here who doesn't think that the cost difference between HBM and GDDR5 is going to make the difference between GDDR5 and DDR3 look trivial?

Over time, the cost will go down, of course. But how many years before the volume will start to approach the GDDR5 volume? And the complexity difference will never go away.

In fact, for the bottom end of the stack, I wonder what makes the most sense: a 128-bit bus with GDDR5-5000~6000 or a 256-bit bus with DDR4-2500~3000.
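Both options land on essentially the same peak bandwidth, so the choice really comes down to PHY area, power and board cost. A quick sketch using the quoted data rates (actual SKUs could clock differently):

```python
# Peak bandwidth (GB/s) for the two hypothetical bottom-of-stack options.
def bw(bits, gbps):
    return bits * gbps / 8

print(bw(128, 5.0), bw(128, 6.0))  # 128-bit GDDR5-5000..6000 -> 80.0 .. 96.0 GB/s
print(bw(256, 2.5), bw(256, 3.0))  # 256-bit DDR4-2500..3000  -> 80.0 .. 96.0 GB/s
```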
 
Why would DDR3 be much cheaper than GDDR5? The core technology is the same. Because it's produced in huge volumes.

There's more to the cost difference than that. Firstly, GDDR5 modules are mostly signaling -- the actual memory arrays are less than a third of the die. DDR3/4 can use somewhat more of their silicon for the DRAM. Secondly, GDDR5 has more signal lines and they have much lower tolerances than DDR3 lines, making the PCB more expensive.
 
Doesn't DDR3 have stricter wiring requirements though?

That's why the PS4 MB is so clean and the Xbox One needs to have same-length wires.
 
Doesn't DDR3 have stricter wiring requirements though?
I don't know about the wiring requirements of GDDR5 (though I doubt they're any easier), and I'm only look-over-the-shoulder familiar with DDR3. But it's really not a big deal for a professional, especially on a dirt-cheap 6 or 8 layer PCB.
 
Secondly, GDDR5 has more signal lines and they have much lower tolerances than DDR3 lines, making the PCB more expensive.
It's probably true that the cost of the PCB is higher for GDDR5, not because of tolerances but because of better impedance control. But 2 times nothing is still nothing: high-volume, low-density PCBs (like the ones used in GPUs) don't cost a thing.
 
Doesn't DDR3 have stricter wiring requirements though?

That's why the PS4 MB is so clean and the Xbox One needs to have same-length wires.

GDDR5 supports robust signal training at initialization, so the wire length matching can be more relaxed.
 
In fact, for the bottom end of the stack, I wonder what makes the most sense: a 128-bit bus with GDDR5-5000~6000 or a 256-bit bus with DDR4-2500~3000.
At the very bottom: probably neither of them right now. In the future: I would guess GDDR5, because a 256-bit wide bus would take too much die area?
 
At the very bottom: probably neither of them right now. In the future: I would guess GDDR5, because a 256-bit wide bus would take too much die area?

I could be wrong, but I think all recent GPUs support GDDR5 on a 128-bit bus, even if some SKUs only use 64 bits, or use the full bus with DDR3.

As for die area, I'd imagine that for a given width (in bits) a DDR4 PHY is smaller (in mm²) but I have no idea by how much. That said, DDR4 should save a few watts, which can make a difference in a laptop, where most of these chips end up. Plus you can easily make SKUs with ridiculous amounts of RAM (for marketing purposes). Ultimately it might not be worth the extra silicon area.
 
Wider bus versus faster RAM has been a debate for a while; obviously for HBM, going much wider is a win. But for traditional memory controllers? I don't know. Nvidia obviously went for smaller buses and 7GHz RAM across the board, while AMD has gone for wider buses and slower RAM. Maybe someone interested enough could try and work out which was better for whom, though since AMD and Nvidia have widely differing bus designs and efficiencies it wouldn't be a direct comparison anyway.
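For reference, here are the raw numbers behind that split, using the published specs of a few contemporary SKUs. Peak figures only; this ignores each vendor's compression and actual MC efficiency, which is exactly the part that can't be compared directly:

```python
# Peak memory bandwidth for a few shipping SKUs: narrow+fast (Nvidia)
# versus wide+slow (AMD). Published bus widths and data rates.
def bw(bits, gbps):
    return bits * gbps / 8  # GB/s

cards = {
    "GTX 980 (256-bit @ 7.0 Gbps)": bw(256, 7.0),  # 224 GB/s
    "GTX 960 (128-bit @ 7.0 Gbps)": bw(128, 7.0),  # 112 GB/s
    "R9 290X (512-bit @ 5.0 Gbps)": bw(512, 5.0),  # 320 GB/s
    "R9 285  (256-bit @ 5.5 Gbps)": bw(256, 5.5),  # 176 GB/s
}
for name, gb in cards.items():
    print(f"{name}: {gb:.0f} GB/s")
```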

Regardless, the idea of memory, one way or another, is obviously undergoing a large change. Sure there's HBM, which has a lot of benefits even with the interposer requirements. But there's also a lot of research into ultra-fast non-volatile memory, with more and more promises of this kind of thing coming out "soon", as in within the next 3-5 years. It'll be interesting to see what that does, and whether traditional DDR RAM has its days numbered.

But for Pascal itself, while HBM might ideally make sense with its huge bandwidth and low power requirements, yes, there's still the supply constraint. SK Hynix is still the only supplier I know of for the next year+. But it is also a JEDEC standard, so I'd be surprised if someone else wasn't working on it.
 