Will a 512-bit bus + GDDR5 memory have a place in 2009?

So people tell me that Nvidia is not slated to use GDDR5 until D12U or something, which I assume is their next-generation DX11 / D3D11 / Shader Model 5 GPU.

If the era of huge monolithic GPUs is truly over, does that mean 512-bit buses will be a thing of the past?

I've read in recent articles that a potential GT200 redesign (not the 55nm GT200b) using GDDR5 would most likely pair it with a 256-bit bus to cut costs.


Will we see GPUs in 2009 with 200+ GB/sec bandwidth? If so, I'd imagine the quickest way to get there is 512-bit plus GDDR5.


RV770 (4870) uses GDDR5, but only a 256-bit bus.
GTX 280 uses a 512-bit bus, but only GDDR3.


Hmmm, can Nvidia's next new architecture (I don't mean refreshes or redesigns of GT200) be NOT a monolithic GPU and still have a 512-bit bus + GDDR5?

Oh and Larrabee is rumored to use a 1024-bit bus, but that's most likely a 2010 product.

/ramble
 
If they need the bandwidth, I'm sure that's an option. GDDR5 at 1250 MHz on a 256-bit bus will have 10-15% more bandwidth than the GTX 280 has now.
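Quick sanity check on that figure (a rough Python sketch; I'm assuming 1250 MHz GDDR5 means 5 Gb/s per pin effective, since GDDR5 is quad-pumped, and taking the GTX 280 at its stock 1107 MHz GDDR3):

# peak bandwidth (GB/s) = bus width (bits) * per-pin data rate (Gb/s) / 8 bits per byte
def bandwidth_gb_s(bus_width_bits, gbps_per_pin):
    return bus_width_bits * gbps_per_pin / 8.0

gtx280    = bandwidth_gb_s(512, 2 * 1.107)  # GDDR3 is double data rate -> ~141.7 GB/s
gddr5_256 = bandwidth_gb_s(256, 4 * 1.25)   # GDDR5 is quad data rate   -> 160.0 GB/s
print(gddr5_256 / gtx280)                   # ~1.13, i.e. roughly 13% more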
 
GDDR5 does get up to some frighteningly high speeds though. A 512-bit interface might be skippable for another generation after all.
 
A 512-bit bus seems kinda overkill with GDDR5 unless you really build a monster of a chip (I'm thinking something with a die size like the G200 but on 40nm here, though I've no idea how you'd be able to cool that...). Maybe a 384-bit GDDR5 interface would be more likely even for such a monster.
5 Gb/s-per-pin GDDR5 (which should be available very shortly, if it isn't already) is good enough for 160 GB/s on a 256-bit bus. Samsung announced prototypes at 6 Gb/s per pin some time ago (192 GB/s @ 256-bit). It is "expected" that GDDR5 may scale to 7 Gb/s per pin (though I wouldn't really rely on this) if that's still not enough.
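To make the per-pin scaling explicit, here's the same arithmetic spelled out (Python; the 384-bit row and the 7 Gb/s column are speculative, just extending the formula):

# peak bandwidth scales linearly with both per-pin data rate and bus width
def bandwidth_gb_s(bus_width_bits, gbps_per_pin):
    return bus_width_bits * gbps_per_pin / 8.0

for gbps in (5, 6, 7):            # 7 Gb/s is only the "expected" stretch goal
    for bus in (256, 384, 512):
        print(bus, gbps, bandwidth_gb_s(bus, gbps))
# 256-bit: 160 / 192 / 224 GB/s
# 384-bit: 240 / 288 / 336 GB/s
# 512-bit: 320 / 384 / 448 GB/s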
 
A 512-bit bus seems kinda overkill with GDDR5 unless you really build a monster of a chip (I'm thinking something with a die size like the G200 but on 40nm here, though I've no idea how you'd be able to cool that...). Maybe a 384-bit GDDR5 interface would be more likely even for such a monster.
5 Gb/s-per-pin GDDR5 (which should be available very shortly, if it isn't already) is good enough for 160 GB/s on a 256-bit bus. Samsung announced prototypes at 6 Gb/s per pin some time ago (192 GB/s @ 256-bit). It is "expected" that GDDR5 may scale to 7 Gb/s per pin (though I wouldn't really rely on this) if that's still not enough.

I don't think it's possible to build such a monster chip on such a small manufacturing process and not be pad-limited. 40nm basically mandates the use of a 256-bit bus (or smaller).
 
I don't think it's possible to build such a monster chip on such a small manufacturing process and not be pad-limited. 40nm basically mandates the use of a 256-bit bus (or smaller).

I never really understood the die-size-proportional-to-bus-width argument. I don't believe the VRAM interfaces with the die directly; rather, it interfaces with the package that the die sits on. Could you not just make the package a little bigger to provide the necessary pins for 512-bit?
 
Dies can be pad-limited.
The pads that interface between the silicon and the package are a physical limit as well.
It doesn't matter how big the package is if no more pads can be stuck on the die.
 
I never really understood the die-size-proportional-to-bus-width argument. I don't believe the VRAM interfaces with the die directly; rather, it interfaces with the package that the die sits on. Could you not just make the package a little bigger to provide the necessary pins for 512-bit?

I'm just repeating what I've heard more knowledgeable folks say. The general consensus is that 40nm is just too small for a 512-bit interface.
 
If scientists find a way to somehow reduce the specific power (W/mm²) for smaller processes, then maybe. It would then be possible to put more transistors on a chip without being stopped by the power barrier. But, uhm, realistically speaking, I don't think so.
 
If scientists find a way to somehow reduce the specific power (W/mm²) for smaller processes, then maybe. It would then be possible to put more transistors on a chip without being stopped by the power barrier. But, uhm, realistically speaking, I don't think so.

I believe this is purely an I/O problem, so things like transistor density and specific power are not an issue. Rather, the number of pins/pads required to supply the necessary power to the device simply cannot fit in such a small area, let alone with enough room left over for data pins/pads. Pin/pad count increases with chip complexity, yet the chips themselves (and their interfaces to the packages they sit on) are shrinking.
 
I believe this is purely an I/O problem, so things like transistor density and specific power are not an issue. Rather, the number of pins/pads required to supply the necessary power to the device simply cannot fit in such a small area, let alone with enough room left over for data pins/pads. Pin/pad count increases with chip complexity, yet the chips themselves (and their interfaces to the packages they sit on) are shrinking.

That just means there need to be innovations in packaging. And hey, maybe eventually that will make multi-GPU setups better as well.
 
I'm just repeating what I've heard more knowledgeable folks say. The general consensus is that 40nm is just too small for a 512-bit interface.
In isolation, that statement is too incomplete to make sense: a 512-bit interface in 40nm on a 500 mm² die makes as much sense as it does on a 500 mm² die in 65nm.

It always comes down to the question of whether or not you have enough functional logic to make sure that your die size is determined by that logic and not purely by the number of pads. If you don't, then you're throwing away silicon real estate that's doing absolutely nothing.

Now, since a GPU always has ways to usefully increase the amount of functional logic, the answer to the question is reduced to "Are we willing to make a GPU that's large enough that it won't be pad-limited with a 512-bit interface?" It's really that simple. That's a decision that's much more driven by business/market considerations than technical ones.
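To make the pad-limited question concrete, here's a toy pad-budget model (Python; every number in it is a made-up placeholder, not a real figure for any process or interface, so treat it purely as an illustration of the trade-off):

# A die is "pad-limited" when the area its pads/bumps demand exceeds the area
# its functional logic demands. All counts and the pitch below are hypothetical.
def pad_area_mm2(signal_pads, power_ground_pads, bump_pitch_mm):
    # with area-array flip-chip bumps, each bump claims roughly pitch^2 of die area
    return (signal_pads + power_ground_pads) * bump_pitch_mm ** 2

signal_pads_512bit = 512 * 2   # placeholder: data pins plus strobe/address/command overhead
power_ground_pads  = 1500      # placeholder power/ground bump count
logic_area_mm2     = 300       # how much functional logic you actually want to build

needed = pad_area_mm2(signal_pads_512bit, power_ground_pads, 0.2)  # 0.2 mm pitch, assumed
print(needed)                   # ~101 mm^2 demanded by the bumps
print(needed > logic_area_mm2)  # False here -> not pad-limited; shrink the logic and it flips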
 
silent_guy, please read my follow-up post.

Does not pin/pad count increase with chip complexity?

I don't think it does, or at least not after a certain point.

If a chip is so simple that it doesn't need too many pins, then yes, increasing complexity so that it requires more will lead to scaling with complexity.

The hard limit is the physical density that is possible with pads times the area of the chip's underside.
Research is ongoing on increasing pad density, and I think there was a blurb on how R600 had some techniques to reduce its need for pins.

RV770 is evidence enough that after a certain point complexity can continue on-chip even if pad-limited.
That's what allowed the extra 2 SIMDs.

One complicating factor is that complex, high-clocked, or small-process chips tend to require more pins to maintain a clean supply of current, and this requires more power and ground pins, which eat into the budget of pads that can be dedicated to other purposes.
 
http://news.cnet.com/8301-10784_3-9978746-7.html?tag=nefd.riv

As we will soon see, multiple chips will eliminate the need for 512-bit memory interfaces.

"So when you put two boards in, you don't get twice the performance but you (only) get one and a half. You put four boards in and you (only) get about 1.7, 1.8. What ATI is saying is that with two chips using (their) proprietary inter-bus, they will get 1.8 (the performance) with two chips. If that's true, you can expect to see four of them giving you something around 2.5."

Getting 2.5 times the performance from four boards would be a masterstroke for ATI.
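Turning those quoted numbers into per-board efficiency makes the comparison a bit clearer (Python; I've taken 1.75 as the midpoint of the "about 1.7, 1.8" figure):

# scaling efficiency = achieved speedup / number of boards (or chips)
def efficiency(speedup, boards):
    return speedup / boards

print(efficiency(1.5, 2))    # today, two boards:        0.75
print(efficiency(1.75, 4))   # today, four boards:       ~0.44
print(efficiency(1.8, 2))    # ATI's claim, two chips:   0.90
print(efficiency(2.5, 4))    # extrapolated four chips:  0.625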
 
I don't think it does, or at least not after a certain point.

If a chip is so simple that it doesn't need too many pins, then yes, increasing complexity so that it requires more will lead to scaling with complexity.

I think you've just re-phrased my argument in the negative ;)

RV770 is evidence enough that after a certain point complexity can continue on-chip even if pad-limited.
That's what allowed the extra 2 SIMDs.

This strengthens my argument. We're already pin/pad limited @ 55nm; fortunately for AMD, they were able to "pad" the die with extra SIMDs instead of "dark" silicon.

One complicating factor is that complex, high-clocked, or small-process chips tend to require more pins to maintain a clean supply of current, and this requires more power and ground pins, which eat into the budget of pads that can be dedicated to other purposes.

Yes, I said as much already, but thank you for elaborating.
 
Does not pin/pad count increase with chip complexity?

Increasing chip complexity obviously won't increase the number of functional pins.

So you must be talking about power pads. Those can just be sprinkled around in the center of the die without impacting the layout below it (*). So unless you're sucking so much power that those pads are not sufficient to feed the beast, you should be fine.

(*) I'm pretty sure that's the case: the power pads in the center will take a number of metal layers, but not all of them, so you can still put logic below it. This is not the case for IO pads: there you need IO drivers, which can be pretty meaty in terms of transistor area. Putting those in the middle will also disrupt the floorplan of the chip.
 
One complicating factor is that complex, high-clocked, or small-process chips tend to require more pins to maintain a clean supply of current, and this requires more power and ground pins, which eat into the budget of pads that can be dedicated to other purposes.
I'm not convinced that's an issue. It may be wrt powering the IO pads themselves, but not for powering the core logic. Even with flip chip, you're still going to put the IO pads on the side, as seen in the die shot of RV770.
 
Increasing chip complexity obviously won't increase the number of functional pins.

So you must be talking about power pads. Those can just be sprinkled around in the center of the die without impacting the layout below it (*). So unless you're sucking so much power that those pads are not sufficient to feed the beast, you should be fine.

(*) I'm pretty sure that's the case: the power pads in the center will take a number of metal layers, but not all of them, so you can still put logic below it. This is not the case for IO pads: there you need IO drivers, which can be pretty meaty in terms of transistor area. Putting those in the middle will also disrupt the floorplan of the chip.

Ok, so it's not as big of a problem as it's been made out to be. Perhaps this is just anti-monolithic-GPU FUD on ATi's part ;)
 