Will a 512-bit bus + GDDR5 memory have a place in 2009?

I tend to think that while the short era of the 512-bit bus may soon be over, the era of the > 256-bit bus will continue.

For a very simple reason: it's pure business brilliance how NV has baked an "anti-freebie unlocking" device into their architecture by tying the memory structure to the functional units. So I tend to think they'll always be inspired to give the tippy-top part a > 256-bit bus, with the lesser, yield-salvage parts stepping down to narrower widths, all the way to 256-bit.
 
For a very simple reason: it's pure business brilliance how NV has baked an "anti-freebie unlocking" device into their architecture by tying the memory structure to the functional units. So I tend to think they'll always be inspired to give the tippy-top part a > 256-bit bus, with the lesser, yield-salvage parts stepping down to narrower widths, all the way to 256-bit.
:?: Anti-freebie unlocking is a simple matter of providing fuses for the blocks that you want to disable. You really don't need to go to any further architectural trouble to make that work.

Maybe I'm missing your point?
 
I'm not convinced that's an issue. It may be wrt powering the IO pads themselves, but not for powering the core logic. Even with flip chip, you're still going to put the IO pads on the side, as seen in the die shot of RV770.

For high performance CPUs, the trend has been pretty strong.
The demand for amps goes up, the sensitivity to power fluctuations goes up, and the designers try to scale down the supply voltage.

Cell is a decent example. Itanium is too, and it's on its own PCB with power regulators running off a higher-voltage input, just like GPUs are.
 
The demand for amps goes up, the sensitivity to power fluctuations goes up, and the designers try to scale down the supply voltage.
Sure, it doesn't get any easier. But are we at the end of the line already? Not convinced we are. Ball density is increasing too...
 
I didn't mean to imply that bump density wasn't increasing, but the pace isn't as great as that of process transitions.

In lieu of better bump density, maybe someday we'll see them start to use chip stacking: with through-silicon-via tech they can get a much higher density of vias between the main die and a lower, broader, simpler die.
That base die could be made with SRAM or something low-power, or just metal layers that route to the pads, which would now have the base die's area as their limit.
 
:?: Anti-freebie unlocking is a simple matter of providing fuses for the blocks that you want to disable. You really don't need to go to any further architectural trouble to make that work.

Maybe I'm missing your point?

Then why hasn't it been done across the board if it's that easy?
 
Then why hasn't it been done across the board if it's that easy?
But fuses are used across the board: it has been years since anyone has been able to unlock the shader pipes of any kind of GPU, not surprisingly since around the time fabs started offering fuse cells for some of their processes. They are mainly used for memory repair, but they can be used for anything else, and it's very simple to disable a block of logic if it has been designed up front in some scalable way.
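As a toy illustration of that point (the fuse layout and names below are purely hypothetical, not any real GPU's fuse map): a handful of fuse bits per scalable block is essentially all the "architecture" the scheme needs.

```python
# Toy model of fuse-based block disabling; the fuse word layout is invented
# for illustration and does not correspond to any real GPU.
FUSE_WORD = 0b1110_1111  # one bit per shader cluster; a cleared bit = blown fuse

def enabled_clusters(fuse_word: int, total_clusters: int = 8) -> list[int]:
    """Return the indices of clusters whose fuse bit is still set."""
    return [i for i in range(total_clusters) if fuse_word & (1 << i)]

# A die with one defective cluster ships as a cut-down SKU simply by blowing
# that cluster's fuse at test time -- no separate "anti-unlock" machinery.
print(enabled_clusters(FUSE_WORD))   # -> [0, 1, 2, 3, 5, 6, 7]
```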
 
Hmm, bad choice of words on my part. :) I meant "across the board" as in "all IHVs across their many gpu products" rather than what you took it to mean.

I don't doubt that what you are suggesting is technically possible, but my impression from various conversations over the years is that perhaps it's not cost effective, particularly for lower-margin products in the first place.
 
Hmm, bad choice of words on my part. :) I meant "across the board" as in "all IHVs across their many gpu products" ...
That's how I understood it! ;)

... it's not cost effective, particularly for lower-margin products in the first place.
The practice itself of adding fuses to a product is cheap. Contrary to the common wisdom that lasers are used to cut connections, blowing a fuse is just a matter of applying an over-voltage at a specific point according to a particular programming sequence. There's really nothing to it, and it's used in very high-volume, low-cost non-GPU chips.
 
It always comes down to the question of whether or not you have enough functional logic to make sure that your die size is determined by that and not purely by the number of pads. If you don't, then you're throwing away silicon real estate that's doing absolutely nothing.

Now, since a GPU always has ways to usefully increase the amount of functional logic, the answer to the question is reduced to "Are we willing to make a GPU that's large enough so that it won't be pad limited with a 512-bit interface?" It's really that simple. That's a decision that's much more driven by business/market considerations than technical ones.
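One way to frame the pad-limited question with rough numbers (a minimal sketch; the pad pitch and per-channel signal count are assumed, illustrative values, not figures from any actual GDDR implementation):

```python
import math

# Hypothetical, illustrative numbers only -- not real pad pitches or pin counts.
PAD_PITCH_MM = 0.05              # assumed edge I/O pitch (~50 um)
SIGNALS_PER_32BIT_CHANNEL = 70   # assumed: DQ + strobes + addr/cmd + supplies

def edge_pads_available(die_area_mm2: float) -> int:
    """Pads that fit along the perimeter of a square die of the given area."""
    edge_mm = math.sqrt(die_area_mm2)
    return int(4 * edge_mm / PAD_PITCH_MM)

def memory_pads_needed(bus_width_bits: int) -> int:
    """Pads for the memory interface alone (PCIe, display, etc. not counted)."""
    return (bus_width_bits // 32) * SIGNALS_PER_32BIT_CHANNEL

for area_mm2 in (192, 300, 420, 576):
    print(f"{area_mm2:>3} mm^2 die: ~{edge_pads_available(area_mm2)} edge pads, "
          f"512-bit interface wants ~{memory_pads_needed(512)}")
```

Under those assumed numbers, a ~190 mm² die is short of pads for a 512-bit interface while a ~420 mm² die has room to spare, which is roughly the picture the RV670/R600 comparison later in the thread paints.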

I don't think so. For smaller feature-size processes, power density (W/mm²) goes up. So that means that if you want to keep the same mm² to maintain the 512-bit bus, the power consumption goes up. Doing so, you'll run into the power barrier sooner or later.
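To put rough numbers on that argument (a toy sketch with assumed scaling factors, not measured data): if a full node shrink roughly doubles transistor density while per-transistor power falls by less than half, power per mm² rises, and holding die area constant for the sake of the pad budget means total power rises with it.

```python
# Toy power-density scaling with assumed (hypothetical) per-node factors.
density_gain = 2.0          # assumed: ~2x transistors per mm^2 per full shrink
power_per_transistor = 0.7  # assumed: per-transistor power falls to ~70%

power_density_scale = density_gain * power_per_transistor  # W/mm^2 factor
print(f"W/mm^2 scales by ~{power_density_scale:.2f}x per shrink")

# If die area is held constant (e.g. to keep enough perimeter for a 512-bit
# interface) and the extra logic budget is actually used, total chip power
# scales by the same factor.
die_area_mm2 = 400.0
old_power_w = 150.0
print(f"{die_area_mm2:.0f} mm^2 part: ~{old_power_w * power_density_scale:.0f} W "
      f"if the logic budget is filled at the new node")
```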
 
The price per mm² is going to go up regardless, though ... the only application for the big chips is high-end realtime rendering. Offline rendering and GPGPU in general already have to be able to deal with low inter-node bandwidth; if they can't, they won't run well to begin with. Needing to scale across a few more chips can be traded off against having easier-to-cool, cheaper hardware (or higher margins, if you want to look at it from the GPU manufacturer's perspective).

I think in the end it's going to depend on how well the R700 does ... if it does really well in the enthusiast market, I think it will be the end of the big monolithic chips as the primary chips of a generation. Maybe they will make sense as refresh parts though.
 
I'm wondering how much the reality of the current situation is affecting people's opinions on the right strategy for the future. What if GT200 had debuted with a 1600 MHz shader clock at current power consumption? Are the relatively low clocks a practical necessity at that die size, or simply a flaw in a specific design?
 
Well, even taking into account that the die needs to be a certain size to physically accommodate the pads needed for a 512-bit bus, consider the following...


An RV670 GPU is only 192 mm² in terms of die size, yet it handles a 256-bit memory bus just fine, and the original R600 GPU on the 80nm fab process was just over the 420 mm² mark in terms of die size, yet it has a 512-bit memory bus anyhow...


So by that token, we know that you don't need a 500 mm²+ die to accommodate a 512-bit memory bus like the current GT200 GPU at 65nm uses, and even once the 55nm revision is released, it still yields a GPU die over the 400 mm² mark anyhow, so given the original R600 example, there's still enough room in there to keep the 512-bit memory bus.


The only time a serious change might have to be made is once you reach the point where the 40nm fab process is ready to handle a 1.4 billion transistor chip, which would drop the overall die size to right around the 300 mm² mark, and that's when an interesting possibility comes to mind...


What about the possibility of using a 384-bit bus on a GPU die of roughly 300 mm² and combining that with GDDR5? We know that a sub-200 mm² die can handle a 256-bit bus (RV670), that a die just over the 400 mm² mark can handle a 512-bit bus (original R600), and that Nvidia has already released cards with a 384-bit memory bus (the 8800 GTX).


Overall memory bandwidth would be 50% higher than with a 256-bit bus, assuming memory modules clocked at the same speed in both cases, and I'm sure there's going to be even faster-clocked GDDR5 by next year. With a 384-bit bus, it's pretty much a given that well in excess of 200 GB/sec of local memory bandwidth will be reached quite easily...
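A quick sanity check on those figures (a minimal sketch; the 4.0 Gbps-per-pin GDDR5 data rate below is an assumed value for illustration, not a quote of any specific product):

```python
# Rough memory-bandwidth arithmetic for the bus widths discussed above.
def bandwidth_gb_s(bus_width_bits: int, data_rate_gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s = (bus width / 8 bits per byte) * per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps_per_pin

rate = 4.0  # assumed GDDR5 effective data rate, Gbps per pin
for width in (256, 384, 512):
    print(f"{width}-bit @ {rate} Gbps/pin: {bandwidth_gb_s(width, rate):.0f} GB/s")

# 256-bit -> 128 GB/s, 384-bit -> 192 GB/s (50% more), 512-bit -> 256 GB/s.
# At ~4.5 Gbps/pin a 384-bit bus already clears the 200 GB/s mark (216 GB/s).
```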
 
I'm wondering how much the reality of the current situation is affecting people's opinions on the right strategy for the future. What if GT200 had debuted with a 1600 MHz shader clock at current power consumption? Are the relatively low clocks a practical necessity at that die size, or simply a flaw in a specific design?
If that had happened, it would have been a very high-performing part for a very small section of the market, rather than an adequately performing part for a very small section of the market. They would still have left the door open for ATI to regain market share ... and NVIDIA isn't running a charity; that isn't their aim.

What if they had debuted with a 55nm, 128-shader chip with the G200 architectural improvements, a 384-bit GDDR3 bus and a 2 GHz shader clock when the G200 was supposed to have hit the market?
 
If that had happened, it would have been a very high-performing part for a very small section of the market, rather than an adequately performing part for a very small section of the market. They would still have left the door open for ATI to regain market share ... and NVIDIA isn't running a charity; that isn't their aim.

Yeah, but I'm talking about the viability of Nvidia's monolithic GPU strategy vs AMD's X2s. They obviously can't fight the $200-$300 parts with their top-end chip.

What if they had debuted with a 55nm, 128-shader chip with the G200 architectural improvements, a 384-bit GDDR3 bus and a 2 GHz shader clock when the G200 was supposed to have hit the market?

Exactly, the counter isn't necessarily to adopt a multi-GPU strategy. Another option is to attack the meat of the market at the same time as AMD. Nvidia is sorely lacking a $300 part right now and part of it is due to how much they underestimated the performance that AMD could produce in that price range.
 
Nvidia is sorely lacking a $300 part right now and part of it is due to how much they underestimated the performance that AMD could produce in that price range.

Did they? Or did they just get wrong-footed by following the traditional release schedule of high-end first, midrange a few months later when the competition decided to flip that schedule?

Though I do think they and all fans of the big-daddy GPU should have felt the cold winds of something or other when the GTX 280 couldn't consistently smoke a GX2 of the previous generation. You can't point at some surprise strategy of AMD for that.
 
I think they did. G94 proved to be a capable match for RV670. I doubt they were expecting RV770 to be challenging their shiny new chip. They obviously (to me) thought that G92 would be enough to hold off AMD's new stuff, but the immense increase in perf/mm² took them off guard.
 
Did they? Or did they just get wrong-footed by following the traditional release schedule of high-end first, midrange a few months later when the competition decided to flip that schedule?

Though I do think they and all fans of the big-daddy GPU should have felt the cold winds of something or other when the GTX 280 couldn't consistently smoke a GX2 of the previous generation. You can't point at some surprise strategy of AMD for that.

At least AMD has been planning for more than 2 years.

I wonder when both ATI and Nvidia will have a GPU with memory shared between multiple cores, working across various applications.
 
At least AMD has been planning for more than 2 years.

I'm a bit reminded of Nvidia catching ATI with their pants down with the reintroduction of SLI in late 2004, which they had planned two years in advance. Arguably, this is the first time that ATI has regained the strategic initiative in any significant fashion since then, which, frankly, I think is healthy for the industry from time to time. How much it will help them won't be clear until we see just how much that new interconnect can accomplish.

So, anyway, to try to wend my way back to the topic: I predict the GTX 280 refresh will be the last single-GPU 512-bit memory bus we'll see for some time, maybe ever. But not the last > 256-bit bus.
 