Die Stacking for Desktop GPUs?

My understanding of GDDR is that it's DDR memory technology with controller and I/O features specific to the latency-tolerant GPU view of the world.

Internally is a GDDR chip a single die? Or is it a memory die and a controller+I/O die?
I finally put two datasheets of a DDR2 and GDDR3 chip next to each other to find out exactly what the differences are. (DDR2 and GDDR3 are basically from the same generation of technology. It makes more sense to compare those than DDR3 with GDDR3.)


In terms of commands, they're pretty much identical: ACTIVE, PRECHARGE, READ, WRITE etc. are identically coded. A DRAM chip doesn't really have anything like a memory controller in the usual sense of the word. All it does is execute very low level commands given by the external memory controller. E.g. it's up to the external memory controller to guarantee that all rows are visited within a certain amount of time to refresh the contents (unless you put the chip in auto-refresh mode, but then you can't access data.)
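
To make "refresh is the controller's job" concrete, here's a toy sketch. The numbers are assumptions for illustration (typical DDR2-era figures: 8192 refresh commands per 64ms window, i.e. one roughly every 7.8µs); the chip itself just executes whatever command arrives.

```python
# Toy model: the *external controller*, not the DRAM chip, schedules refresh.
# Assumed DDR2-era numbers: 8192 AUTO REFRESH commands per 64 ms window.

T_REFW_MS = 64.0   # refresh window (assumed)
REFRESHES = 8192   # refresh commands needed per window (assumed)
t_refi_us = T_REFW_MS * 1000.0 / REFRESHES  # average interval: 7.8125 us

def next_command(now_us: float, last_refresh_us: float) -> str:
    """The controller decides: issue AUTO_REFRESH when the interval is up,
    otherwise it is free to issue ACTIVE/READ/WRITE/PRECHARGE traffic."""
    if now_us - last_refresh_us >= t_refi_us:
        return "AUTO_REFRESH"
    return "NORMAL_TRAFFIC"

print(next_command(8.0, 0.0))  # AUTO_REFRESH: ~7.8 us have elapsed
print(next_command(1.0, 0.0))  # NORMAL_TRAFFIC: refresh not yet due
```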

GDDR3 has a much higher clock, but the CAS latency in cycles is proportionally larger: a GDDR3 at 1200MHz has a CL of 15, while a comparable DDR2 at 400MHz has a CL of 6. In terms of wall clock time, that comes down to roughly the same latency.

So it's really more about bandwidth than about latency, which is not really unexpected: the storage array itself is pretty much the same. Obviously, the data bus per chip is also much wider (32 bits vs 8 or 16).
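
Spelling out both comparisons numerically, using the figures quoted above (a sketch; the ×2 transfer rate assumes double-data-rate signaling on the quoted clocks for both parts):

```python
def cas_ns(cl_cycles: int, clk_mhz: float) -> float:
    """Wall-clock CAS latency: cycles divided by the clock rate."""
    return cl_cycles / clk_mhz * 1000.0

def peak_gb_s(bus_bits: int, clk_mhz: float) -> float:
    """Peak bandwidth per chip, assuming two transfers per clock (DDR)."""
    return bus_bits / 8 * clk_mhz * 2 / 1000.0

print(cas_ns(15, 1200.0), cas_ns(6, 400.0))        # 12.5 vs 15.0 ns
print(peak_gb_s(32, 1200.0), peak_gb_s(16, 400.0)) # 9.6 vs 1.6 GB/s
```

So the latency gap is small in wall clock terms, while the per-chip bandwidth gap is several-fold.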

There is definitely nothing related to latency hiding in GDDR3. That's a concept that lives at a much higher abstraction level than the memory chip itself.

The command controller is ridiculously small compared to the storage array. There's no reason whatsoever to put it on a separate die.

I had expected there to be a difference in burst length, but that's not the case: it can be set to 4 or 8 on both. (I was using Qimonda datasheets.)

All in all, differences are minor.
 
If DRAM (or anything, for that matter) was stacked on top of a GPU, wouldn't thermals become a huge problem? It seems to me that would mean a lot of power per mm^3 to be dissipated.

Sure, the first thing that comes to my mind is embedded stuff: micro-controllers, SoCs etc., if the cost and size reduction is worth it over an external DRAM chip.

But... you might as well use eDRAM?
Also, does it end up like the Pentium Pro, where there were one or two L2 dies and only the full assembly could be tested? The Pentium Pro was very expensive, and the 1MB variant with two L2 dies was ultra expensive. Though here the memory should be much cheaper.
 
Did you allow for the core clocks of "DDR2" memory in chips that were contemporaneous, i.e. DDR2-400 versus GDDR3-800?
CAS latencies in wall clock time have changed very little over the years.
E.g. a 900MHz GDDR3 from Qimonda has a CL of 11. In wall clock time that's practically identical to a CL of 15 for a 1200MHz GDDR3 (11/900MHz ≈ 12.2ns vs 15/1200MHz = 12.5ns).
 
You could embed heatpipes directly into the die.

a while back IBM was talking about something like this

http://www.eetimes.com/showArticle.jhtml?articleID=208402316


Three-dimensional water-cooled chip stacks will interleave processor cores and memory chips so that the interconnects run vertically chip to chip through copper vias that are surrounded by silicon oxide. Thin-film soldering (using electroplating) enables the separate dice to be electrically bonded to the layers above and below them, with the insulating layers of silicon oxide separating the flowing water from the copper vias.

The power density dramatically increases for such 3-D chip stacks, since enough heat gets trapped between layers to melt the cores. To solve the problem, IBM etched a liquid aqueduct into the silicon oxide on the back of each die. That creates a water-filled separating cavity with 10,000 pillars, each housing a copper via surrounded by silicon oxide. The cooling technique runs water through the aqueduct between each layer in the chip stack, enabling IBM to channel heat away from 3-D multichip stacks of nearly any scale.

The technology "forces water between the layers in the chip stack, picking up the heat right at its source," said Brunschwiler. "We found that to [create] an efficient heat remover, [we] had to use a structure with very little resistance to the fluid flow. . . . we found that round pillars aligned in the flow direction and put under pressure gave the best convective heat transfer."

IBM packages the chip stacks in a sealed pressurized silicon housing with an inlet reservoir on one side and an outlet reservoir on the other. The only way water can get from the inlet side of the silicon box to its outlet side is by going through the silicon oxide layers separating the layers of the 3-D stack. Cool water enters a 3-D chip stack and exits heated. The protected copper vias connect the chips vertically. After being forced through the layers between the chips in a stack, the heated water could be fed to the hot tap of the customer's plumbing, turning a data center's wasted heat into a means for reducing the data center's carbon footprint, according to IBM.

Next, the team plans to optimize the cooling structures for smaller chip dimensions, more interconnects and more-sophisticated heat transfer structures. In particular, the lab is experimenting with ways of adding extra cooling to designated hot spots on cores.

This seems to me like something that is, at best, very far off for a consumer GPU. There is still room left for additional transistor scaling without resorting to something exotic like this; when you hit the red brick wall, though, it might be time to implement this sort of thing.
 
The water cooling itself is not really that exotic, the extra processing steps added are almost negligible compared to making the through die vias in the first place.
 
You could embed heatpipes directly into the die.
Watch someone patent a brain :)


Actually, maybe one could do it. AFAIU the printer technology from silverbrookresearch that I posted a link to uses silicon to pump the ink. I suppose one could also use the silicon to pump a nonconductive cooling fluid through itself. <shrug>
 
Well my rather old joke about a digital teapot doesn't sound too far off either under that reasoning LOL :oops:
 
Actually, maybe one could do it. AFAIU the printer technology from silverbrookresearch that I posted a link to uses silicon to pump the ink. I suppose one could also use the silicon to pump a nonconductive cooling fluid through itself. <shrug>
The point of using a heatpipe is that nothing needs to be pumped to begin with (it uses capillary forces and evaporation/condensation to move heat and fluid around).
 
The point of using a heatpipe is that nothing needs to be pumped to begin with (it uses capillary forces and evaporation/condensation to move heat and fluid around).
I know, but this seemed like an even better approach.
 
Depends on how small you can make the heat pipes, and looking around I see papers on creating them in silicon, so I'm guessing pretty damn small. Between an integrated pump, which is an extra point of failure, and the fact that you need a bigger heat exchanger than with heatpipes, I don't think you come out ahead. The IBM approach with an external pump and heat exchanger has the advantage of minimum invasiveness for solder bump connected stacks, which already have much of the necessary room for the fluid ... so that makes sense too. But an integrated closed loop liquid cooled system ... not so much, IMO.
 
Whatever benefits that might have don't appear very helpful in a stacked die situation.

Bump space is more precious because the chip's base area must be partitioned between all stack layers, and the heat's destination is further away from the primary heat producer.
If only pins at the base have TEC sections, it is useless because we'd only be cooling a DRAM layer.
Cooling the GPU would mean creating a series stack of them.

Having a series of TEC pillars on each layer is putting in a lot of effort to move a small amount of heat, at the cost of expending several times the amount of heat moved in powering the TEC pillars.
A tiny patch of silicon on the cool side of each TEC might be cooled a little, but then the hot side is immediately insulated by another chip layer, and that side will be significantly hotter or attached to a more heavily powered TEC to move that heat down to the next layer.

The process steps would probably be more complex to insert the extra peltier elements, and then the design would have to be characterized again for mechanical and thermal behavior. I suspect there could be problems with the metal/TEC/chip junction on those tiny pillars as things heat and cool in unexpected ways.
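
The "expending several times the heat moved" point can be sketched with toy numbers. A COP of 0.5 is an assumption here (small TECs are often worse); the key property is that the hot side must reject the moved heat plus the electrical power spent moving it, and that compounds down a series stack.

```python
def tec_hot_side_w(q_moved_w: float, cop: float = 0.5) -> float:
    """Heat the TEC's hot side must reject: the heat moved plus the
    electrical power spent moving it (power = heat / COP). COP assumed."""
    return q_moved_w + q_moved_w / cop

print(tec_hot_side_w(10.0))  # 30.0: reject 30 W just to move 10 W

# In a series stack, each layer must also pump the heat dumped by the
# TEC above it, so the rejected power grows multiplicatively.
q = 10.0
for _ in range(3):
    q = tec_hot_side_w(q)
print(q)  # 270.0: 10 W at the top becomes 270 W at the bottom of 3 layers
```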
 
Whatever benefits that might have don't appear very helpful in a stacked die situation. [...]

Thanks for the answer :).
 
Intel has something similar as well; they said they intended to bring it into mass production in 2008, then postponed it to 2010. And now it looks like another 2-5 years before we actually see it.
 