How can IBM's eDRAM help GPUs and CGPUs?

http://www.xbitlabs.com/news/cpu/display/20070214234716.html

Although IBM doesn't market GPUs, AMD (ATI) has extensive partnerships with them, and I don't doubt that Intel and Nvidia would also license the technology. GPUs require large amounts of bandwidth to their memory, unlike CPUs, so this could be a great help to CGPUs such as Fusion: the memory bandwidth problem could hopefully be solved using this tech. What are your thoughts?
 
It's too early to say if IBM's EDRAM technology will find much use outside of massive and long-latency L3 caches.

I know jack about the analog design, but I've seen discussion that would place some barriers in the way of using it as a bandwidth solution for GPUs.

The first issue is that even with a density advantage, there will still be a minuscule amount of on-die storage compared to memory.

The second is that EDRAM apparently has a large amount of variance in manufacturing, with individual DRAM capacitors and the controlling transistors differing significantly from one another even in the same device.

Third, the latency could be a fair bit higher than what is on paper, in order to correct for the variability of rows and arbitrate refreshes.

The big advantage to EDRAM's density is that for large caches, the physical size (or physical separation) of the cache becomes a major factor in the access latency. Being smaller in die space allows EDRAM's latency to more closely match SRAM's at higher cache sizes.
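
To make that geometry argument concrete, here is a rough, illustrative-only sketch. It assumes a fixed array access time plus a wire delay that grows with the square root of the macro's area; the density and delay figures are invented placeholders, not IBM or foundry numbers.

Code:
import math

def access_latency_ns(size_mb, mb_per_mm2, fixed_ns, wire_ns_per_mm=0.25):
    # Crude model: latency = intrinsic array time + wire delay across the macro,
    # with wire length scaling as the square root of the macro area.
    area_mm2 = size_mb / mb_per_mm2
    return fixed_ns + wire_ns_per_mm * math.sqrt(area_mm2)

for size_mb in (1, 8, 32, 64):
    sram = access_latency_ns(size_mb, mb_per_mm2=1.0, fixed_ns=0.3)   # assumed SRAM density and array speed
    edram = access_latency_ns(size_mb, mb_per_mm2=3.0, fixed_ns=1.5)  # assumed ~3x denser but slower array
    print(f"{size_mb:>2} MB  SRAM ~{sram:.2f} ns   eDRAM ~{edram:.2f} ns")

Under those toy assumptions the slower-but-denser array looks terrible at small sizes and nearly competitive at large ones, which is exactly why the big-L3 case is the attractive one.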

This wouldn't work too well for L2, L1, and the corresponding GPU caches.
The density advantage is also less than ideal; the sense amps used to read DRAM take up some of the space savings and would be replicated for each small GPU local or texture cache.
 
It's too early to say if IBM's EDRAM technology will find much use outside of massive and long-latency L3 caches.
The bigger problem is that it's IBM's eDRAM technology, and most fabless semiconductor companies don't build at IBM, but at TSMC, SMIC, Chartered, or any of the other myriad fabs (in other words, it's not portable).

The first issue is that even with a density advantage, there will still be a minuscule amount of on-die storage compared to memory.
Cache is good. 3x as much cache is better. The same cache for cheaper is better. Just because you can't get all of your memory needs in eDRAM doesn't mean you can't benefit from using it.
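
As a rough illustration of the diminishing-returns side of that (using the common square-root rule of thumb that miss rate scales roughly with 1/sqrt(capacity); the baseline size and miss rate are invented, not tied to any GPU or workload):

Code:
import math

base_size_kb, base_miss_rate = 256, 0.20  # made-up baseline cache size and miss rate

for factor in (1, 3, 9):
    miss = base_miss_rate / math.sqrt(factor)   # sqrt rule of thumb
    print(f"{base_size_kb * factor:>5} KB cache: ~{miss:.1%} miss rate")

So tripling the cache still helps, just not proportionally.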

The density advantage is also less than ideal; the sense amps used to read DRAM take up some of the space savings and would be replicated for each small GPU local or texture cache.
Have units share the same physical bank to amortize the cost and the problem goes away.
 
Third, the latency could be a fair bit higher than what is on paper, in order to correct for the variability of rows and arbitrate refreshes.
Depends on the eDRAM. Some of them simply refresh every other clock cycle, so the throughput is halved and the latency doubled relative to what you'd initially expect.
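
As a trivial worked example of that scheme (all numbers made up, not a real macro's spec):

Code:
macro_clock_mhz = 500.0                   # assumed eDRAM macro clock
cycle_ns = 1000.0 / macro_clock_mhz       # 2 ns per cycle at 500 MHz

peak_throughput = macro_clock_mhz * 1e6   # one access per cycle, no refresh
with_refresh = peak_throughput / 2.0      # every other cycle stolen for refresh

print(f"throughput: {peak_throughput:.2e} -> {with_refresh:.2e} accesses/s")
print(f"worst-case access latency: {cycle_ns:.0f} ns -> {2 * cycle_ns:.0f} ns")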

The density advantage is also less than ideal; the sense amps used to read DRAM take up some of the space savings and would be replicated for each small GPU local or texture cache.

SRAMs have sense amplifiers too :!:
 
Cache is good. 3x as much cache is better. The same cache for cheaper is better. Just because you can't get all of your memory needs in eDRAM doesn't mean you can't benefit from using it.

3x as much cache is good for given workloads. It's really great for the massive L3 caches IBM uses for its POWER MCMs for heavy server loads.

The L1 and L2 caches a consumer GPU uses won't be well-served, since signal travel time is not as dominant a factor as it is for a large or external L3.

Have units share the same physical bank to amortize the cost and the problem goes away.
Then the latency goes up, which once again makes it less useful as a low-latency high-speed cache.
The GPU caches are already multi-bank arrays. Removing that makes things less attractive.

Depends on the eDRAM. Some of them simply refresh every other clock cycle, so the throughput is halved and the latency doubled relative to what you'd initially expect.
In cycles, but the cycle time stated by IBM is something like 1.5ns to 2ns (2ns is for random access). Considering that SRAMs in a cache are accessed in less than a single clock cycle, at clock speeds whose cycle times are less than a sixth of that, the faster levels of cache are not well-served if the chip clocks in the GHz range.
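
A quick back-of-the-envelope check of that cycle-count argument, using the quoted 2ns random-access figure; the core clocks below are just example values, not any particular GPU's spec:

Code:
edram_access_ns = 2.0   # quoted IBM random-access cycle time

for core_clock_ghz in (0.65, 1.35, 3.0):   # illustrative clock speeds only
    cycle_ns = 1.0 / core_clock_ghz
    cycles = edram_access_ns / cycle_ns
    print(f"{core_clock_ghz:.2f} GHz clock: one eDRAM access ~= {cycles:.1f} cycles")

At sub-GHz clocks that's a couple of cycles; in the multi-GHz range it's several, which is why it maps better to an outer cache level than to the fast inner ones.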

SRAMs have sense amplifiers too :!:
They do, but they have a fair bit more pull than a discharging capacitor. The space savings over 6T SRAM are less than the ideal scaling factor bandied about because there's more setup work.
 
In cycles, but the cycle time stated by IBM is something like 1.5ns to 2ns (2ns is for random access). Considering that SRAMs in a cache are accessed in less than a single clock cycle, at clock speeds whose cycle times are less than a sixth of that, the faster levels of cache are not well-served if the chip clocks in the GHz range.
I don't think it's that much of a problem. 1.5ns is still roughly 667 MHz, close to the main clock. The other thing to consider is that it simply removes pressure from the external memory if you have multiple of them.

They do, but they have a fair bit more pull than a discharging capacitor. The space savings over 6T SRAM are less than the ideal scaling factor bandied about because there's more setup work.
There's always a threshold somewhere. So, yes, it's less than ideal, but for serious stuff like L2 caches, it should be a no-brainer... with respect to area.

That said, I don't think eDRAM will be used any time soon in GPUs because of excessive power consumption.
 
I don't think it's that much of a problem. 1.5ns is still roughly 667 MHz, close to the main clock. The other thing to consider is that it simply removes pressure from the external memory if you have multiple of them.
For G80, it's potentially less than half the main clock, depending on where you put the cache relative to the clock domain.
For R600, it would also be insufficient. Unless future GPUs are going to be downclocked, it's not going to get better.

There's always a threshold somewhere. So, yes, it's less than ideal, but for serious stuff like L2 caches, it should be a no-brainer... with respect to area.

I don't think there are any individual L2 caches of significant size on a GPU, just a lot of small local L2 caches. The area savings for those are smaller, and the latency would still be higher.

If it were a big L2 or a huge L3, then there would be a stronger argument. I don't think GPUs benefit too much from cache of that nature.

That said, I don't think eDRAM will be used any time soon in GPUs because of excessive power consumption.
That is a good point.
 
There's probably no reason why this couldn't be done, though I'd wonder just how much AMD cares about performance in the market Fusion is targeting, especially given the power constraints.
As long as the EDRAM is used as an L3 cache, it would be a mixed bag performance-wise unless it were absolutely enormous. The streaming nature of the data use would make the cache less effective.

If it were kept as explicitly separate storage, like how Xenos uses the storage on its daughter die, more could be done with less, but it still wouldn't match a discrete solution.
AMD would have to convince people to use it as well, since this would be more complicated than going to main memory.

I'd be interested to see at what load the factor of 10 drop in bandwidth can't be compensated for by the EDRAM.
Massive bandwidth is no good if what you want to use can't reliably fit in the smaller space.
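
One way to frame that question is a crude saturation model: hits draw on the on-die eDRAM's bandwidth, misses draw on external memory, and total throughput is capped by whichever pool saturates first. The bandwidth figures below are invented placeholders chosen only to reflect a roughly 10x gap, not measurements of any real part.

Code:
def sustainable_bandwidth_gbps(hit_rate, on_die_gbps=256.0, external_gbps=25.6):
    # Total traffic is limited by whichever pool saturates first:
    # hits are served by the on-die eDRAM, misses by external memory.
    miss_rate = 1.0 - hit_rate
    limits = []
    if hit_rate > 0.0:
        limits.append(on_die_gbps / hit_rate)
    if miss_rate > 0.0:
        limits.append(external_gbps / miss_rate)
    return min(limits)

for hit_rate in (0.5, 0.8, 0.95):
    print(f"hit rate {hit_rate:.0%}: ~{sustainable_bandwidth_gbps(hit_rate):.0f} GB/s sustainable")

Under those made-up numbers the on-die pool only closes the gap once the hit rate gets very high, which is why a working set that doesn't reliably fit in the smaller space undercuts the whole scheme.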
 
I think jPaana would be the guy here with first-hand knowledge about designing a rendering engine using eDRAM, but at the same time I doubt he would come to this thread and start talking about it.

The team he worked on for such a project is now, after all, part of AMD. Besides, things move on, and I don't think Infineon's early-2000s 0.17µ eDRAM fab has much to do with what eDRAM fabs do now. Still, some of the information and experience from working with two different chips can be valuable.
 