IBM claims new eDRAM will double processor performance

There is an article about eDRAM at Real World Tech, which mentions IBM talking about these issues.

John Barth - IBM Systems & Technology Group
For embedded purposes, SRAM is the de facto choice for discerning designers. Embedded SRAM provides the fastest cycle times while operating well in a semiconductor logic process. However, one bit of SRAM storage typically requires 6 transistors, whereas a DRAM cell only needs 1 transistor plus one capacitor. Hence, the common argument in favour of eDRAM is that of the 4x density advantage relative to eSRAM.

While not ignoring this point, the presenter saw the problem from another perspective. While it was conceded that eSRAM provides the fastest random access cycle times, eDRAM can come close, and the remaining performance differential between eSRAM and eDRAM can be mitigated through architectural choices if they are considered early enough in the design cycle.

The speaker went on to argue that most high-end designs are more oriented around the memory hierarchy than the logic circuits themselves. Further, he posed an example of where eDRAM may be superior to eSRAM in a conventional logic design. The floorplan for the Itanium2 9M processor was displayed, as can be seen in Figure 2. The furthest L3 subarray was estimated to be 23mm away from the cache controller in Intel's layout. The floorplan for a hypothetical Itanium2 9M which used eDRAM for the L3 cache array was then shown (Figure 3). In this floorplan, the furthest subarray would only be, roughly, 14mm away from the cache controller. Delay approximations were made for the hypothesized array, and the results can be found in table 1 below.

Thus, while the actual eDRAM cells are slower than the corresponding eSRAM cells, the increased density of eDRAM leads to shorter wires in the L3 cache array. The reduction in worst-case wire length (23 to 14mm) corresponded to a 39% reduction in wire delay. It should be noted that the speaker emphasized that they took certain liberties when deriving these figures.
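
As a quick sanity check on the quoted figure, here is a minimal sketch of the arithmetic. It assumes wire delay scales linearly with wire length (roughly what you would expect from a repeated wire); that assumption is mine, not something the article states, and the function name is just for illustration.

```python
def relative_reduction(before, after):
    """Fractional reduction when going from `before` to `after`."""
    return (before - after) / before

# Worst-case distances to the cache controller, from the article.
sram_wire_mm = 23.0   # furthest L3 subarray in Intel's eSRAM layout
edram_wire_mm = 14.0  # furthest subarray in the hypothetical eDRAM layout

length_cut = relative_reduction(sram_wire_mm, edram_wire_mm)

# Assumption: delay is proportional to length, so the delay cut
# equals the length cut.
print(f"wire length reduction: {length_cut:.1%}")  # prints 39.1%
```

Under that linear assumption the 23mm to 14mm change lines up with the 39% figure in the article.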

During the question session, it was asked what additional costs were involved with fabricating chips using eDRAM. It was stated that the eDRAM process adds 3 extra mask stages before any of the other logic process steps, and that the typical cost adder is on the order of 20%. Thus, there is a cross-over point between the additional cost of eDRAM processing and the increased density of eDRAM. Presently, this cross-over tends to exist around the 8-16Mb mark.

http://www.realworldtech.com/page.cfm?ArticleID=RWT020705121631&p=2
 
It doesn't look like an external 1T-SRAM chip to me. It looks exactly like what you see in Xenos. If the die is close to 132mm^2 then it's eDRAM.

Look, it has a size of 94.5mm^2 and it's 24MB(!) of 1T-SRAM(!). It's not eDRAM. There's no ROP in it, etc. Unless you have some evidence other than that they're under one heatspreader, that's wishful thinking. Everyone else who actually has some knowledge will tell you it's not eDRAM.
 
I already edited my original post. ;)

Actually you could fit close to 20MB of eDRAM into 95mm^2, so I wasn't far off. In other words my point still stands: at 45nm you could fit close to 100MB of eDRAM. I also said it looks like Xenos, I didn't say it functions like Xenos, therefore ROPs are irrelevant.
 
1T-SRAM is still actually eDRAM... it's just a matter of what it's *embedded* into. So-called 1T-SRAM is just eDRAM with a small normal SRAM buffer that holds an open page so that refresh latencies can be hidden. That the chip is external to the GPU is simply to say that it's not embedded on the same die with any component of the GPU.

It's not eDRAM. There's no ROP in it, etc. Unless you have some evidence other than that they're under one heatspreader, that's wishful thinking.
I fail to see what ROPs and heatspreaders have to do with it.

In any case, to me, the more interesting thing is how it will be useful for CPUs. Particularly something like CELL, where all the SPEs have totally localized contexts: having a shared local memory block that they can all access with low latency and high bandwidth makes moving data around between SPE threads so much nicer. It's also nice if you want multiple SPEs to work on chunks of a larger data block. Well, I can see all sorts of possibilities, though I'm not about to tell you it will be without any headaches.
 
It's not like eDRAM is a new thing, so how is this different than previous eDRAM? Just faster? They claim it's even faster than SRAM, so does that mean it could be used as an L1 cache? L2, L3? Also, how does it compare to 1T-SRAM in speed and density?
 
For example the Hollywood GPU at 90nm already has 24MB of eDRAM in a separate die and if needed you could just add another 24MB eDRAM die onto the same SiP.


Hollywood only has 3.12 MB of embedded 1T-SRAM, like Flipper, or perhaps somewhat more. It's not 100% known, but it's not 24 MB.

Confusingly, the 24 MB of 1T-SRAM that the entire Hollywood LSI 'package' has is actually external memory, like it was with GameCube.
It is not actually high-bandwidth embedded memory / eDRAM / embedded 1T-SRAM.
 
Confusingly, the 24 MB of 1T-SRAM that the entire Hollywood LSI 'package' has is actually external memory, like it was with GameCube.
It is not actually high-bandwidth embedded memory / eDRAM / embedded 1T-SRAM.

Actually we don't know this for a fact, but like I mentioned earlier it's likely not eDRAM, because 24MB of eDRAM at 90nm would be about 132mm^2. Since the die is 95mm^2 it's probably not eDRAM, unless it's only 20MB while 4MB was moved over to the GPU side, giving the GPU 7MB (3+4). It might be possible since the GPU is larger. Who knows?
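
For anyone who wants to check the numbers, here is a rough sketch of the area arithmetic. It takes the 24MB in about 132mm^2 figure at 90nm from this thread at face value and assumes capacity scales linearly with die area (no fixed overhead for sense amps, pads, and so on), which is a simplification of mine. The variable names are just for illustration.

```python
# Reference point quoted in the thread: 24 MB of eDRAM at 90nm ~ 132 mm^2.
REF_CAPACITY_MB = 24.0
REF_AREA_MM2 = 132.0

# Die size discussed for the second die in the package.
DIE_AREA_MM2 = 95.0

# Assumption: capacity is proportional to area, with no fixed overhead.
mm2_per_mb = REF_AREA_MM2 / REF_CAPACITY_MB
capacity_mb = DIE_AREA_MM2 / mm2_per_mb

print(f"{mm2_per_mb:.2f} mm^2 per MB")
print(f"about {capacity_mb:.1f} MB fits in {DIE_AREA_MM2:.0f} mm^2")
```

With those inputs a 95mm^2 die comes out somewhat under 20MB, so a full 24MB of eDRAM at that density would not fit.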

I think whether you want to call it eDRAM or not comes down to semantics. For example Xenos has eDRAM because it has some logic built-in? Well if you remove this logic does that mean the RAM isn't eDRAM anymore? Semantics.
 
I think whether you want to call it eDRAM or not comes down to semantics. For example Xenos has eDRAM because it has some logic built-in? Well if you remove this logic does that mean the RAM isn't eDRAM anymore? Semantics.

No, the Xenos daughter die is connected by two high-speed buses to the main die and the ROP. The 24MB is probably, like on the GCN, 1T-SRAM connected through the Northbridge in Hollywood by a measly 2.x GB/s bus.
 
The 24MB is probably, like on the GCN, 1T-SRAM connected through the Northbridge in Hollywood by a measly 2.x GB/s bus.

Again, we don't know this for a fact, so it's a pretty pointless argument. It doesn't need to act as a framebuffer for it to be "eDRAM". You could use eDRAM for many things, including a high-speed cache. When running in "Wii-mode" the eDRAM could be used for other purposes. There are many things we don't know.
 
According to the pics of Hollywood, the 24MB is not outside of the chip's die like it was on the GC.

Also, the amount of eDRAM is not really confirmed by anyone. It could be 3MB or 6MB, who knows.
 