eDRAM in GPUs

Briareus

Newcomer
Can someone explain the advantages and disadvantages of embedded DRAM? Why is it used in several consoles but not in any of the PC-based GPUs?
 
Up until now, there weren't proper methods to handle a frame buffer that couldn't fit into the EDRAM you could reasonably put into a GPU.
Since current-gen consoles only had to support resolutions up to 640x480 @ 32-bit, their framebuffer was small enough to even spare some EDRAM for texture memory.
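
As a rough check of that, here's the arithmetic, assuming a double-buffered 32-bit color setup plus a 32-bit Z-buffer (the exact buffer layout is an assumption, not something specified above):

```python
# Rough framebuffer sizing at 640x480 @ 32-bit; the double-buffered color +
# 32-bit Z layout is an assumed example, not a specific console's setup.
width, height = 640, 480
color_buffer = width * height * 4          # one 32-bit color buffer
z_buffer = width * height * 4              # assume a 32-bit depth buffer
total = 2 * color_buffer + z_buffer        # front + back + Z

print(f"one color buffer: {color_buffer / 2**20:.2f} MB")   # ~1.17 MB
print(f"front + back + Z: {total / 2**20:.2f} MB")          # ~3.52 MB
# So even a 4 MB eDRAM pool leaves a little room to spare for textures.
```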

Bitboys has been working on several iterations of EDRAM-based GPUs but their attempts never reached the market. As far as I know, the main reason was that they couldn't manufacture a chip with enough EDRAM. So it was a matter of timing - how soon will we have a manufacturing process that can give us enough on-die memory for the currently used screen resolutions?

Now that ATI has developed methods to support larger resolutions, it's quite possible that they'll leverage this technology in their high-end video cards as well.
 
The other reasons are:
1. You have to sacrifice processing power to add in the RAM (which makes it viable only if you'd have pretty bad efficiency otherwise... such as if you'd decided to trade eDRAM against expensive external memory).
2. eDRAM makes it a bit more challenging to clock the parts as high, so you lose some fillrate again for implementing it.

So, in the end, high-end PC parts are about balls-to-the-wall performance, whereas economics is a much larger concern for a console. It makes sense to build a more complex chip for the console, because economies of scale and possible future die shrinks will help combat the added cost of the eDRAM, whereas more powerful external memory means more complex PCBs and more expensive memory chips (which already have high economies of scale). So it often makes more sense to use eDRAM on a console.
 
Laa-Yosh said:
Up until now, there weren't proper methods to handle a frame buffer that couldn't fit into the EDRAM you could reasonably put into a GPU.
Since current-gen consoles only had to support resolutions up to 640x480 @ 32-bit, their framebuffer was small enough to even spare some EDRAM for texture memory.

Note that the X360 maximum resolution is 1280x720. At that resolution you can fit 32-bit front and back buffers and a 24-bit Z-buffer into the EDRAM.
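
A quick sanity check of that claim; whether the 24-bit Z is stored packed or padded out to 32 bits with stencil is an assumption either way:

```python
pixels = 1280 * 720
color = pixels * 4          # one 32-bit color buffer
z_packed = pixels * 3       # 24-bit Z, tightly packed
z_padded = pixels * 4       # 24-bit Z + 8-bit stencil

mb = lambda b: b / 2**20
print(f"2x color + packed Z: {mb(2 * color + z_packed):.2f} MB")  # ~9.67 MB, fits in 10 MB
print(f"2x color + padded Z: {mb(2 * color + z_padded):.2f} MB")  # ~10.55 MB, just over
```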

Laa-Yosh said:
Bitboys has been working on several iterations of EDRAM-based GPUs but their attempts never reached the market. As far as I know, the main reason was that they couldn't manufacture a chip with enough EDRAM. So it was a matter of timing - how soon will we have a manufacturing process that can give us enough on-die memory for the currently used screen resolutions?

I do remember that there was a silicon prototype with 12 MB of memory on-chip?
 
Chalnoth,

Wouldn't eDRAM be faster than external RAM? I thought fetching information (in this case textures) would be much faster on-die than having to fetch from external memory? Could you clarify this for me?

Thanks
 
ddes said:
Note that the X360 maximum resolution is 1280x720. At that resolution you can fit 32-bit front and back buffers and a 24-bit Z-buffer into the EDRAM.

That's right, but... have I said anything that contradicts this? Is the X360 a current-gen console? ;)
 
[Might as well post what I wrote in another thread here, too]

It's looking pretty certain now that ATI's GPU + EDRAM architecture for R500 consists of 2 chips.

I dare say the key thing was to create an architecture for a GPU which enables this split, in order to free the GPU from otherwise being restricted, as Chalnoth says, by on-die EDRAM.

ATI's patents on this architecture go back to 1998, with the current form determined in 2000. I dare say it's been a matter of waiting until such an architecture can meet the constraints of the PC gaming business, with the support for legacy games and coding techniques creating a sizable overhead in GPU resources.

We don't know what the die area for 10MB of high performance EDRAM is - we only know that low-power EDRAM would consume about 225mm squared at 90nm. (Bad memory alert: breakfast still settling, maybe it's 150mm squared, anyway whatever it is, it's a big package).

Looking forward to a PC GPU with EDRAM, it would prolly need 32MB of RAM to cater for high-end PC resolutions. Either that, or it would be forced to settle for rendering a frame in portions (I won't call them tiles, because we're talking about a half or quarter of the frame).
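
A rough sketch of where a figure like 32MB comes from, assuming the eDRAM would hold a 32-bit color buffer plus 32-bit Z/stencil per sample (that layout is an assumption):

```python
def edram_needed_mb(width, height, samples):
    bytes_per_sample = 4 + 4   # assumed: 32-bit color + 32-bit Z/stencil per sample
    return width * height * samples * bytes_per_sample / 2**20

print(f"1280x720,  no AA: {edram_needed_mb(1280, 720, 1):.1f} MB")   # ~7.0 MB
print(f"1600x1200, no AA: {edram_needed_mb(1600, 1200, 1):.1f} MB")  # ~14.6 MB
print(f"1600x1200, 2x AA: {edram_needed_mb(1600, 1200, 2):.1f} MB")  # ~29.3 MB
```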

Jawed
 
What are you smoking, Jawed? XB360 eDRAM is on a separate chip? That's a contradiction in terms. Besides, to get 256 GB/s of bandwidth from an external chip, you'd need an insanely wide bus and clock rate: a 512-bit bus and 4 GHz RAM, or a 1024-bit bus and 2 GHz RAM, or a 2048-bit bus and 1 GHz RAM. Try to imagine the chip packaging and PCB layout for a 1024-bit bus.

Either it's on-chip, or it's not embedded. Unless you think they are using that Proximity Bus technique that Sun is pursuing, or some kind of optical link. It's just beyond reason.
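
For reference, the arithmetic behind those bus/clock combinations is just bandwidth = bus width x effective clock; no particular product is assumed here:

```python
def bandwidth_gbs(bus_bits, effective_clock_ghz):
    return bus_bits / 8 * effective_clock_ghz   # GB/s, taking 1 GB = 1e9 bytes

for bits, ghz in [(512, 4.0), (1024, 2.0), (2048, 1.0)]:
    print(f"{bits:4d}-bit @ {ghz:.0f} GHz -> {bandwidth_gbs(bits, ghz):.0f} GB/s")
# All three combinations land on 256 GB/s, which is why an *external* link
# at that rate looks implausible with ordinary packaging and PCB routing.
```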
 
Or is the 256GB/sec some multiplied value of the real bandwidth, because of compression/optimization?

BTW I think it's a separate piece of silicon, but in the same package, much like the first PPros and their cache.
 
ddes said:
Note that the X360 maximum resolution is 1280x720. At that resolution you can fit 32-bit front and back buffers and a 24-bit Z-buffer into the EDRAM.
The X360 is not limited to that, and why would you possibly want to store the front buffer in cache anyway?
 
jpr27 said:
Chalnoth,

Wouldn't eDRAM be faster than external RAM? I thought fetching information (in this case textures) would be much faster on-die than having to fetch from external memory? Could you clarify this for me?

Thanks
Well, yeah, but the problem is: what if external memory isn't your primary limiter of performance? Improving memory performance in this scenario won't help you much. But adding in eDRAM will end up reducing your fillrate (die size kept the same), and thus isn't going to be a good solution much of the time.
 
ddes said:
Laa-Yosh said:
Up until now, there weren't proper methods to handle a frame buffer that couldn't fit into the EDRAM you could reasonably put into a GPU.
Since current-gen consoles only had to support resolutions up to 640x480 @ 32-bit, their framebuffer was small enough to even spare some EDRAM for texture memory.

Note that the X360 maximum resolution is 1280x720. At that resolution you can fit 32-bit front and back buffers and a 24-bit Z-buffer into the EDRAM.

Laa-Yosh said:
Bitboys has been working on several iterations of EDRAM-based GPUs but their attempts never reached the market. As far as I know, the main reason was that they couldn't manufacture a chip with enough EDRAM. So it was a matter of timing - how soon will we have a manufacturing process that can give us enough on-die memory for the currently used screen resolutions?

I do remember that there was a silicon prototype with 12 MB of memory on-chip?
two different revisions in fact...

[Attached photos of the two chip revisions: picture2.jpg, picture1.jpg]


;)

EDIT:
Okay, so the deal with Bitboys' eDRAM system was not to have the whole back buffer in eDRAM at once. The scene was split into tiles, and only the tile being rendered needed to fit in eDRAM. With Matrix Anti-Aliasing enabled, the AA was applied during the eDRAM -> back buffer transfer. (The guy who worked on the rasterizer implementation of this chip is one of the regulars here, but so far he has decided not to show this side of his talents here, so I am not going to tell you who he is. It's up to him, if he decides to.)

The images above show two different revisions of the chip codenamed AXE. It has a DX8 feature set (VS 1.0 and PS 1.1) with 4 pipelines and 2 TMUs per pipe. Planned clocks were 175 MHz core / 175 MHz memory. If everything had gone as planned, AXE would have been released as Avalanche 3D for Christmas 2001. The chip is capable of working in dual mode as well, so Avalanche Dual would have had around 46 GB/s memory bandwidth and 8 DX8 pipelines.

After this, and before moving to the handheld/PDA side, the Boys had another project called Hammer, which had some interesting things coming. It had eDRAM too, but only 4 MB, and it incorporated their own occlusion culling technology. All the technology meant to be in Hammer was licensable after the project died, and someone was interested enough in at least their occlusion culling technology, because all material relating to it was removed from their website soon after it had been added there. The only thing I heard was that it was removed because the customer wanted it to vanish; so far I have no idea who the customer was.

So, does the eDRAM need to fit the whole frame buffer? No, I don't think so. All new cards already work on pixel quads for several reasons, and ATI even has higher-level Super Tiling that is used for big offline multi-core rendering solutions. As long as we can divide screen space into smaller parts, what's the reason for keeping the frame buffer as one big rendering space? Every time the renderer finishes a tile, it takes a small time before the next one starts, and there's no traffic on the external memory bus during that time, so you could basically use that time for moving the finished tile from eDRAM to the frame buffer.
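
To make the tile-at-a-time idea concrete, here's a minimal runnable sketch - purely illustrative, not Bitboys' actual pipeline; the tile size, the rectangle "primitives" and all the names are invented for the example:

```python
# Render one screen tile at a time in "eDRAM"; only the current tile's
# pixels live on-die, and the finished tile is copied out to the frame
# buffer in external memory (which is where AA filtering would happen).
SCREEN_W, SCREEN_H = 640, 480
TILE_W, TILE_H = 64, 64

# Primitives here are just axis-aligned colored rectangles: (x0, y0, x1, y1, color)
scene = [(10, 10, 200, 150, 1), (100, 100, 400, 300, 2)]

framebuffer = [[0] * SCREEN_W for _ in range(SCREEN_H)]   # external memory

for ty in range(0, SCREEN_H, TILE_H):
    for tx in range(0, SCREEN_W, TILE_W):
        edram_tile = [[0] * TILE_W for _ in range(TILE_H)]  # on-die tile buffer
        for (x0, y0, x1, y1, color) in scene:
            # rasterize only the part of the primitive inside this tile
            for y in range(max(y0, ty), min(y1, ty + TILE_H)):
                for x in range(max(x0, tx), min(x1, tx + TILE_W)):
                    edram_tile[y - ty][x - tx] = color
        # "resolve" step: move the finished tile out to the frame buffer,
        # using external bus time that would otherwise sit idle between tiles
        for y in range(TILE_H):
            for x in range(TILE_W):
                framebuffer[ty + y][tx + x] = edram_tile[y][x]
```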
 
I posted this message on Saturday:

http://www.beyond3d.com/forum/viewtopic.php?p=519513#519513

In an interview, Rick Bergman, senior vice-president and general manager of ATI's PC Group, said the XBox 360 will contain an ATI-designed graphics processing unit, the 360 GPU, as well as a companion memory chip.

As well as that, if you read the relevant patents you will see that the Raster Output architecture that ATI has put together is designed around EDRAM for the back frame buffer's pixels only (i.e. excluding AA samples). AA sample data is not kept in EDRAM, because it is too voluminous.

It all adds up to a GPU architecture in which EDRAM shares a die with a blend/filter/query unit, pipelined in a loop with the GPU so that the overall ROP is unaffected by the latency of fetching AA sample data from (slow) local memory (system memory in the XBox 360).

The GPU shares some of the ROP workload, generating/blending AA samples as instructed by the EDRAM blend/filter unit and fetching/writing AA samples to local memory.

The GPU and the EDRAM unit work on different fragments/AA sample data, with the GPU both feeding the EDRAM's pipeline and accepting the results back from that pipeline that need further work.
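
A schematic sketch of that pipelined loop - this is an illustration of the idea only, not ATI's design; the queues and all names are assumptions:

```python
from collections import deque

to_edram = deque()   # fragments the GPU has finished its share of work on
to_gpu = deque()     # results the eDRAM unit sends back for further AA work

local_memory = {}    # (slow) local memory holding AA sample data
back_buffer = {}     # back-buffer pixels, living in the eDRAM itself

def gpu_stage(fragment):
    # GPU side of the ROP: generate/blend AA samples against local memory,
    # then hand the fragment to the eDRAM blend/filter unit.
    local_memory.setdefault(fragment["pixel"], []).append(fragment["samples"])
    to_edram.append(fragment)

def edram_stage():
    # eDRAM side: blend the back-buffer pixel at full on-die bandwidth and
    # return anything that still needs AA-sample work to the GPU.
    if to_edram:
        frag = to_edram.popleft()
        back_buffer[frag["pixel"]] = frag["color"]
        if frag.get("needs_more_aa"):
            to_gpu.append(frag)

# Because the two units work on *different* fragments each step, the latency
# of the local-memory AA traffic is overlapped rather than stalling the ROP.
for i in range(4):
    gpu_stage({"pixel": (i, 0), "color": i, "samples": [i] * 4})
    edram_stage()
```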

[Attached diagram: b3d16.gif]


Jawed
 
Laa-Yosh said:
Or is the 256GB/sec some multiplied value of the real bandwidth, because of compression/optimization?
Yeah, it IS a multiplied value, just like Nvidia's claim of 64GB/sec bandwidth for NV30 due to 4x framebuffer compression, way back when.

At first you'd think MS would have learned not to lie like that, but then you have to remember they're aiming for Joe Consumer with this number and those marketroids don't pull any punches when it comes to making the goods they're pushing look as good as possible. Lies or not.
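
The arithmetic behind that kind of number is just physical bandwidth times a best-case compression ratio; for the NV30 example that's roughly a 128-bit bus at 1 GHz effective with the claimed 4:1 framebuffer compression:

```python
def marketing_bandwidth(bus_bits, effective_clock_ghz, compression_ratio):
    physical = bus_bits / 8 * effective_clock_ghz   # GB/s
    return physical, physical * compression_ratio

# NV30-style example: 128-bit bus, 1 GHz effective clock, "4:1" compression
physical, claimed = marketing_bandwidth(128, 1.0, 4)
print(f"{physical:.0f} GB/s physical -> {claimed:.0f} GB/s 'effective'")   # 16 -> 64
```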
 
Jawed said:
[Might as well post what I wrote in another thread here, too]

We don't know what the die area for 10MB of high performance EDRAM is - we only know that low-power EDRAM would consume about 225mm squared at 90nm. (Bad memory alert: breakfast still settling, maybe it's 150mm squared, anyway whatever it is, it's a big package).

See my other post. If NEC put the right info on their website, it shouldn't be more than 0.22 µm² per cell (one cell = one bit), which gives you less than 20 mm² for 10 MB.

This might sound small, but remember it's basically one transistor per bit, so much smaller than SRAM.
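
Spelled out (raw cell array only - sense amps, decoders and redundancy would add overhead on top of this):

```python
cell_area_um2 = 0.22              # per bit, using the NEC figure quoted above
bits = 10 * 2**20 * 8             # 10 MB
array_area_mm2 = bits * cell_area_um2 * 1e-6   # 1 mm^2 = 1e6 um^2
print(f"{array_area_mm2:.1f} mm^2")            # ~18.5 mm^2, i.e. "less than 20 mm^2"
```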

ATI could have done two things for the R500: put the DRAM on-die, or put it into the package. Even in-package has speed advantages (besides power-consumption ones). From the NEC press release we've seen, Microsoft opted for on-chip DRAM, and the GPU is hence manufactured at NEC.
 
Jawed said:
I posted this message on Saturday:

http://www.beyond3d.com/forum/viewtopic.php?p=519513#519513

In an interview, Rick Bergman, senior vice-president and general manager of ATI's PC Group, said the XBox 360 will contain an ATI-designed graphics processing unit, the 360 GPU, as well as a companion memory chip.

Hmm... a companion chip? Why is it then called eDRAM in the first place?

Back to plan B... two dies in one package? I don't think 10 MB is so large that it needs to be put onto a separate die.
 
DC, it's not that widely known yet, but yes, the eDRAM is a separate chip - there is the shader core (produced by TSMC) and then, sitting alongside it but on the same package, is the eDRAM chip produced by NEC. "eDRAM" is probably not the right term; although the ROPs are in here rather than in the shader core, they are probably dwarfed by the silicon of the memory itself. This is another explanation of why the memory bus width is 128-bit rather than 256-bit - the shader core has to deal with the pads for connections to the eDRAM, the host, and main memory.
 
This also explains why the eDRAM bandwidth is so 'low': the PS2 GS eDRAM has the same bandwidth (48 GB/s) as the R500's, but in 1999, on a 0.25 µm process at 150 MHz.
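
For reference, the GS gets there with a very wide on-die bus rather than a high clock; the commonly cited aggregate width is 2560 bits (1024 read + 1024 write + 512 texture):

```python
bus_bits, clock_hz = 2560, 150e6   # aggregate internal bus width, 150 MHz
print(f"{bus_bits / 8 * clock_hz / 1e9:.0f} GB/s")   # 48 GB/s
```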
 