MS big screw up: EDRAM

Well, the PS2 *has* a dedicated GPU; it's just that the GPU has no T&L abilities, and thus the EE gets co-opted.
 
dukmahsik said:
really, i thought it just had a rasterizer and not a full on gpu.

I see where you're coming from. I'd still term it the system's 'GPU,' even if it doesn't really match the modern description of what a GPU is. We can just go with the older parlance of 'graphics chip' if you like though. :)

Its lack of hardware T&L is really the demarcation line for it in terms of being 'ancient' vs 'modern.'
 
xbdestroya said:
I see where you're coming from. I'd still term it the system's 'GPU,' even if it doesn't really match the modern description of what a GPU is. We can just go with the older parlance of 'graphics chip' if you like though. :)

Its lack of hardware T&L is really the demarcation line for it in terms of being 'ancient' vs 'modern.'

yeah, it just goes to show what ps2's edram allows the lil "graphics chip" to do. GT4 is a testament.
 
The GPU needs its own storage, be that EDRAM or dedicated DDR RAM; that part doesn't matter. But if you use off-chip RAM (like on a PC or the PS3), you have high latencies and less bandwidth, so you need on-chip buffers and cache. If you instead put the ROPs and the other logic that directly needs that local RAM on the same die as the RAM itself, you end up with a much higher effective bandwidth and don't need as much cache and buffering.

So, you have this:

GPU + ROPs + buffers (SRAM) <- (bus) -> cache (SRAM) <- (bus) -> off-chip RAM (DRAM),

or:

GPU + small buffers (SRAM) <- (bus) -> ROPs + on-chip RAM (EDRAM)
 ^
 |
 +-----> off-chip RAM (DRAM).

And when you combine RAM with logic on one die, you call it EDRAM, which stands for Embedded Dynamic RAM. The memory cells themselves are essentially the same as in off-chip memory (DDR RAM, which is also Dynamic RAM); the main difference is that they sit on the same die as the logic. SRAM, which stands for Static RAM, is actually more like a very large register file.

DRAM uses a single transistor per bit (plus a tiny storage capacitor), while SRAM uses four transistors (a flip-flop). So it takes four times as much room. But it is faster, and doesn't need to be refreshed regularly (although that's transparent and handled by the memory modules themselves nowadays).

So, by using EDRAM combined with the ROPs, they actually save lots of die space (transistors), and increase the total bandwidth quite a bit.
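As a rough sanity check on the die-space point (back-of-the-envelope numbers of my own, counting only the cell arrays and ignoring sense amps, decoders and redundancy), here's the transistor count for 10 MB of on-die memory in DRAM versus SRAM:

Code:
# Transistor budget for 10 MB of on-die memory, cell arrays only.
# Assumes ~1 transistor per DRAM bit and 4-6 per SRAM bit.
def transistors(megabytes, per_bit):
    bits = megabytes * 1024 * 1024 * 8
    return bits * per_bit

edram_mb = 10
print(f"DRAM, 1T/bit: {transistors(edram_mb, 1) / 1e6:.0f} M transistors")
print(f"SRAM, 4T/bit: {transistors(edram_mb, 4) / 1e6:.0f} M transistors")
print(f"SRAM, 6T/bit: {transistors(edram_mb, 6) / 1e6:.0f} M transistors")
# Roughly 84 M transistors for DRAM versus ~335-500 M for the same
# capacity in SRAM.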

It limits the available RAM space for the frame buffer, but it increases the things that can be done without penalty, like AA.
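And to put a rough number on the 'without penalty' part: here's the framebuffer traffic that 4xAA blended rendering generates, using Xenos-like figures as assumptions on my part (8 ROPs at 500 MHz, 32-bit colour + 32-bit Z, each read and written per sample):

Code:
# Peak ROP <-> framebuffer traffic for 4xAA with blending and Z test,
# assuming 8 ROPs at 500 MHz and uncompressed 32-bit colour + 32-bit Z.
rops = 8
clock_hz = 500e6
aa_samples = 4
bytes_per_sample = (4 + 4) * 2   # colour + Z, read-modify-write

samples_per_sec = rops * clock_hz * aa_samples
traffic_gb_s = samples_per_sec * bytes_per_sample / 1e9
print(f"Peak framebuffer traffic: {traffic_gb_s:.0f} GB/s")   # ~256 GB/s

Compare that with a ~22 GB/s external GDDR3 bus and it's obvious why the ROPs sit right next to the EDRAM instead of on the far side of the memory bus.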
 
This thread sucks majorly on a logical level. Bill owned himself with his flawed premise:

False premise: GCN "failed" because it had eDRAM. You had no proof to back up this statement, and it's a dumb premise besides. How do you judge failure? Sales? The PS2 had 4MB of eDRAM and it sold like gangbusters. Are you basing failure on graphics? The Cube had some of the best graphics this gen. Certainly better than the PS2.

Hell, you'd have a better argument if you said that Xbox 360 would be a roaring success because it has eDRAM since your "proof" (or the lack thereof) does a better job of supporting your point.
 
DiGuru said:
So, by using EDRAM combined with the ROPs, they actually save lots of die space (transistors), and increase the total bandwidth quite a bit.
How about the die space of the bus that connects the eDRAM to the logic?
 
I am not closing down this topic because, somehow, an intelligent debate occurred...

The thread title is unnecessarily sensationalist, and the first post lacks any kind of explanation for that title (the GC has eDRAM, JC briefly says something is good or bad, so what?).

Thinking that including an eDRAM pool in a GPU is a bad architectural choice is actually debatable. No problem with that.
Now, there are plenty of ways to discuss this without jumping directly to a conclusion in the thread title.

For the record, I think it's an interesting question, but I also think that it's too early for any conclusion.

At best, we could discuss the amount of eDRAM MS/ATi chose, seeing how some launch game engines seem not to be compatible with the tiling needed for AA, for instance.
Then again, the ROP circuitry is off-loaded to a daughter die, making the GPU cheaper to produce than a single-die IC, and it also gives the GPU a huge amount of BW. Which is clearly a positive point.

At this point in time, the eDRAM in Xenos seems to be an interesting choice (in a positive sense), both from a technological POV and for the benefits it brings (free AA most of the time, huge BW).
 
Vysez said:
At best, we could discuss the amount of eDRAM MS/ATi chose, seeing how some launch game engines seem not to be compatible with the tiling needed for AA, for instance.

From what I understand, MS was pushing Devs to code predicated tiling into their engines, but since there was no comparable hardware at the time, and nothing to test on, many developers decided they didn't want to take the risk.

They could've put in so much EDRAM that tiling was not required at 1080i w/ 4xAA, but would that have been the best solution? Surely the capabilities of the GPU would have been reduced in other areas to compensate for the increased transistors used for the EDRAM. If you can implement tiling that results in only a 5% performance hit, and get away with 1/3 of the transistors, isn't that the ideal solution?
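For reference, here's a quick sketch (my own numbers, assuming uncompressed 32-bit colour + 32-bit Z per sample) of how many tiles the 10MB actually forces at 720p:

Code:
import math

EDRAM_BYTES = 10 * 1024 * 1024   # Xenos daughter die: 10 MB

def tiles_needed(width, height, aa_samples, bytes_per_sample=8):
    # Passes needed for the colour + Z/stencil buffer to fit in eDRAM,
    # assuming no colour or Z compression.
    fb_bytes = width * height * aa_samples * bytes_per_sample
    return math.ceil(fb_bytes / EDRAM_BYTES)

for aa in (1, 2, 4):
    print(f"720p, {aa}xAA: {tiles_needed(1280, 720, aa)} tile(s)")
# -> 1 tile without AA, 2 tiles with 2xAA, 3 tiles with 4xAA

So the question really is whether the cost of those 2-3 passes stays down around that 5% mark.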
 
I think edram is a good idea, not a mistake. For those who see it as 80 million transistors wasted, remember that you can pack RAM much tighter on the die than other logic. The area used for 80 million edram transistors might only be enough for 20 million transistors of normal logic gates. And it's easier to have high yields for RAM, because it's easier to build redundancy into a RAM array. Also, it's easier to split a GPU with edram over two dies (which improves yields further) efficiently than to split a normal GPU over two dies. (A 300 million transistor GPU is better than two 150 million transistor GPUs SLIed together.)

But I wish they had included a little more edram. 16MB would have been enough for 1080p without AA, or for lower resolutions with various degrees of AA: 1080i field-rendered with 2xAA (I don't like field rendering that much, but Dave's article claims it will be used), 2xAA for 720p, 4xAA for 576i/p, 6xAA for 480i/p. Considering how old the GS with its 4MB is, I almost would have expected the amount of edram to have doubled three times to 32MB. I really don't think that just doubling it twice to 16MB is asking too much; as it is now, it only has 2.5 times as much edram.
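Running the same kind of framebuffer math on those modes (8 bytes per sample, no compression, and my own guesses at the exact resolutions) does bear out the 16MB figure:

Code:
# Would 16 MB of eDRAM hold these modes without tiling?
# Assumes 8 bytes per sample (32-bit colour + 32-bit Z), uncompressed.
MB = 1024 * 1024
modes = [
    ("1080p, no AA",               1920, 1080, 1),
    ("1080i field-rendered, 2xAA", 1920,  540, 2),
    ("720p, 2xAA",                 1280,  720, 2),
    ("576p, 4xAA",                  720,  576, 4),
    ("480p, 6xAA",                  640,  480, 6),
]
for name, w, h, aa in modes:
    size_mb = w * h * aa * 8 / MB
    print(f"{name:30s} {size_mb:5.1f} MB  {'fits' if size_mb <= 16 else 'needs tiling'}")
# Every mode lands between roughly 12.7 and 15.8 MB: under 16 MB, but
# none of them fit in the actual 10 MB without tiling.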
 
An interesting sidenote is that UE3.0 was NOT built with predicated tiling in mind, meaning all the UE3-based games won't be able to make use of the EDRAM as efficiently as possible.

So...when's UE4 coming out?
 
scooby_dooby said:
An interesting sidenote is that UE3.0 was NOT built with predicated tiling in mind, meaning all the UE3-based games won't be able to make use of the EDRAM as efficiently as possible.

So...when's UE4 coming out?

Well, and that leads me back to a question I had in the other AA thread: with a number of engines designed to be multi-platform, how much effort will go into including predicated tiling in the engines of the multi-platform games that will no doubt form a large part of what is available this gen?

Another way of phrasing it might be: among next-gen multi-console engines, how easy/costly will it be, development-wise, to include predicated tiling for the 360 version of the engine?

@Thowllly and Diguru: Great posts by the way.
 
AFAIK the Z pre-pass and the tagging of the commands are API calls; the software/hardware should take care of most of the operations.
 
one said:
How about the die space of the bus that connects the eDRAM to the logic?

You need to have that bus anyway to connect the off-chip RAM, although you could use a smaller one. But you would need cache as well, which takes more space.

Then again, if you want the same bandwidth, you need large caches and that wide bus anyway. So overall you need fewer transistors, although it is debatable whether you should count the EDRAM with the GPU transistors or with the DDR RAM. In any case, you get 10 MB of very high speed memory almost for free.

As for the unified memory + EDRAM architecture versus split memory (half dedicated to the GPU, the other half to the CPU): unified carries a larger latency penalty for texture lookups but lets the developer divide the memory up as he sees fit, while with split memory you don't have that choice, but you get lower texture latencies and a larger frame buffer. It's a toss-up for now which is better.

As games become more shader-heavy, texture latencies are becoming less and less important. But for games that use many different textures and other maps, that latency and the associated bandwidth are important. So which is best depends on how they do it.

All in all, it probably doesn't matter very much as long as you develop only for that single system. They're all fine.
 
Xenos' daughter die circumvents this with its tiling scheme - certainly I'm sure Kutaragi would have loved to implement some eDRAM if it had been at all practical.
All the same, Xenos' eDRAM had to be big enough to hold the framebuffer + Z/stencil plus room for reasonably sized tiles. Doing that for dual 1080p would mean at least 32 MB, even before leaving any room for the tiles to do their work.
That said, I still think that given the TRCs, the eDRAM in 360 should have been at least 12 MB...

while SRAM uses four transistors (a flip-flop). So it takes four times as much room.
4-6, depending on the particular implementation, although SRAM cells are definitely about as dense a structure as you get on an IC. Also, it should be noted that the 4-6 transistors are for single-ported memories; if you want multiple read/write ports, you basically have to multiply that out.

But it is faster, and doesn't need to be refreshed regularly (although that's transparent and handled by the memory modules themselves nowadays).
Yeah, it's transparent, but that refresh is a contributor to the latencies of DRAM.

It limits the available RAM space for the frame buffer, but it increases the things that can be done without penalty, like AA.
I'm still on the fence about the "zero penalty" AA... Not quite ready to buy that yet, especially when enabling/disabling AA means enabling/disabling predicated tiling. There are just too many balls dropped with every piece of hardware in every one of the next-gen consoles. Even my own reservations aside, there has to be a reason why nobody is doing it and everyone is instead resorting to substitutes.
 
xbdestroya said:
Another way of phrasing it might be: among next-gen multi-console engines, how easy/costly will it be, development-wise, to include predicated tiling for the 360 version of the engine?
Ya, that's a great question, and it applies to the SPEs in CELL as well. How many 3rd parties will spend the time to code everything specifically for the SPEs, or will they just use the baseline single PPE?

My guess would be that you'll see Devs support one platform as their base, extract as much power as possible from that platform, and then try to "make it work" on the other console when porting over.

If a Dev has chosen X360 as their base platform, they'd be crazy not to implement tiling in their engine and take advantage of the massive bandwidth available within the EDRAM.
 
scooby_dooby said:
Ya, that's a great question, and it applies to the SPEs in CELL as well. How many 3rd parties will spend the time to code everything specifically for the SPEs, or will they just use the baseline single PPE?

My guess would be that you'll see Devs support one platform as their base, extract as much power as possible from that platform, and then try to "make it work" on the other console when porting over.

I more or less agree - I could see a sort of hybrid baseline evolving around one PPE core and traditional rendering techniques, which unfortunately may leave a fair bit on the table for both systems as far as utilizing the finer points of the hardware.

On the side, to the devs in the house: do dev houses normally try to make a game look the same cross-console if it's intended to be multi-SKU from the start? Or do they attempt to extract a performance edge from given hardware if the effort to do so is minimal? Just wondering what the mentality behind a game like Madden is, for instance, when it comes to cross-console development.
 
It really depends on the developer, I would say. If you take a look at GTA: SA, it's obvious it was a PS2 game from day 1, and they did basically nothing for the XBOX version; it's basically a straight port with a longer draw distance. Then if you take a look at Splinter Cell 3, it was clearly a game designed to extract the ultimate performance from the XBOX, and the PS2 port was nowhere near as nice. Then look at EA, and everything pretty much looks identical on both systems.
 
People here don't seem to get it.

I think EDRAM is a DISADVANTAGE plain and simple.

Why it's not in PS3 is probably because they were SMARTER and used their resources BETTER.

I've seen nothing to convince me of the value of EDRAM. Bandwidth is not a problem at 720P and motion blur is not worth 1/3 the GPU.
 
Bill said:
People here don't seem to get it.

I think EDRAM is a DISADVANTAGE plain and simple.

Why it's not in PS3 is probably because they were SMARTER and used their resources BETTER.

I've seen nothing to convince me of the value of EDRAM. Bandwidth is not a problem at 720P and motion blur is not worth 1/3 the GPU.

As people have told you, you don't know what you're talking about. And, in fact, the X360 design is smarter than a traditional GPU design.
 