Xbox 360 eDRAM. Where are the results?

Not a big deal, if I correctly understand what they're saying. Accumulating post-processed bloom really just lets you avoid a single texture fetch per pixel, which is a negligible savings (e.g. 30fps -> 30.2fps). Maybe you can save a resolve too if you sample the previous frame for the bloom, but you're still talking about ~1%.
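Quick back-of-envelope in C to show the scale (every figure here is an assumption, not a measurement: 720p, one 4-byte texel per pixel, ~22.4 GB/s of main RAM bandwidth):

[code]
/* Back-of-envelope: what skipping one texture fetch per pixel buys.
 * Every figure here is an assumption, not a measurement. */
#include <stdio.h>

int main(void)
{
    const double pixels          = 1280.0 * 720.0; /* assumed 720p target */
    const double bytes_per_fetch = 4.0;            /* one 32-bit texel per pixel */
    const double bandwidth       = 22.4e9;         /* ~22.4 GB/s main RAM, assumed */
    const double base_fps        = 30.0;

    double bytes_saved = pixels * bytes_per_fetch;    /* per frame */
    double time_saved  = bytes_saved / bandwidth;     /* seconds per frame */
    double new_fps     = 1.0 / (1.0 / base_fps - time_saved);

    printf("saved per frame: %.2f MB, %.3f ms\n",
           bytes_saved / 1e6, time_saved * 1e3);
    printf("30 fps becomes %.2f fps (~%.1f%% faster)\n",
           new_fps, (new_fps / base_fps - 1.0) * 100.0);
    return 0;
}
[/code]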
 
I like particles. Whole games can look fancy just with judicious particle use. Naff particles can really make a game look bland too. Smoke and fire that looks like so many billboarded textures just detracts from the experience no matter the game.
 
I like particles. Whole games can look fancy just with judicious particle use. Naff particles can really make a game look bland too. Smoke and fire that looks like so many billboarded textures just detracts from the experience no matter the game.

I totally agree!

Particles give that extra level of atmosphere!
 
Though this generation eDRAM is not going to be as relevant as it was in the last gen (especially if you can't use it to store textures..), the main advantage behind it is the ability to sustain a huge fill rate in simple rendering passes (particles are the most obvious candidate here..).
So we start to observe many 360 games making obscene (in a good way!) use of particle effects.. there you have your proof :)

I don't think being able to draw lots of alpha particles is the main benefit of EDRAM, though it's surely a good thing. And I don't think EDRAM is there to provide effects that can't be implemented as efficiently on another architecture.
I see the EDRAM as a way to basically remove framebuffer fillrate from the equation when you are looking for bottlenecks: it'll never, ever be the bottleneck. It's also a way to make use of main memory more predictable. It's basically necessary, in my opinion, to overcome the drawbacks of the first implementation of UMA on the first Xbox, where the GPU could starve the CPU of memory access when rendering.
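To put a rough number on the fillrate point, here's a sketch with assumed peak figures (8 ROPs at 500 MHz, a ~22.4 GB/s shared bus; illustrative only):

[code]
/* Rough peak-fillrate sketch: why framebuffer traffic would dominate a
 * shared (UMA) bus without eDRAM. Figures are assumptions. */
#include <stdio.h>

int main(void)
{
    const double fill_rate  = 8.0 * 500.0e6; /* 8 ROPs * 500 MHz = 4 Gpix/s */
    const double rmw_bytes  = 8.0;           /* read + write of a 32-bit colour */
    const double shared_bus = 22.4e9;        /* ~22.4 GB/s UMA bus, assumed */

    double blend_traffic = fill_rate * rmw_bytes; /* alpha blending at full rate */
    printf("peak blend traffic: %.0f GB/s vs %.1f GB/s shared bus\n",
           blend_traffic / 1e9, shared_bus / 1e9);
    /* 32 GB/s of colour alone (Z not even counted) would more than fill
       the bus and starve the CPU; in eDRAM it never touches main memory. */
    return 0;
}
[/code]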

The entire architecture is based around the EDRAM; without it, the design would just have been substantially different, not necessarily better or worse, but different.

Of course, the price you pay for a nice and simple UMA with no fillrate bottlenecks, "free" MSAA, and less pressure on main RAM bandwidth when doing MSAA (you still have to copy the backbuffer to main RAM, but only the resolved version) is having to deal with PTR and all its pitfalls.
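Some rough sizing to show both sides of that tradeoff, assuming 720p with 32-bit colour and 32-bit depth/stencil:

[code]
/* Rough sizing: why the resolve copy is cheap but tiling is unavoidable.
 * Assumes 720p, 4xMSAA, 32-bit colour and 32-bit depth/stencil. */
#include <stdio.h>

int main(void)
{
    const double pixels  = 1280.0 * 720.0;
    const double samples = 4.0;
    const double colour  = 4.0, depth = 4.0;    /* bytes per sample */
    const double edram   = 10.0 * 1024 * 1024;  /* 10 MB on the daughter die */

    double msaa_buffer = pixels * samples * (colour + depth);
    double resolved    = pixels * colour;  /* all that crosses to main RAM */

    printf("4xMSAA colour+Z: %.1f MB vs %.0f MB eDRAM -> tiling (PTR)\n",
           msaa_buffer / (1024.0 * 1024.0), edram / (1024.0 * 1024.0));
    printf("resolved copy:   %.1f MB per frame\n",
           resolved / (1024.0 * 1024.0));
    return 0;
}
[/code]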

At the end of the day, it's a design tradeoff. And it works ok in my opinion.
 
Fran said:
to overcome the drawbacks of the first implementation of UMA on the first Xbox, where the GPU could starve the CPU of memory access when rendering.
That's a reality of any shared memory architecture (ie. every console since DC). IMO for eDRAM to serve the purpose you describe, it needs to be general purpose storage, not limited to FB only - that way you kill render to texture bandwidth consumption + you give the application the ability to explicitly control how much the GPU will hog the shared bus.

Unfortunately no consumer hw actually addresses this scenario properly (360 has the aforementioned usage limitations, GC had the stupid split banks, PS2 limited access, PS3 has the right idea but the VRAM bank isn't eDRAM, and PSP, well, PSP comes closest on paper but it's ultimately let down by stupidities I won't talk about here).
 
That's a reality of any shared memory architecture (ie. every console since DC). IMO for eDRAM to serve the purpose you describe, it needs to be general purpose storage, not limited to FB only - that way you kill render to texture bandwidth consumption + you give the application the ability to explicitly control how much the GPU will hog the shared bus.

Unfortunately no consumer hw actually addresses this scenario properly (360 has the aforementioned usage limitations, GC had the stupid split banks, PS2 limited access, PS3 has the right idea but the VRAM bank isn't eDRAM, and PSP, well, PSP comes closest on paper but it's ultimately let down by stupidities I won't talk about here).

I agree with you. Performance-wise, a read-write EDRAM would have been the best choice. I think cost is the problem here: it would probably have been too expensive, so not an option.
I'm also not convinced that a read-write EDRAM would have been easier to work with. I don't know, I don't have a clear idea on this point: I kinda like the idea of having all my textures there and my rendertargets here, and a clear and simple path to go from one to the other. I find it elegant; it appeals to my "keep-it-simple" vision of the world.
 
Though this generation eDRAM is not going to be as relevant as it was in the last gen (especially if you can't use it to store textures..), the main advantage behind it is the ability to sustain a huge fill rate in simple rendering passes (particles are the most obvious candidate here..).
So we start to observe many 360 games making obscene (in a good way!) use of particle effects.. there you have your proof :)

Lost Planet is not a good example? :D
 
That's a reality of any shared memory architecture (ie. every console since DC). IMO for eDram to serve the purpose you describe, it needs to be general purpose storage, not limited to FB only - that way you kill render to texture bandwith consumption + you give application the ability to explicitly control how much GPU will hog the shared bus.
The massive eDRAM BW is only available to the internal logic (ROPs). For arbitrary use, you'd either need the eDRAM directly connected to the shader pipes, or connected by a massive bus. Ultimately, why not just go with fast RAM on a fat pipe, like PC GPUs? Unless you can actually get 256 GB/s between eDRAM and the GPU logic, I can't see a point to eDRAM.
 
Unless you can actually get 256 GB/s between eDRAM and the GPU logic, I can't see a point to eDRAM.
I think Xenos, even given its limitations, automatically explains the point -> devs don't have to worry about framebuffer bw. Some of us have nightmares populated with fillrate-eating particle monsters :LOL:
 
I think Xenos, even given its limitations, automatically explains the point -> devs don't have to worry about framebuffer bw. Some of us have nightmares populated with fillrate-eating particle monsters :LOL:

So is it fair to say that Xenos should be capable of particle effects that not even R600 (assuming around 128GB/s bandwidth) could pull off?
 
So is it fair to say that Xenos should be capable of particle effects that not even R600 (assuming around 128GB/s bandwidth) could pull off?
I think R600 will/would destroy any next-gen console at particle rendering; it supposedly has huge raw bandwidth and color compression as well!
 
I think R600 will/would destroy any next-gen console at particle rendering; it supposedly has huge raw bandwidth and color compression as well!

Does colour compression really make that much of a difference? How much real-world improvement in bandwidth usage are we likely to see over an implementation without compression? I'm assuming well over 2x based on your comment?
 
The massive eDRAM BW is only available to the internal logic (ROPs). For arbitrary use, you'd either need the eDRAM directly connected to the shader pipes, or connected by a massive bus. Ultimately, why not just go with fast RAM on a fat pipe, like PC GPUs? Unless you can actually get 256 GB/s between eDRAM and the GPU logic, I can't see a point to eDRAM.
The point is that the designers found a point in the 3D pipeline where the data flow can be a lot narrower than final FB bandwidth. With only a 16GB/s connection between the dies, Xenos can write up to 256GB/s to the memory. The parent die sends the coverage, Z, and colour information to the daughter die in 2x2 pixel groups. The daughter can then read and write the Z and colour for each sample.
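A little sketch of that amplification; the per-pixel byte counts are assumptions picked to line up with the 16 -> 256 GB/s figures above (e.g. Z crossing the link in compressed form):

[code]
/* Sketch of the link-to-eDRAM bandwidth amplification. Byte counts
 * are assumptions chosen to reproduce the 16 -> 256 GB/s figures. */
#include <stdio.h>

int main(void)
{
    const double link_bytes  = 4.0;  /* per pixel across the dies (assumed) */
    const double samples     = 4.0;  /* 4xMSAA */
    const double edram_bytes = samples * 2.0 * (4.0 + 4.0);
    /* per pixel at the eDRAM: read + write of colour and Z, per sample */

    double gain = edram_bytes / link_bytes;
    printf("amplification: %.0fx\n", gain);
    printf("16 GB/s across the link -> up to %.0f GB/s at the eDRAM\n",
           16.0 * gain);
    return 0;
}
[/code]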

Ideally you want the eDRAM to be part of the same die so you can write to it with massive BW, but that wasn't an option here due to the capabilities of the fabs, so they did the next best thing.

As for the fat pipe to RAM, that's not an option for consoles if you want to shrink the thing down as much as possible in the future. Getting that extra bandwidth via a 16 GB/s link into 256 GB/s of eDRAM costs relatively few pins compared to the additional ~25 GB/s you'd get from double-width RAM.
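The pin arithmetic, roughly, assuming 360-style 128-bit GDDR3 at 700 MHz (1.4 GT/s):

[code]
/* Rough pin/bandwidth tradeoff: bandwidth = bus width * data rate.
 * 128-bit GDDR3 at 1.4 GT/s (700 MHz DDR) is an assumed, 360-style bus. */
#include <stdio.h>

int main(void)
{
    double width_bits = 128.0;
    double data_rate  = 1.4e9;  /* transfers per second */

    double bw = width_bits / 8.0 * data_rate;
    printf("128-bit bus: %.1f GB/s\n", bw / 1e9);
    printf("256-bit bus: %.1f GB/s, but ~twice the RAM pins and traces\n",
           2.0 * bw / 1e9);
    return 0;
}
[/code]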
 
Does colour compression really make that much of a difference? How much real world improvement in bandwidth usage are we likely to see over an implementation without compression? Im assuming well over 2x based on your comment?
Colour compression only makes a non-negligible difference when AA is enabled, but then it's very effective. You might see only 10-20% more real data flow with 4 times the samples (and hence backbuffer size). So whatever the AA level is, the improvement factor will be just a bit below it.

It won't change the fact that you have to read the destination data to do transparent particles, however. Bandwidth demand will still double with alpha blending enabled, regardless of whether compression is in use.
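Here's an illustration; the 5% edge-pixel fraction is just an assumption picked to land in that 10-20% range:

[code]
/* Illustration of the compression claim. With MSAA colour compression,
 * fully covered pixels store one colour for all samples; only edge
 * pixels pay per sample. The 5% edge fraction is an assumption. */
#include <stdio.h>

int main(void)
{
    const double samples   = 4.0;
    const double edge_frac = 0.05;  /* assumed fraction of edge pixels */

    double traffic = (1.0 - edge_frac) + edge_frac * samples; /* vs no AA */
    printf("4xAA colour traffic: %.2fx of no-AA (%.0f%% more)\n",
           traffic, (traffic - 1.0) * 100.0);

    /* Alpha blending makes every write a read+write, doubling whatever
       the figure above is; compression doesn't change that factor. */
    printf("with alpha blending: %.2fx\n", 2.0 * traffic);
    return 0;
}
[/code]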
 
Faf: what, then, would be the ideal memory architecture for a future console, in terms of embedded RAM and main memory? Curious, since next-gen consoles are in R&D right now.
 
I like particles. Whole games can look fancy just with judicious particle use. Naff particles can really make a game look bland too. Smoke and fire that looks like so many billboarded textures just detracts from the experience no matter the game.

Yep. Zone of the Enders 2 wouldn't have been the same without the crazy amounts of particles it could display! :smile:
 
Colour compression only makes a non-negligible difference when AA is enabled, but then it's very effective. You might see only 10-20% more real data flow with 4 times the samples (and hence backbuffer size). So whatever the AA level is, the improvement factor will be just a bit below it.

It won't change the fact that you have to read the destination data to do transparent particles, however. Bandwidth demand will still double with alpha blending enabled, regardless of whether compression is in use.

Cheers. So then I guess that, with 128GB/s, R600 would actually have more effective bandwidth available than Xenos with its eDRAM when using 4x MSAA, because with Xenos the data flow is quadrupled while with R600 it goes up by less than double thanks to compression (depending on how many transparent particles are in use)?

That's assuming, of course, that data flow quadruples without compression when you add 4xMSAA, or am I way off there?
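Roughly the sum I have in mind, using the numbers from this thread (all assumptions, and it ignores that R600's bus would also have to feed texture reads while Xenos' eDRAM bandwidth is dedicated to the framebuffer):

[code]
/* The comparison being asked about, with this thread's numbers.
 * All figures are assumptions from the discussion, not measurements. */
#include <stdio.h>

int main(void)
{
    const double xenos_bw = 256.0;  /* GB/s into the eDRAM */
    const double r600_bw  = 128.0;  /* GB/s, rumoured */

    const double aa_raw  = 4.0;     /* 4xAA, uncompressed: traffic x4 */
    const double aa_comp = 1.15;    /* 4xAA, compressed: ~15% more (assumed) */

    printf("Xenos, effective no-AA-equivalent bw: %.0f GB/s\n",
           xenos_bw / aa_raw);
    printf("R600,  effective no-AA-equivalent bw: %.0f GB/s\n",
           r600_bw / aa_comp);
    return 0;
}
[/code]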
 
Even R580/G71 will have more effective bandwidth than XB360 most of the time. Only in the toughest scenarios (from a BW point of view) would it be possible for XB360 to come out on top.
 