What to expect from Xenos smart eDRAM besides 4xAA

liolio · Apr 10, 2006

I have a very trivial question.
It seems the tools for predicated tiling on Xenos still don't perform well at this time (or engines don't support it from the ground up).
What graphical improvements can we expect from the daughter die if the Xbox 360 outputs without any AA?
Or could it be a waste of transistors?
By the way, in what clever ways could the smart eDRAM be used (other than AA) while still taking advantage of its huge built-in bandwidth?

Sorry if it's a dumb question, but most posts complain about the lack of AA and don't discuss what could be done instead (if that is possible, and if skipping AA is a real developer choice rather than a problem with how tiling is implemented in current dev tools, or with engines not built around tiling, which I suppose will be fixed sooner or later).

heliosphere · Apr 10, 2006

The EDRAM handles blending which is effectively free. That's good for particle effects - you can afford to have a lot more blended particles than you could without the EDRAM.

scooby_dooby · Apr 10, 2006

This could be a great thread if some of the developers would chime in.

MS gave you 256GB/s of backbuffer bandwidth. What will that let you do? Will you be able to use all that BW? If so, what for? What impact does it have on future-proofing the console?

Bobbler · Apr 10, 2006

If you're not using AA then you're kind of wasting the potential of, and one of the reasons for, the eDRAM (at least in the way it's implemented in Xenos).

If you don't use AA, I don't think you really gain much ability to do "other stuff" -- the AA is "free" in that respect. My understanding is that, if you have to tile, then you've effectively paid for the cost of AA.
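As a rough sketch of why tiling and AA are linked: the figures below assume a 10 MB eDRAM pool, a 1280x720 render target, and 4 bytes each of colour and depth per sample. None of those numbers are stated in this thread, and `tiles_needed` is a made-up helper for illustration only.

```python
import math

# Assumed eDRAM capacity (10 MB); this figure is not stated in the thread.
EDRAM_BYTES = 10 * 1024 * 1024

def tiles_needed(width, height, samples, bytes_per_sample=8):
    """Number of eDRAM-sized tiles needed to hold colour + depth for every sample."""
    framebuffer_bytes = width * height * samples * bytes_per_sample
    return math.ceil(framebuffer_bytes / EDRAM_BYTES)

# Compare no AA, 2xAA and 4xAA for an assumed 720p target.
for samples in (1, 2, 4):
    print(f"{samples}x samples: {tiles_needed(1280, 720, samples)} tile(s)")
```

Under these assumptions a 720p target fits in a single tile without AA, and has to be split once AA is turned on, which would be consistent with the point that a game which already tiles has paid much of the cost of AA.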

GB123 · Apr 10, 2006

Would free HDR not be a big plus for the eDRAM?

Titanio · Apr 10, 2006

GB123 said:
Would free HDR not be a big plus for the eDRAM?

Its "freeness" is more a consequence of the typical framebuffer format used, FP10, which costs no more in terms of bandwidth than a traditional int8 format.

scooby_dooby · Apr 10, 2006

Titanio said:
Its "freeness" is more a consequence of the typical framebuffer format used, FP10, which costs no more in terms of bandwidth than a traditional int8 format.

Come on now. Let's hear from some people as optimistic about the Xenos architecture as you are about CELL. We don't need any negative Neds in here.

You seem to have a great imagination when it comes to CELL/RSX usage. Can you not think of any additional benefits that the EDRAM may provide (the thread topic)?

Titanio · Apr 11, 2006

scooby_dooby said:
Come on now. Let's hear from some people as optimistic about the Xenos architecture as you are about CELL. We don't need any negative Neds in here.

That's not being negative, I don't think! I'm simply saying that FP10 is "free" versus a traditional int8 format because it's 32-bit, and thus incurs no more memory footprint or bandwidth requirement than that format.
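The footprint argument can be checked with a few lines of arithmetic. This is only a sketch: the 10:10:10:2 channel split for FP10 is an assumption (the post only says the format is 32-bit), and the `FORMATS` table and `bits_per_pixel` helper are names invented for the example.

```python
# Bits per channel (R, G, B, A) for each framebuffer format.
# The 10:10:10:2 split for FP10 is an assumption; the post only says it is 32-bit.
FORMATS = {
    "int8": (8, 8, 8, 8),
    "fp10": (10, 10, 10, 2),
    "fp16": (16, 16, 16, 16),
}

def bits_per_pixel(name):
    """Total colour bits stored per pixel for the named format."""
    return sum(FORMATS[name])

for name in FORMATS:
    print(name, bits_per_pixel(name))
```

With these layouts FP10 and int8 occupy the same number of bits per pixel, while FP16 doubles it, so only FP16 would change the memory footprint or bandwidth per pixel.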

As for eDRAM benefits, undoubtedly the biggest one it was intended for was, and is, AA. Beyond that, its bandwidth will amply support anything the ROPs handle: alpha blending, z-testing (?), etc. It's all in Dave's article.

scooby_dooby · Apr 11, 2006

Titanio said:
As for eDRAM benefits, undoubtedly the biggest one it was intended for was, and is, AA. Beyond that, its bandwidth will amply support anything the ROPs handle: alpha blending, z-testing (?), etc. It's all in Dave's article.

256GB/s seems like total overkill for simply doing 4xAA, no?

We know what the ROPs can do technically (alpha and z logic), but can we put that into practical terms? What can we expect to see in games that fully utilize the BW available on Xenos?

Titanio · Apr 11, 2006

scooby_dooby said:
256GB/s seems like total overkill for simply doing 4xAA, no?

For supporting 4 Gpixels/sec with 4xAA, that's what ATI figured was required (remember also that no compression is used, I think).

Basically it means you've a bankable amount of fillrate regardless of whether you use 4xAA or not - but to make the most, you'd need to use 4xAA. Your fillrate doesn't go up if you don't, so really it's required if you're to use the eDram as intended.

ERP · Apr 11, 2006

scooby_dooby said:
256GB/s seems like total overkill for simply doing 4xAA, no?

We know what the ROPs can do technically (alpha and z logic), but can we put that into practical terms? What can we expect to see in games that fully utilize the BW available on Xenos?

It's exactly enough to cover the worst case requirements with 4x AA on.
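One way to arrive at that figure, as a back-of-the-envelope sketch rather than ATI's published derivation: assume the worst case is every sample reading and writing both 32-bit colour and 32-bit depth, at the 4 Gpixels/sec mentioned earlier in the thread. The per-sample byte counts and the `rop_bandwidth` helper are assumptions made for illustration.

```python
def rop_bandwidth(pixels_per_sec, samples, color_bytes=4, depth_bytes=4):
    """Worst-case bytes/s if every sample reads and writes both colour and depth."""
    read_and_write = 2  # blending and z-testing are both read-modify-write
    return pixels_per_sec * samples * (color_bytes + depth_bytes) * read_and_write

FILL_RATE = 4_000_000_000  # 4 Gpixels/sec, the fill rate quoted in the thread

for samples in (1, 4):
    gb_per_sec = rop_bandwidth(FILL_RATE, samples) / 1e9
    print(f"{samples}x samples: {gb_per_sec:.0f} GB/s")
```

Under these assumptions the 4xAA case comes out at 256 GB/s, and the same fill rate without AA needs only a quarter of that, which matches the point that the extra bandwidth is sized for 4xAA rather than for higher fill rate.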

Guden Oden · Apr 11, 2006

heliosphere said:
The EDRAM handles blending which is effectively free.

I don't think so.

On PS2, blending was effectively "free" because that console's eDRAM has separate read and write ports, and alpha blending is a read-modify-write operation. Xenos doesn't have multiported eDRAM.

Just witness the amount of update lag you get in that WWII arcade shooter whatsitsname when you do that ground strafing run during the training missions when the screen is full of stacked explosion textures rendered on top of each other. Doesn't look to me as if blending is an entirely free operation.

Certainly freeER than on external memory, but not completely.

Guden Oden · Apr 11, 2006

scooby_dooby said:
256GB/s seems like total overkill for simply doing 4xAA, no?

[...] What can we expect to see in games that fully utilize the BW available on Xenos?

You can't utilize the 256GB/s of the eDRAM die if you don't use 4xAA. The connection back to Xenos is "only" 32GB/s, so the remainder of the b/w is entirely wasted if AA isn't used.

As for what will effectively use lots of b/w, well, blending and particles have already been mentioned, that'll likely be the best example methinks.

LightHeaven · Apr 11, 2006

Some of the first Xenos reports also said that the eDRAM would handle stencil shadows, but I never heard any word on this again...

Edit: found one:

The Smart 3D Memory can also compute Z depths, occlusion culling, and also does a very good job at figuring stencil shadows. Stencil shadows are used in games that will use the DOOM 3 engine such as Quake 4 and Prey.

http://www.hardocp.com/article.html?art=NzcxLDM=

joebloggs · Apr 11, 2006

ERP said:
It's exactly enough to cover the worst case requirements with 4x AA on.

So ATI spent 1/3 of the transistor budget basically just for 4xAA?

It'd be fine if EVERY single game had 4xAA (at least then you're getting the benefit of this trade-off), but we're seeing a lot of games with no AA or only 2xAA.

I think that will change when developers implement predicated tiling from the start, but how many developers will do that for multiplatform games? (Also, UE3 doesn't support it.)

The worst part is that if 4xAA is not used, you don't really gain performance elsewhere.

superguy · Apr 11, 2006

joebloggs said:
So ATI spent 1/3 of the transistor budget basically just for 4xAA?

It'd be fine if EVERY single game had 4xAA (at least then you're getting the benefit of this trade-off), but we're seeing a lot of games with no AA or only 2xAA.

I think that will change when developers implement predicated tiling from the start, but how many developers will do that for multiplatform games? (Also, UE3 doesn't support it.)

The worst part is that if 4xAA is not used, you don't really gain performance elsewhere.

Personally I hate the EDRAM and always have. I would rather have all the transistors dedicated to logic, and a bigger chip than RSX at that.

But OK, forgetting that, the IDEA I believe (correct me if I'm wrong), besides AA, is that both companies were essentially limited to 128-bit buses for cost reasons.

By building the EDRAM in, ATI attempted to more or less get around this. All the framebuffer bandwidth is in EDRAM.

So, hopefully, the 360 would in theory not be very BW limited, while the PS3 might be.

The negative tradeoff in Xenos' case is fewer transistors to dedicate to brute force.

So in my mind you've basically got a slightly less powerful chip (Xenos) that can be used 100%, versus a somewhat more powerful chip (RSX) that may be hindered by lack of BW.

How this all plays out is anybody's guess. The final determiner can only be whose games look better, IMO.

SPM · Apr 11, 2006

I wonder if the reason Microsoft put in eDRAM is to allow for the latency problems in Xenon. Unlike the SPE, Xenon relies on cache, and competition between the cores in Xenon for memory bandwidth may prevent Xenon from doing what the SPE does without interfering with anything else. Maybe eDRAM is an attempt to prevent unpredictability in Xenon's ability to produce objects for Xenos to render in a timely manner, so that frames aren't skipped. This was certainly what some early reports suggested was the real reason for its inclusion, with 4xAA being a bonus.

Is the 32GB/s bandwidth of Xenon to Xenos in addition to the 22.5GB/s main memory bandwidth or is it just faster memory accessed by the same bus and which will therefore create contention?

Mintmaster · Apr 11, 2006

Guden Oden said:
Just witness the amount of update lag you get in that WWII arcade shooter whatsitsname when you do that ground strafing run during the training missions when the screen is full of stacked explosion textures rendered on top of each other. Doesn't look to me as if blending is an entirely free operation.

When he says free, he means no more expensive than a normal write, whereas blending usually costs more.

When you have explosions and smoke filling the screen, you have a lot more pixels to draw. That will always slow down the framerate. Without the EDRAM, all those extra pixels would be drawn at half rate, compounding the slowdown.

Just because you see a slowdown doesn't mean it's not helping.

Your note about dual-ported RAM is interesting, but you're wrong about thinking that it necessarily affects blending speed. Whether it's dual-ported or not, I don't know, but the internal bandwidth available is enough for full speed blending. If it's not dual-ported, it simply means the internal data transfer rate is half of the peak when you aren't blending. Even z-buffering is a read-test-write operation per sample, by the way.

Mintmaster · Apr 11, 2006

joebloggs said:
So ATI spent 1/3 of the transistor budget basically just for 4xAA?

It'd be fine if EVERY single game had 4xAA (at least then you're getting the benefit of this trade-off), but we're seeing a lot of games with no AA or only 2xAA.

I think that will change when developers implement predicated tiling from the start, but how many developers will do that for multiplatform games? (Also, UE3 doesn't support it.)

The worst part is that if 4xAA is not used, you don't really gain performance elsewhere.

Even without 4xAA you get performance gains.

Say you disable z-buffering and have no textures, and are simply doing alpha blending with the colour buffer. 22.4GB/s is only enough for a maximum of 2.8GPix/s. Add in the z-test, a texture for smoke or fire or whatever, and a less than perfect memory controller, then you drop to well below that. The B3D review of the 7600GT gets ~2.3GPix/s in the single texture alpha blend test without z-test (7600GT has 22.4GB/s BW, and is BW bound in this particular test). The EDRAM allows Xenos to get 4GPix/s.
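The 2.8GPix/s ceiling follows from simple arithmetic, sketched below. It assumes a 32-bit colour buffer and that each blended pixel costs one colour read plus one colour write, with no textures or z traffic, as in the scenario described above. The `max_blend_rate` helper is a name made up for this example.

```python
def max_blend_rate(bandwidth_bytes_per_sec, color_bytes=4):
    """Peak blended pixels/s when each pixel needs one colour read and one colour write."""
    bytes_per_blended_pixel = 2 * color_bytes  # read the destination, write the result
    return bandwidth_bytes_per_sec / bytes_per_blended_pixel

external = max_blend_rate(22.4e9)  # 22.4GB/s external memory bus, as quoted above
print(f"{external / 1e9:.1f} GPix/s")
```

Any z-test, texture fetch or memory controller inefficiency takes bandwidth away from this budget, which is why the measured figure quoted above is lower than the theoretical one.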

Without using tiling, 4xAA can be used for low-resolution renders like reflection maps, alternate views (TVs, GR:AW type cameras, etc.), or rear-view mirrors. The last one would be awesome in a racing game, as AA makes a big difference given the mirror's size.

There are some creative uses of the AA hardware as well that don't need tiling. For omnidirectional depth mapped shadows, you need to store the radial distance from the light in six cube map faces. Using the AA hardware, you can write the FP16 distance values four times as fast, then keep the unresolved values. There's a cool new technique known as variance shadow maps that can benefit from AA hardware also.

There may be a few other similar techniques, but probably not many. IMO, blending is still the biggest advantage, with or without AA.

(NOTE: FP16 shadow map writing may only be two times as fast using 4xAA hardware, because I know FP16 HDR rendering is half speed. But for shadow maps you only need one channel, 16 bits total as opposed to 64 bits total, so it should be at full rate.)

Mintmaster · Apr 11, 2006

SPM said:
Is the 32GB/s bandwidth of Xenon to Xenos in addition to the 22.5GB/s main memory bandwidth or is it just faster memory accessed by the same bus and which will therefore create contention?

It's a different bus, so I'd be shocked if there was contention in the way you're talking about. I did hear something about the connection to the CPU having some quirks, though, but I'm not sure about the details.
