Xbox 360: First Look at Final Hardware

Yeah the HUD thing is killing me too. As if every game in every stage of development has to have it. If it's real, it's real.
 
Just to reiterate ERP’s post, previous comments that I made before and what you should be able to derive from the Xenos article, WRT AA…

The costs associated on the XBOX 360 for AA are, primarily, not where you expect them to be for traditional AA. MSAA on current graphics boards is primarily a bandwidth penalty, a fillrate hit (greater than two samples) and loss of any extra Z sampling rate – you turn on AA you know that your bandwidth is going to be gobbled up (with compression it usually works out at about half the performance on current boards at 4x AA in a graphics limited scenario – but this may actually be due to the fill-rate penalty in many cases). With Xenos the fill-rate is free for all AA modes, the bandwidth on the eDRAM is not going to be an issue, and the resolve for each tile and write to system RAM isn’t going to be significantly greater than the write at that resolution without AA.

Mostly Xenos shifts the AA costs all the way back up through the pipeline and instead of things becoming a fill & bandwidth penalty (primarily, in both cases the pixel samples have to be calculated multiple times for polygon edge pixels) it becomes a geometry hit.

First off, for tiling you need a Z pass, which means each frame needs all the geometry calculating once to populate the Z buffer, then it is calculated again for the rendered frame – from what I hear most apps do this anyway, since it also saves a lot on pixel shading costs and other areas, so for apps that would have done this anyway this isn’t an extra cost. Where the extra costs come in is calculating the geometry at the edge of a tile – during the Z pass Xenos tags all the commands with which tiles the geometry in that command will span, so when it renders the full frame it can discard the geometry outside of the current tile being rendered. However, geometry that spans multiple tiles needs to be calculated for each tile – the more tiles you have, the more polys are going to span more than one tile, hence the more geometry will be calculated needlessly. The costs of this are entirely unpredictable, but overall they probably shouldn’t be that large – the smaller the triangles, the more triangles are going to span tiles and need recalculating, but that may still be a relatively low ratio if the entire scene is built from small geometry.

(Note, once the geometry is transformed and mapped to screen space, pixels that are outside of the current tile being rendered should be clipped and discarded, which is why things should mainly only be a geometry hit.)
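
To put rough numbers on the "geometry that spans multiple tiles" cost described above, here is a minimal Python sketch. It assumes tiles are equal-height horizontal bands and works per primitive, whereas the post describes tagging at command granularity; the function names and the extents are hypothetical, chosen only for illustration.

```python
def tiles_spanned(y_min, y_max, screen_height, num_tiles):
    """Number of horizontal tile bands a primitive's screen-space extent touches."""
    band = screen_height / num_tiles              # height of one tile in pixels (assumed bands)
    first = max(int(y_min // band), 0)
    last = min(int(y_max // band), num_tiles - 1)
    return last - first + 1


def colour_pass_work(extents, screen_height, num_tiles):
    """Primitives processed in the colour pass: once per tile each one touches."""
    return sum(tiles_spanned(lo, hi, screen_height, num_tiles) for lo, hi in extents)


# Hypothetical (y_min, y_max) extents for four primitives on a 720-line screen.
extents = [(10, 50), (230, 250), (100, 400), (600, 700)]
for n in (1, 2, 3):
    print(n, "tile(s):", colour_pass_work(extents, 720, n), "primitives processed")
```

With these made-up extents the four primitives are processed 4, 5 and 6 times for one, two and three tiles respectively, which is the "more tiles, more repeated geometry" effect in miniature. Real ratios depend entirely on the scene.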
 
Dave Baumann said:
Just to reiterate ERP’s post, previous comments that I made before and what you should be able to derive from the Xenos article, WRT AA…

The costs associated on the XBOX 360 for AA are, primarily, not where you expect them to be for traditional AA. MSAA on current graphics boards is primarily a bandwidth penalty, a fillrate hit (greater than two samples) and loss of any extra Z sampling rate – you turn on AA you know that your bandwidth is going to be gobbled up (with compression it usually works out at about half the performance on current boards at 4x AA in a graphics limited scenario – but this may actually be due to the fill-rate penalty in many cases). With Xenos the fill-rate is free for all AA modes, the bandwidth on the eDRAM is not going to be an issue, and the resolve for each tile and write to system RAM isn’t going to be significantly greater than the write at that resolution without AA.

Mostly Xenos shifts the AA costs all the way back up through the pipeline and instead of things becoming a fill & bandwidth penalty (primarily, in both cases the pixel samples have to be calculated multiple times for polygon edge pixels) it becomes a geometry hit.

First off, for tiling you need a Z pass, which means each frame needs all the geometry calculating once to populate the Z buffer, then it is calculated again for the rendered frame – from what I hear most apps do this anyway, since it also saves a lot on pixel shading costs and other areas, so for apps that would have done this anyway this isn’t an extra cost. Where the extra costs come in is calculating the geometry at the edge of a tile – during the Z pass Xenos tags all the commands with which tiles the geometry in that command will span, so when it renders the full frame it can discard the geometry outside of the current tile being rendered. However, geometry that spans multiple tiles needs to be calculated for each tile – the more tiles you have, the more polys are going to span more than one tile, hence the more geometry will be calculated needlessly. The costs of this are entirely unpredictable, but overall they probably shouldn’t be that large – the smaller the triangles, the more triangles are going to span tiles and need recalculating, but that may still be a relatively low ratio if the entire scene is built from small geometry.

(Note, once the geometry is transformed and mapped to screen space, pixels that are outside of the current tile being rendered should be clipped and discarded, which is why things should mainly only be a geometry hit.)

Thanks for this explanation, Dave. Is there anything about tiled rendering or the logic in the daughter die that would make the EDRAM "features" mutually exclusive? How about using more than one of them: does this pile on the performance drop? For example, if the EDRAM applies AA, HDR, DOF (I have no idea if the EDRAM is responsible for these functions, so bear with me), does each subsequent feature degrade performance in some way, or is it just the initial additional geometry work and that's it?

Seems like 1280x720 2xAA requires 2 tiles and 4xAA requires 3 tiles? Can you venture a guess on what kind of impact this would have on a game like PGR3?

J
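
The tile counts in the question above can be checked with simple arithmetic. The sketch below is only that: it uses the 10 MB eDRAM size of Xenos (a figure not stated in this thread) and assumes 32-bit colour plus 32-bit Z/stencil per sample; other buffer formats would give different counts. `tiles_needed` is a hypothetical helper name.

```python
from math import ceil

EDRAM_BYTES = 10 * 1024 * 1024   # Xenos eDRAM capacity: 10 MB
BYTES_PER_SAMPLE = 4 + 4         # assumption: 32-bit colour + 32-bit Z/stencil


def tiles_needed(width, height, samples):
    """Tiles required so that each tile's colour and Z samples fit in eDRAM."""
    framebuffer_bytes = width * height * samples * BYTES_PER_SAMPLE
    return ceil(framebuffer_bytes / EDRAM_BYTES)


for samples in (1, 2, 4):
    print(f"1280x720 at {samples}x: {tiles_needed(1280, 720, samples)} tile(s)")
```

Under those assumptions 1280x720 fits in one tile without AA (about 7.0 MB), needs two tiles at 2xAA (about 14.1 MB) and three at 4xAA (about 28.1 MB), matching the figures in the post.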
 
expletive said:
Seems like 1280x720 2xAA requires 2 tiles and 4xAA requires 3 tiles? Can you venture a guess on what kind of impact this would have on a game like PGR3?

J
Bizarre would like to hit 60fps in PGR3, and are certain they could do so with another 3 or 4 months of work, but currently are at a solid 30fps. A 5% hit for a 3rd tile (based on ATI's numbers) could be the difference between locking at 60fps and locking at 30fps if it is too close to call.

For launch games, unless they have had enough time to work out the eDRAM kinks and are well above the 60fps range, taking a 5% hit may be too significant on rushed titles.
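
Why a 5% hit can matter so much comes down to the vsync budget. The following Python sketch assumes a 60 Hz display with vsync, where a frame that misses one refresh interval waits for the next; the 16.0 ms frame time is a made-up example, not a PGR3 measurement, and `locked_fps` is a hypothetical helper.

```python
from math import ceil

REFRESH_HZ = 60  # assumed display refresh rate


def locked_fps(frame_ms, refresh_hz=REFRESH_HZ):
    """Frame rate under vsync: each frame occupies a whole number of refresh intervals."""
    interval_ms = 1000.0 / refresh_hz             # about 16.67 ms at 60 Hz
    return refresh_hz / ceil(frame_ms / interval_ms)


base_ms = 16.0                                    # hypothetical frame just inside the budget
print(locked_fps(base_ms))                        # 60.0
print(locked_fps(base_ms * 1.05))                 # 30.0 (16.8 ms misses the 16.67 ms budget)
```

A frame with plenty of headroom would absorb the same 5% without changing its locked rate, which is the "well above the 60fps range" case.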
 
Acert93 said:
Bizarre would like to hit 60fps in PGR3, and are certain they could do so with another 3 or 4 months of work, but currently are at a solid 30fps. A 5% hit for a 3rd tile (based on ATI's numbers) could be the difference between locking at 60fps and locking at 30fps if it is too close to call.

For launch games, unless they have had enough time to work out the eDRAM kinks and are well above the 60fps range, taking a 5% hit may be too significant on rushed titles.

I wonder what kind of impact going from no AA to 2xAA would have on a game like PGR3.

J
 
State of Xbox 360

According to one developer who spoke with IGN anonymously, the Final Kits provide a 15%-20% difference in output power, or an increase from 2.8 to 3.2 GHz. "The difference has been huge. The Final Dev Kits have helped tremendously."

IGN learned that all developers have submitted their final code to Microsoft's Quality Assurance team, and the final stages of development (tweaking, polishing, bug-fixing, and optimizing) have commenced. With an approval stamp from Microsoft's QA team, dev teams can wrap up their games, press gold discs, and begin the manufacturing process. Last, IGN has learned that many Xbox 360 games will ship to retail stores at least one week prior to launch, meaning that many retailers will receive 360 games on Tuesday, November 15. Will they sell them early too? Several, but not all, publishers have indicated yes.
 
expletive said:
Yes, but what's the performance penalty to go from "no tiles" to "two tiles"?

That depends on quite a few things as outlined by Dave above (does the game already use a Z-prepass, what the average size of the geometry is, etc.).

Read http://www.beyond3d.com/articles/xenos/index.php?p=05#tiled and come to your own conclusion, but from what I gather the minimum performance penalty would be 3-5%. That said, if some devs are not using a Z-prepass, they'll have to rewrite parts of their renderer to take (proper) advantage of tiling, as the Z-prepass tags the command buffers for processing on whatever tiles.
 
Isn't the z prepass pretty much essential to determine which polys to cull? Otherwise they'd just draw everything. Losing this would mean higher costs further down the render pipeline.
 
Shifty Geezer said:
Isn't the z prepass pretty much essential to determine which polys to cull? Otherwise they'd just draw everything. Losing this would mean higher costs further down the render pipeline.

Further costs only when geometry limited, I guess, which I don't think the Xenos will be very often, if ever.
Bit like PS2, it's just easier and quicker to let the GS render everything and their mother than bothering with culling and stuff.
Not saying the GS and Xenos are similar, but it is true that *sometimes* letting the horsepower sort things out is better.
 
I don't think there are many engines left which don't do frustum culling ...

(Which is to say that even without laying down the Z-buffer first it is trivial to render tiles, and if you aren't geometry limited then the cost isn't relevant either.)
 
london-boy said:
Further costs only when geometry limited, I guess, which I don't think the Xenos will be very often, if ever.
Bit like PS2, it's just easier and quicker to let the GS render everything and their mother than bothering with culling and stuff.
Not saying the GS and Xenos are similar, but it is true that *sometimes* letting the horsepower sort things out is better.
Surely then you'd be severely pixel shader limited, rendering maybe 3x as many surfaces as you'll actually ever see. Worse still, a city scene with dozens of obscured buildings being rendered without being seen.
 
Shifty Geezer said:
Surely then you'd be severely pixel shader limited, rendering maybe 3x as many surfaces as you'll actually ever see. Worse still, a city scene with dozens of obscured buildings being rendered without being seen.

Well, I'm pretty sure you'd still only shade what you see, although the geometry is there. Regardless, this won't happen on Xenos because I think Z-culling is automatic.
 
MfA said:
That doesn't really help when rendering back to front.

Do you expect this to be used on X360? Isn't that something that's fixed in hardware? I thought it was, and that no developer could just "choose" to render back to front or the other way around. I never really understood why anyone would do that in fact, apart from PS2, which I thought could only render that way.
 
It's not so much that you would choose to do it, it's just that unless you are depth sorting you are doing it part of the time purely by chance.
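
To illustrate the point about submission order, here is a minimal Python sketch that counts how many fragments get shaded at a single pixel under a simple less-than depth test. It is a toy model, not how Xenos or any particular GPU implements early Z, and the depth values are made up.

```python
def shaded_fragments(depths):
    """Count fragments that pass a less-than depth test at one pixel, in submission order."""
    nearest = float("inf")
    shaded = 0
    for z in depths:
        if z < nearest:        # fragment is in front of everything drawn so far
            nearest = z
            shaded += 1        # so it gets shaded (and may be overdrawn later)
    return shaded


print(shaded_fragments([1, 2, 3, 4]))   # front to back: 1
print(shaded_fragments([4, 3, 2, 1]))   # back to front: 4
print(shaded_fragments([3, 1, 4, 2]))   # unsorted: 2
```

An unsorted stream lands somewhere between the two extremes, which is the "part of the time purely by chance" case.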
 
MfA said:
It's not so much that you would choose to do it, it's just that unless you are depth sorting you are doing it part of the time purely by chance.

Could you explain to me what the advantages of both methods are? I mean back/front and front/back. I can't find anything about it.
Also, I thought sometimes engines use both, depending on the element they're rendering (say, render some things in the scene back/front and some front/back), but I'm sure I misinterpreted whatever I was reading.
 
london-boy said:
Could you explain to me what the advantages of both methods are? I mean back/front and front/back. I can't find anything about it.
Also, I thought sometimes engines use both, depending on the element they're rendering (say, render some things in the scene back/front and some front/back), but I'm sure I misinterpreted whatever I was reading.
Front-to-back rendering takes advantage of early Z logic (i.e. reduces overdraw). Back-to-front rendering is needed for translucent / transparent surfaces. Both have the disadvantage that you need to sort by depth (obviously). Most engines do a mixture (e.g. sort objects front-to-back, then sort by renderstate or something).
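
A minimal Python sketch of the mixture described above: opaque draws sorted front-to-back, then translucent draws sorted back-to-front. The `Draw` type, its fields and the scene contents are hypothetical, and real engines typically also sort by render state, which is left out here.

```python
from dataclasses import dataclass


@dataclass
class Draw:
    name: str
    depth: float        # distance from the camera
    translucent: bool


def submission_order(draws):
    """Opaque draws nearest-first (helps early Z), then translucent draws farthest-first."""
    opaque = sorted((d for d in draws if not d.translucent), key=lambda d: d.depth)
    blended = sorted((d for d in draws if d.translucent), key=lambda d: d.depth, reverse=True)
    return opaque + blended


scene = [
    Draw("window", 5.0, True),
    Draw("wall", 10.0, False),
    Draw("car", 3.0, False),
    Draw("smoke", 8.0, True),
]
print([d.name for d in submission_order(scene)])   # ['car', 'wall', 'smoke', 'window']
```

Drawing the translucent pass last and farthest-first keeps blending correct, while the opaque pass gets the overdraw savings.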
 