
What *exactly* is the cost of Xenos AA?


Shifty Geezer
18-Oct-2005, 17:49
In the console games section this debate is back, and it seems people are still unclear about exactly how much the 'free' 2xAA of Xenos costs. After all, it's called free, so surely it comes at no extra cost whatsoever to the render pipeline? The GPU can render the same scene with 2xAA without expending any extra effort or taking any longer than without AA? That's not actually the case, is it? As I understand it, the pixel shaders are applied once per pixel, but there are 2 vertex samples per pixel, plus a 5% (or whatever it is) tiling overlap. The actual sample averaging can be considered 'free', as that's handled by the eDRAM logic. So the actual cost of 2xMSAA on Xenos over normal no-AA rendering is twice the vertex transform work, plus '5%' for the tiling, since an AA'd buffer doesn't fit neatly in eDRAM (taking FP10 HDR as the color data type). The term 'free' is only true in the sense of bandwidth consumption. On a typical GPU the same AA processes are involved, but the hit on bandwidth due to blending is where the price is really paid, and where the eDRAM provides its unique benefit. Right?

Can some really knowledgeable person post a definitive explanation of what's involved, in simple terms that people can hear, understand, and not forget? I'm ashamed that this long after hearing all about Xenos I'm still hazy about what 'free' AA really means, and I don't think I'm alone here :oops:

Xenus
18-Oct-2005, 18:18
Wouldn't this mean that if a game became vertex-transform limited, Xenos's AA could no longer be used? Though with its USA (unified shader architecture) I don't see this happening much.

lip2lip
18-Oct-2005, 18:35
IMHO, I would measure the cost to the Xbox 360 here in nanoseconds rather than in chip computation, mostly because most of the work is done off-chip, and partly because cycle counting is mostly used for multicore environments, as any programmer can tell you.

I believe total latency should be low, because of the low-latency nature of embedded memory.

All hail 360!

ERP
18-Oct-2005, 18:48
There have been so many posts on this and several actual answers.

There is no way to generalise X% overhead.

It requires that you use tiling for HD resolutions. This in turn requires that the entire display list be available to the graphics chip, which may or may not have significant memory implications depending on how you do it.

Any batch of primitives that straddles multiple tiles will be transformed once for every tile it's in, although there is no extra pixel shading cost, and there will be one extra sync of the graphics pipe for every tile rendered.

You can likely construct artificial cases that are close to 0 overhead and artificial cases that have large overhead; the actual penalty is going to be very game dependent. In almost any case it's going to be a lot cheaper than on current PC architectures.
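To make ERP's point concrete, here is a minimal Python sketch that counts vertex transforms when each batch is re-transformed once for every tile its screen-space bounding box touches. It assumes horizontal strips of equal height and that each batch's y extent is known up front; the `Batch` type and all the batch sizes are hypothetical, and the per-tile pipeline sync ERP mentions is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    """Hypothetical draw batch: screen-space y extent (inclusive, pixels) and size."""
    y_min: int
    y_max: int
    vertex_count: int


def tiles_touched(batch: Batch, tile_height: int, num_tiles: int) -> int:
    """Number of horizontal strips that the batch's y extent overlaps."""
    first = max(0, batch.y_min // tile_height)
    last = min(num_tiles - 1, batch.y_max // tile_height)
    return max(0, last - first + 1)


def vertex_transforms(batches, tile_height: int, num_tiles: int) -> int:
    """Total vertices transformed when each batch is re-sent to every tile it touches."""
    return sum(b.vertex_count * tiles_touched(b, tile_height, num_tiles) for b in batches)


batches = [
    Batch(0, 719, 2000),    # sky dome: spans the whole screen
    Batch(10, 50, 100),     # HUD element near the top
    Batch(300, 460, 5000),  # character inside the middle strip
    Batch(360, 719, 8000),  # terrain: straddles the middle and bottom strips
]

untiled = vertex_transforms(batches, tile_height=720, num_tiles=1)
tiled = vertex_transforms(batches, tile_height=240, num_tiles=3)
print(untiled, tiled, round(tiled / untiled, 2))
```

With these made-up numbers the vertex work rises by about 1.8x while pixel shading is unchanged; a scene whose batches mostly sit inside one tile would come out much closer to 1x, which is why no general X% figure exists.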

Rolf N
18-Oct-2005, 19:00
Disclaimer: I base this solely on the public published information that's available for Xenos, most notably this B3D article (http://www.beyond3d.com/articles/xenos/index.php?p=05#tiled). I have no developer manual and certainly no dev kit.

In terms of computational resources, Xenos' AA really is absolutely and honestly free.
Per "pixel", Xenos computes one color and a depth gradient, and determines a subpixel coverage mask. The mask is just four bits. The depth gradient is sufficient because all potentially covered subpixels, while they have varying depth values, come from the same triangle. Hence the connection to the daughter die doesn't need much bandwidth (actually less than an equivalent PC part's path to memory, because blending is also "free", even without any AA).

Inside the daughter die, the color and z values are replicated to all covered samples according to the mask bits, and the z test and blending are done. So in the worst case, for one incoming pixel the daughter die needs to read/modify/write four subpixel depth values (for a subpixel-precise depth test) and read/modify/write four subpixels' colors (for blending). The eDRAM daughter die has very high internal bandwidth and can cope with all of this just fine, which is exactly why this is deemed "free".
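The per-pixel flow Rolf describes can be sketched in Python under stated assumptions: the four sample offsets are an illustrative rotated-grid pattern (not Xenos's documented positions), depth uses a simple "less" test, blending is left out, and `Fragment`, `PixelSamples`, `resolve_fragment` and `downsample` are hypothetical names. The point is that one shaded color plus one depth plane per pixel is enough to update up to four stored samples.

```python
from dataclasses import dataclass, field

# Hypothetical 4-sample rotated-grid offsets (pixels, relative to the pixel center).
# The real Xenos sample positions are not given in this thread.
SAMPLE_OFFSETS = [(-0.125, -0.375), (0.375, -0.125), (-0.375, 0.125), (0.125, 0.375)]


@dataclass
class Fragment:
    color: tuple    # one shaded color for the whole pixel
    z: float        # depth at the pixel center
    dzdx: float     # depth gradient, used to rebuild per-sample depth
    dzdy: float
    coverage: int   # 4-bit mask: bit i set means sample i is inside the triangle


@dataclass
class PixelSamples:
    colors: list = field(default_factory=lambda: [(0, 0, 0)] * 4)
    depths: list = field(default_factory=lambda: [1.0] * 4)


def resolve_fragment(px: PixelSamples, frag: Fragment) -> int:
    """Apply one incoming fragment to the 4 stored samples; return samples written."""
    written = 0
    for i, (ox, oy) in enumerate(SAMPLE_OFFSETS):
        if not (frag.coverage >> i) & 1:
            continue  # this sample is not covered by the triangle
        z = frag.z + frag.dzdx * ox + frag.dzdy * oy  # per-sample depth from the plane
        if z < px.depths[i]:                           # per-sample depth test
            px.depths[i] = z
            px.colors[i] = frag.color                  # same color replicated per sample
            written += 1
    return written


def downsample(px: PixelSamples) -> tuple:
    """Box-filter resolve: average the 4 samples into one display pixel."""
    return tuple(sum(c[k] for c in px.colors) / 4 for k in range(3))


edge_pixel = PixelSamples()
resolve_fragment(edge_pixel, Fragment(color=(255, 0, 0), z=0.5, dzdx=0.0, dzdy=0.0, coverage=0b0011))
print(downsample(edge_pixel))  # (127.5, 0.0, 0.0)
```

A red fragment covering two of the four samples writes two samples, and the resolve averages them to half-intensity red: an anti-aliased edge pixel produced from a single shader invocation.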

The catch:
The eDRAM daughter die, while having that very high bandwidth, only has limited storage space. Rendering at high resolutions with AA will exhaust this space. Doing 4xMSAA requires four times as much space to be set aside as rendering without AA. If you don't have that space, you can't hold the whole backbuffer at once.

The proposed solution is to split the scene up into tiles that do fit the eDRAM space limits.
E.g. instead of rendering a complete 1280x720 frame with 4xMSAA (which you can't), the rendering process can be split up into three 1280x240 partitions (roughly 9.8MB each), which are rendered sequentially. You flush the finished partitions out to system memory to make room for the next one. Once all three partitions are in system memory the frame is done; you can point the RAMDAC there to scan it out, and start building up the next frame the same way.
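Rolf's figures can be reproduced with a few lines of Python. This sketch assumes 10 MiB of eDRAM and 4 bytes of color (e.g. FP10 or 8:8:8:8) plus 4 bytes of depth/stencil per sample; it ignores any tile alignment or granularity rules the real hardware may impose, so the tile counts are lower bounds.

```python
import math

EDRAM_BYTES = 10 * 1024 * 1024  # assumption: 10 MiB of eDRAM on the daughter die


def framebuffer_bytes(width: int, height: int, samples: int,
                      color_bytes: int = 4, depth_bytes: int = 4) -> int:
    """Bytes for color plus depth/stencil at every sample of a width x height target."""
    return width * height * samples * (color_bytes + depth_bytes)


def tiles_needed(width: int, height: int, samples: int) -> int:
    """Minimum number of partitions so each one fits in eDRAM (alignment ignored)."""
    return math.ceil(framebuffer_bytes(width, height, samples) / EDRAM_BYTES)


for samples in (1, 2, 4):
    size_mb = framebuffer_bytes(1280, 720, samples) / 1e6
    print(f"1280x720 {samples}x: {size_mb:.1f} MB -> {tiles_needed(1280, 720, samples)} tile(s)")

# One 1280x240 partition at 4xMSAA, the case described above (about 9.83 MB):
print(framebuffer_bytes(1280, 240, 4) / 1e6)
```

Under these assumptions 720p needs one tile without AA, two with 2xMSAA and three with 4xMSAA, consistent with the three-partition example.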

But this is not free. Rendering the whole scene in one go is more efficient. Now let me base the explanation on a regular PC GPU (IMR) for the moment:
You'll have to resubmit, and hence retransform, at least some geometry. You should not need to render a triangle in partition 1 if you know it will only be visible in partition 2, but a triangle that overlaps two partitions will have to be rasterized twice for correct results. "Knowing" is a problem, though. Deciding what to resubmit at the triangle level would take impractical amounts of (CPU) preprocessing, so you'll end up resubmitting huge gobs of geometry, if not your entire scene geometry, three times for three partitions. I.e. you do three times the vertex processing work and three times the trisetup work compared to non-partitioned rendering. As the submission isn't free either, you'll also pay three times the geometry bandwidth costs (shared system memory in the case of Xenos) and three times the CPU costs associated with traversing the scene graph, setting up render states and queuing up the draw calls to the hardware. Overall, the only cost that doesn't triple is fillrate (because each partition has fewer pixels than a whole frame).

There are several conceivable ways in which Xenos could assist this partitioned rendering process at the hardware level.
1) Xenos might well support nothing more than a reconfigurable viewport, i.e. nothing worth talking about. The same costs as set forth in the PC-based explanation above apply.
2) Xenos might buffer up the entire untransformed scene description. This would remove repeated scene graph traversal from the equation but costs storage space for that buffer. Replaying such a display list can be made slightly more efficient than "talking to the driver" again from the application side.
3) Xenos might do the same as #2 but with transformed geometry. Costs for retransforming geometry are eliminated.
4) Xenos might do #3 and additionally sort the triangles in the display list to eliminate, or at least reduce, the number of triangles that are useless for the current partition.
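Option 4 amounts to binning already-transformed triangles by partition. The Python sketch below is a hypothetical illustration of that idea, not a description of what Xenos actually does: it assumes horizontal partitions and screen-space vertex positions, and puts each triangle into every partition its y extent overlaps, so straddling triangles are rasterized once per partition.

```python
import math


def bin_triangles(triangles, partition_height: int, num_partitions: int):
    """Return one list of triangles per horizontal partition.

    Each triangle is a sequence of three (x, y) screen-space vertices. A triangle
    whose y extent crosses a partition boundary is placed in every partition it
    touches; triangles entirely above or below the screen land in no bin.
    """
    bins = [[] for _ in range(num_partitions)]
    for tri in triangles:
        ys = [y for _x, y in tri]
        first = max(0, math.floor(min(ys)) // partition_height)
        last = min(num_partitions - 1, math.floor(max(ys)) // partition_height)
        for p in range(first, last + 1):
            bins[p].append(tri)
    return bins


triangles = [
    [(0, 10), (100, 10), (50, 200)],    # entirely in partition 0
    [(0, 230), (100, 230), (50, 250)],  # straddles partitions 0 and 1
    [(0, 500), (100, 600), (50, 700)],  # entirely in partition 2
]
bins = bin_triangles(triangles, partition_height=240, num_partitions=3)
print([len(b) for b in bins])  # [2, 1, 1]
```

Only the straddling triangle is processed twice. The price moves from re-transforming everything to storing and sorting the transformed list, which is the storage trade-off noted for options 2 to 4.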

I don't know what's really going on with Xenos here, but either way this should explain why performance is going to be lost if you use AA at higher resolutions, even though from a different point of view it truly is free.

Joke:
5) Xenos might be a TBDR, and as such it would do #4 but get even more bang out of the work done during the sorting process.
(this is nonsense because if it were true Xenos wouldn't need 10MB of eDRAM -- it could make do with some kilobytes of on-chip tile storage)

RobHT
18-Oct-2005, 19:02
ERP, as always, thanks for the info.

. . .It's going to be a lot cheaper in almost any case than current PC architectures.

This just raises the question: why isn't AA in all X360 games, when it can be applied so easily in PC games running at similar resolutions? I guess the answer is that devs are opting to use available resources for other effects.
ERP,
Generally speaking, do you anticipate future X360 games will incorporate more AA?

ERP
18-Oct-2005, 19:14
It's public information how this works; look for predicated tiling in Dave's article.

Basically, though, it can mark a primitive as interesting or not interesting during an initial Z rendering pass, then simply jump over the primitive if it won't contribute to the tile being rendered.
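A rough Python sketch of the predication idea, under loud assumptions: on Xenos the per-tile "interesting" bits come from the initial Z pass, whereas here a hypothetical `build_predicates` approximates that with a bounding-box overlap test. Rectangles are half-open `(x0, y0, x1, y1)` pixel ranges and `draw` is a stand-in callback. The structure is what matters: one pass sets a bitmask per primitive, then each tile's replay of the same command list skips primitives whose bit for that tile is clear.

```python
def overlaps(a, b) -> bool:
    """True if half-open rectangles a and b, given as (x0, y0, x1, y1), intersect."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1


def build_predicates(prim_bounds, tiles):
    """Pre-pass: one bitmask per primitive, with bit t set if it touches tile t."""
    masks = []
    for bounds in prim_bounds:
        mask = 0
        for t, tile in enumerate(tiles):
            if overlaps(bounds, tile):
                mask |= 1 << t
        masks.append(mask)
    return masks


def render_tiled(prim_bounds, tiles, draw):
    """Replay the command list once per tile, skipping predicated-off primitives."""
    masks = build_predicates(prim_bounds, tiles)
    for t in range(len(tiles)):
        for i, mask in enumerate(masks):
            if (mask >> t) & 1:
                draw(i, t)


tiles = [(0, 0, 1280, 240), (0, 240, 1280, 480), (0, 480, 1280, 720)]
prims = [(0, 0, 1280, 720), (600, 300, 700, 460), (0, 10, 200, 50)]  # sky, character, HUD
calls = []
render_tiled(prims, tiles, lambda prim, tile: calls.append((prim, tile)))
print(build_predicates(prims, tiles), len(calls))  # [7, 2, 1] 5
```

Without predication all three primitives would be submitted to all three tiles (nine draws); with it, five are executed.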

ERP
18-Oct-2005, 19:16
I obviously can't comment on every game; I just don't have real information.

But on a PC you just turn it on and it's transparent; on Xenos you have to render your scene in a particular way to make it work.

Dr. Nick
18-Oct-2005, 19:51
I don't think you guys can get a much better answer than that.
Thank you ERP.

Nemo80
18-Oct-2005, 20:31
To keep it simple, the cost seems to be very high; otherwise developers would use higher AA levels, or AA at all, which is hardly used right now. The reason might be the tiling performance loss, or simply running out of memory.

NavNucST3
18-Oct-2005, 20:38
I obviously can't comment on every game, I just don't have real information.

But on a PC you just turn it on, it's transparent, on Xenos you have to render your scene in a particular way to make it work.

ERP,

What is the coding "cost" of moving from alpha (9800 Pro) --> beta --> final (Xenos): serious rewrites (weeks/months)? I know you have stated before that, since there is no comparable PC part, most weren't willing to take a chance on the tiling. But is there a chance that even the "just after launch window" titles would be willing to code specifically for Xenos?

Megadrive1988
18-Oct-2005, 21:00
Sorry to go slightly off-topic here (hardly), but I think that the original Xbox and Xbox games should have had 4X FSAA as mandatory. Then Xbox 2 / Xbox 360 and its games should have had FSAA at either 4X or 8X, with 4X mandatory and 8X optional.


2x or 4x FSAA is barely acceptable IMO, but I will take what I can get, and I have every intention of supporting the Xbox 360 (I also own an Xbox).

I know that how much FSAA we can have comes down to the cost of silicon and to bandwidth and fillrate limitations. I'm just disappointed in the level of anti-aliasing on consoles. That doesn't mean Xbox 360 games won't look great. I realize that 720p with 4X FSAA is enough to eliminate most of the jagged edges, unless you look carefully.

Laa-Yosh
18-Oct-2005, 21:22
E.g. instead of rendering a complete 1280x720 w 4xMSAA frame (which you can't), the rendering process can be split up into three 1280x240 partitions (roughly ~9,8MB each) which are rendered sequentially.

Note: depending on the type of game, you may wish to go with vertical tiles instead. That's because things like monsters in an FPS, tall buildings, trees, etc. would usually extend into all of the horizontal tiles, requiring them to be sent to the GPU 3 times. With vertical tiles, you have a better chance that something will only reside in one or two of the tiles, especially with a 16:9 aspect ratio.

Of course, with the player looking around, the efficiency could change just between a few frames as well...
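This trade-off can be checked with a small Python sketch that counts how many of three equal strips a screen-space bounding box overlaps, for horizontal and for vertical splits of a 1280x720 frame. The two boxes are invented examples (a tall tree and a wide, low vehicle), and the helper `strips_overlapped` is hypothetical.

```python
def strips_overlapped(bbox, width, height, n, vertical):
    """Count how many of n equal strips a half-open (x0, y0, x1, y1) box touches."""
    x0, y0, x1, y1 = bbox
    if vertical:
        step, lo, hi = width / n, x0, x1   # vertical strips split the x axis
    else:
        step, lo, hi = height / n, y0, y1  # horizontal strips split the y axis
    first = max(0, int(lo // step))
    last = min(n - 1, int((hi - 1) // step))
    return max(0, last - first + 1)


tree = (600, 50, 680, 700)   # tall and narrow
car = (100, 500, 1100, 600)  # wide and low
for name, box in (("tree", tree), ("car", car)):
    h = strips_overlapped(box, 1280, 720, 3, vertical=False)
    v = strips_overlapped(box, 1280, 720, 3, vertical=True)
    print(f"{name}: horizontal strips={h}, vertical strips={v}")
```

The tree touches all three horizontal strips but only one vertical strip, while the car is the reverse, so which split is cheaper depends on the content on screen and can change from frame to frame as the camera moves.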

Laa-Yosh
18-Oct-2005, 21:24
To keep it simple, the cost (seems to be) is very high, otherwise developers would use higher AA levels, or AA at all, which is hardly used right now. The reason might be the tiling performance loss, or a general run out of memory.

This is bulls**t, simply and seriously.

We already heard it several times: launch games are not using it because there wasn't enough time to test the implementation.

ERP
18-Oct-2005, 21:25
This is obviously just my opinion but I'm on the exact opposite side of the fence......

Developers should be free to decide what features they support, including resolution, AA, and texture filtering. And that's not to say I wouldn't support them given the choice.

If a developer wants to ship a game at 160x100 with no AA and point sampled textures, he should be able to do that and have the market place decide if that's a good thing.

TRCs are there to protect the consumer up to a point, ensuring a consistent experience with things like memory cards. I'm not sure I agree with extending them to include things like AA and mandatory HD... If HD or AA become strong selling points, developers will adopt them. The cool thing about consoles is watching what developers can eke out of fixed resources; the more constraints you put in place, the less eking there is.

doob
18-Oct-2005, 21:30
Well... costs *exactly* around 450 bucks but u also get loads of other stuff...

AlStrong
18-Oct-2005, 22:18
I'm a little curious as to why their "temporal AA" was not included or featured. Is it a high cost in transistors? Supposing the developer was shooting for a constant 60fps, they could get a decent approximation of 4xMSAA with 2xTMSAA.

ERP
18-Oct-2005, 23:41
I'm a little curious as to why their "temporal AA" was not included or featured. Is it a high cost in transistors? Supposing the developer was shooting for a constant 60fps, they could get a decent approximation of 4xMSAA with 2xTMSAA.

Pretty much any dev could implement temporal AA on top of the library if they thought it would be worthwhile.
At a previous job we messed around with the idea on an Xbox game a long time before ATI "invented" the feature.
Our opinion was that at TV resolutions the artifacts were too irritating to make it worthwhile so we didn't proceed with it.

AlStrong
19-Oct-2005, 00:31
ah... interesting.

How would it have been done on the Xbox? Through shaders?

Inane_Dork
19-Oct-2005, 00:57
How would it have been done on the Xbox? Through shaders?

A sub-pixel width camera jitter would do that, I think.


Anyway, an MS employee responded to this issue here: http://forum.teamxbox.com/showpost.php?p=6140903&postcount=71 It illuminated a few things for me.

ERP
19-Oct-2005, 01:03
ah... interesting.

How would it have been done on the Xbox? Through shaders?

Just changing the subpixel offset frame to frame.
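A minimal Python sketch of changing the sub-pixel offset frame to frame. The two-entry offset pattern is an assumption chosen for illustration, and the function only computes the offset in normalized device coordinates; how it is folded into the projection (and the sign of y) depends on the API's conventions, which are not covered here.

```python
# Hypothetical 2-frame pattern of sub-pixel camera offsets, in pixels.
JITTER_PATTERN = [(0.25, 0.25), (-0.25, -0.25)]


def jitter_offset(frame_index: int, width: int, height: int):
    """Return the (dx, dy) offset in NDC to apply to the camera on this frame.

    NDC spans 2 units across the viewport, so one pixel is 2/width wide
    and 2/height tall.
    """
    px, py = JITTER_PATTERN[frame_index % len(JITTER_PATTERN)]
    return (2.0 * px / width, 2.0 * py / height)


for frame in range(4):
    print(frame, jitter_offset(frame, 1280, 720))
```

Even and odd frames sample the scene at positions half a pixel apart on each axis, and the viewer's eye (or a blend of consecutive frames) averages them. This only holds up at a steady, high frame rate; at lower rates the alternation tends to show up as flicker, in line with the irritating artifacts ERP reports at TV resolutions.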

Tap In
19-Oct-2005, 01:55
....

Anyway, an MS employee responded to this issue here: http://forum.teamxbox.com/showpost.php?p=6140903&postcount=71 It illuminated a few things for me.

nice find, thanks

Brimstone
19-Oct-2005, 02:22
I find it almost impossible to believe that RARE, a Microsoft first-party studio which probably had the earliest possible developer kits with Xenos (Alpha), has failed to design their graphics engine for Perfect Dark Zero to take advantage of tiling. Now, they may be struggling to get a bug-free build with it working, but I'm confident that by the time Perfect Dark Zero goes gold, they will have tiling up and running.

AlStrong
19-Oct-2005, 02:37
Thanks again, ERP.


Anyway, an MS employee responded to this issue here: http://forum.teamxbox.com/showpost.php?p=6140903&postcount=71 It illuminated a few things for me.

The Unreal3 engine, for example, was not engineered to use predicated tiling. You very well might not see any FSAA in Gears of War, or other 360 titles that use it, unless Epic works to incorperate predicated tiling efficiently. I hope they do.

I hope so too...otherwise, that's a kick in the balls. :mad:


I find almost impossible to believe that RARE, a first party of Microsoft which probably had the earliest possible developer kits with Xenos (Alpha), has failed to design their graphic engine for Perfect Dark Zero to take advantage of tiling. Now they may be struggling to get a bug free build with it working, but I'm confident by the time Perfect Dark Zero goes gold, they will have tiling up and running.


I wonder if the lead programmer and others are sick of learning new architectures every year (first GCN, then Xbox, then Alpha kits, then Beta Kits). :wink:

scooby_dooby
19-Oct-2005, 03:20
Anyway, an MS employee responded to this issue here: http://forum.teamxbox.com/showpost.php?p=6140903&postcount=71 It illuminated a few things for me.

well that about settles that!


To preface, anyone who's mentioned that FSAA is virtually 'free' when using predicated tiling is absolutely correct. MS has pushed developers to use predicated tiling for their titles, because if it's implemented early on and planned for, rendering several different tiles can actually improve performance overall (in some cases), and in all other cases would either have virtually no effect on performance or only incur a 1-3% performance hit... but again, that's assuming the engine was built with tiled rendering in place.

Adding predicated tiling to any 3D graphics engine is pretty trivial, but when the engine isn't architected from the ground up to incorporate predicated tiling, the performance hit can be anywhere from 1-10%, which isn't terrible, but it can also be very noticeable.

The Unreal3 engine, for example, was not engineered to use predicated tiling. You very well might not see any FSAA in Gears of War, or other 360 titles that use it, unless Epic works to incorporate predicated tiling efficiently. I hope they do.

In any case, it is a requirement that Xbox 360 games do not have blatant or obvious 'aliasing' factors. Unlike what 'The Gamemaster' said above, it is NOT required to have 2xFSAA. That is the recommended solution games use to get through certification, but it's not a requirement. The requirement goes on to indicate that games can use motion blur, depth of field, and other effects, which are acceptable if they (cooperatively) eliminate 'jaggies' from the game.

dukmahsik
19-Oct-2005, 03:25
I actually like some form of baseline standard like this.

Carl B
19-Oct-2005, 03:55
DarkFalz's post helps clear the air on the issue; it's a definite plus that he posted on the matter. But it still represents some backpedaling from the original stated goals. (http://www.bit-tech.net/news/2005/05/25/ati_xbox_360_london/)

Now, I think it's good that MS is being more flexible with what devs are allowed to do in terms of effects and resource usage, but phrases like "adding predicated tiling to any 3d graphics engine is pretty trivial" still stand out to me.

It's been known for some time, or at least generally agreed, that Xenos is capable of 'virtually' free AA at the hardware level. But in the end, the question remains: what will it take in terms of effort for devs to take advantage of this, and how many months or years of ramp-up will pass before we start seeing it flex its muscles in game development? Will it be something that only shows up in AAA titles? Will it be absent from cross-console games? Or will it become truly ubiquitous at a certain point?

pakpassion
19-Oct-2005, 04:49
This topic has been discussed to the max. In the last generation, Splinter Cell 1 had a lot of aliasing compared to Splinter Cell 2, which in turn had more than Splinter Cell 3 on Xbox. As developers get to know the hardware, the aliasing will diminish further.

Further, we have to consider that when a game applies HDR, a lot of bandwidth is used, to such an extent that the AA has to be lowered. Kameo uses 2xAA while using HDR; PDZ apparently uses HDR in some aspects but has turned off AA, which is surprising because Kameo has more effects than PDZ. Maybe there are some coding inefficiencies.

Another example to consider is Project Gotham Racing 3. It applies AA and HDR at the same time, and I believe that's because of the Direct3D compression technique which was discussed a few months ago at the Chip conference, where Cell and Xbox 360 chip pictures were shown for the first time. I remember reading that if the developer used Direct3D compression, the bandwidth between the memory and the GPU would effectively be doubled. Not that it WOULD literally be doubled, but the compression works in such a way that what is carried from the GPU to memory with compression applied would otherwise need double the 22.4 GB/s. I believe that with that technique, Project Gotham Racing is using full HDR and full AA at the same time. A lot of other games are applying full AA. Top Spin 2 is an example. Chromehounds with HDR is another example. Call of Duty 2 has bloom effects and full AA in the final videos.

It depends on the developer. Topic closed.

aaaaa00
19-Oct-2005, 08:45
I find almost impossible to believe that RARE, a first party of Microsoft which probably had the earliest possible developer kits with Xenos (Alpha), has failed to design their graphic engine for Perfect Dark Zero to take advantage of tiling. Now they may be struggling to get a bug free build with it working, but I'm confident by the time Perfect Dark Zero goes gold, they will have tiling up and running.

Tiling wasn't implemented until the Beta kits. Before that there was a software emulation, but it wasn't really usable for anything but experimenting and debugging.

scatteh316
19-Oct-2005, 08:52
It costs 1,00000999,0000000,00000000% of performance. :P

scatteh316
19-Oct-2005, 08:54
I believe its because of direct3d compression technique which was discussed a few months ago in the Chip conference where Cell and Xbox 360 Chip pictures were shown for the first time. I remember reading that if the developer did Direct3d compression, the bandwidth between the memory and the GPU would essentially be doubled. not that it WOULD be doubled but the compression works in such a way that if the bandwidth of 22.4 was doubled.

Could PS3 use a similar technique to maybe double the bandwidth between Cell and RSX???

one
19-Oct-2005, 09:17
Tiling wasn't implemented until the Beta kits. Before that there was a software emulation, but it wasn't really usable for anything but experimenting and debugging.

Why couldn't they get some engineering samples from ATi? They said in public that 720p and AA are a must, didn't they?

Brimstone
19-Oct-2005, 09:32
Tiling wasn't implemented until the Beta kits. Before that there was a software emulation, but it wasn't really usable for anything but experimenting and debugging.

I'm guessing a few programmers at RARE should be somewhat familiar with the concept, since tiling was implemented on the GameCube, correct? Dave's Xenos article mentions that Xenos does share some concepts with Flipper, though Xenos was designed by a different set of engineers.


If Perfect Dark Zero shipped without "tiling", how big of a patch to download would it be to change the engine into a tiler?

pipo
19-Oct-2005, 09:37
I'd be very surprised if they would even consider such a thing. They'd better delay the game until xmas then.

Shifty Geezer
19-Oct-2005, 10:49
In terms of computational resources, Xenos' AA really is absolutely and honestly free.
Per "pixel", Xenos computes one color, a depth gradient and determines a subpixel coverage mask. The mask is just four bits. The depth gradient is sufficient because all potentially covered subpixels, while they have variable depth values, are from the same triangle. Hence this connection doesn't need much bandwidth (actually less bandwidth than an equivalent PC part's road to memory because blending is also "free", even without any AA).

Many thanks for this post. So, ignoring tiling, which applies whenever the framebuffer requirements exceed eDRAM capacity (and that can happen with or without AA), there is no extra vertex pass or the like needed for AA. As with ordinary no-AA rendering, a single sample is shaded per pixel rendered. The number of vertex shader and pixel shader instructions executed is the same regardless of whether AA is on or off.

Is that right?

Dave Baumann
19-Oct-2005, 11:08
How Xenos's AA works is already in the article and I've reiterated it here (http://www.beyond3d.com/forum/showpost.php?p=583639&postcount=103).

Shifty Geezer
19-Oct-2005, 11:27
Thanks for the link Dave. That was posted recently here somewhere too :oops: So in terms of what's needed on top of rendering without AA, the answer is next to nothing other than the tiling costs. Case closed.

RobHT
19-Oct-2005, 13:28
Nice find. Great stuff!
I guess we can infer from this that while building a new game with predicated tiling in mind is 'trivial', retrofitting it to a game further along in development is non-trivial.
If UE3 games all lack AA, well that's a bummer.

from pakpassion:
Furthermore, when a game applies HDR, a lot of bandwidth is used, to such an extent that AA has to be lowered. Kameo uses 2xAA alongside HDR, while PDZ apparently utilises HDR in some respects but has turned off AA, which is surprising because Kameo has more effects than PDZ. Maybe there are some coding inefficiencies.

Another example is Project Gotham Racing 3. It applies AA and HDR at the same time, and I believe it's because of the Direct3D compression technique that was discussed a few months ago at the Chip conference where Cell and Xbox 360 chip pictures were shown for the first time. I remember reading that if the developer used Direct3D compression, the bandwidth between memory and the GPU would effectively be doubled. Not that it WOULD literally be doubled, but the compression works in such a way that what is carried from the GPU to memory when compression is applied is equivalent to the 22.4 GB/s bandwidth being doubled. I believe that with that technique, Project Gotham Racing is utilising full HDR and full AA at the same time. A lot of other games are applying full AA: Top Spin 2 is an example, and Chromehounds with HDR is another. Call of Duty 2 has bloom effects and full AA in the final videos.
pakpassion,
Where did you get this info? Not doubting you, just curious.
If true, this truly closes the book on this issue, in my mind. It would mean many/most of these early titles are already incorporating some form of AA. PDZ certainly deserves a pass on this one, considering all the iterations it has been through.

Carl B
19-Oct-2005, 13:29
How Xenos's AA works is already in the article and I've reiterated it here.

Well, the primary question isn't so much whether the AA is 'free' in hardware per se, since we know it more or less is, but rather what performance hit is associated with implementing it. Since that question has already more or less been answered in this thread, I'm not going to harp on it; I just want to make sure that the purpose of this thread remains distinct from the usual 'How does Xenos work' threads. :)

Previous threads and answers have dealt mainly with the hardware implementation; this thread seems focused on the estimated performance penalties at the holistic level.

Crazyace
21-Oct-2005, 19:02
If you have to tile your buffers and your geometry gets very complex, this is going to have an effect.

Say, for 100 MVerts/s (maybe 200 MTriangles/s) with xyz (FP16), RGBA (8-bit) and ST (FP16), you need a maximum bandwidth of 1.4 GB/s from RAM to feed the triangle setup engine.

For procedural data generated by the CPU, this may come directly from the L2 cache when tiling isn't enabled... but with tiling, the CPU would have to write 1.4 GB/s to memory (rather than directly to the GPU), and the GPU would have to read that 1.4 GB/s on every pass. There is bound to be an effect ;)
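The figures above can be checked with a short calculation. A minimal sketch of the worst case described in the post: the CPU writes the vertex stream to RAM once and the GPU re-reads all of it on every tile pass. It ignores predication (which would let a tile skip commands outside it), and the function names are hypothetical.

```python
# Vertex layout from the post above:
# xyz as FP16 (3 x 2 bytes) + RGBA8 color (4 x 1 byte) + ST as FP16 (2 x 2 bytes)
BYTES_PER_VERTEX = 3 * 2 + 4 * 1 + 2 * 2  # 14 bytes


def stream_gbs(verts_per_sec, bytes_per_vertex=BYTES_PER_VERTEX):
    """Bandwidth needed to feed triangle setup once, in GB/s (10^9 bytes/s)."""
    return verts_per_sec * bytes_per_vertex / 1e9


def tiled_memory_traffic_gbs(verts_per_sec, tiles, bytes_per_vertex=BYTES_PER_VERTEX):
    """Worst-case RAM traffic for CPU-generated vertices, in GB/s.

    Without tiling (tiles == 1) the post assumes the data streams straight
    from the CPU's L2 cache, so no RAM traffic is counted. With tiling, the
    CPU writes the stream once and the GPU reads it once per tile pass.
    """
    s = stream_gbs(verts_per_sec, bytes_per_vertex)
    if tiles <= 1:
        return 0.0
    return s * (1 + tiles)  # one write plus one read per tile
```

At 100 MVerts/s the stream is 1.4 GB/s; with two tiles that becomes about 4.2 GB/s of RAM traffic (1.4 GB/s written plus 2 x 1.4 GB/s read), and with three tiles about 5.6 GB/s.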

It's an extreme case, but it shows that game programmers may need some extra planning to reduce system load.

Quaz51
21-Oct-2005, 19:17
Does the Z pass tag (for tiling) each polygon?
Wouldn't writing a tag for each poly to RAM use a little bandwidth?

Dave Baumann
21-Oct-2005, 19:30
No, the Z pass tags a command - this will usually be many polygons.
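Per-command tagging can be sketched as follows. This is a simplified model, not the actual hardware mechanism: each command's tile mask is derived here from a hypothetical screen-space bounding box recorded during the Z pass, and each tile pass then replays only the commands tagged for it. Because the tag is per command, a command that straddles a tile boundary has all of its vertices transformed once for every tile it touches.

```python
from dataclasses import dataclass


@dataclass
class DrawCommand:
    vertex_count: int
    bbox: tuple  # hypothetical screen-space (x0, y0, x1, y1), half-open, from the Z pass


def tile_mask(bbox, tiles):
    """Bitmask with bit i set if bbox overlaps tile i (tiles are (x0, y0, x1, y1))."""
    x0, y0, x1, y1 = bbox
    mask = 0
    for i, (tx0, ty0, tx1, ty1) in enumerate(tiles):
        if x0 < tx1 and tx0 < x1 and y0 < ty1 and ty0 < y1:
            mask |= 1 << i
    return mask


def vertices_transformed(commands, tiles):
    """Tag every command once, then replay per tile only the commands
    tagged for it. Returns total vertices transformed over all tile passes."""
    masks = [tile_mask(c.bbox, tiles) for c in commands]
    total = 0
    for t in range(len(tiles)):
        for cmd, m in zip(commands, masks):
            if m & (1 << t):  # predicated: skipped entirely if not tagged for this tile
                total += cmd.vertex_count
    return total


# Two horizontal tiles, e.g. a 1280x720 frame split in half for 2xAA.
TILES = [(0, 0, 1280, 360), (0, 360, 1280, 720)]
```

For example, commands of 1000 vertices (top tile only), 3000 vertices (straddling the split) and 2000 vertices (bottom tile only) transform 9000 vertices in total instead of 6000; only the straddling command is duplicated.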

Quaz51
21-Oct-2005, 19:44
No, the Z pass tags a command - this will usually be many polygons.

OK, but if each individual poly is not tagged, then more polys will be counted in several tiles (and transformed every time). And if the tag is per object/coherent group/primitive, the use of small polys in next-gen games will not reduce the relative redundancy of geometry processing.

ERP
21-Oct-2005, 19:52
OK, but if each individual poly is not tagged, then more polys will be counted in several tiles (and transformed every time). And if the tag is per object/coherent group/primitive, the use of small polys in next-gen games will not reduce the relative redundancy of geometry processing.

It all depends on the average size of your objects/command blocks.

Yes, as I've mentioned before, there will be some duplicate vertex work, and frankly, minimising this is really what requires you to plan for tiling in the first place. You know what's going to happen, so you need to structure your data to minimise the impact.
There are tradeoffs to be made in batch size vs. screen coverage and vertex shader complexity. Tradeoffs you really need to benchmark to make good decisions.

Doesn't take a genius to do this, but you pretty much have to plan for it.
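The batch-size tradeoff can be put in rough numbers. A minimal sketch, assuming you already know (for example from a predicated Z pass) how many tiles each batch touches; the function name and the scene figures are made up for illustration.

```python
def duplicate_vertex_overhead(batches):
    """batches: list of (vertex_count, tiles_touched) pairs.
    Returns the extra vertex work as a fraction of transforming every vertex once."""
    baseline = sum(v for v, _ in batches)
    tiled = sum(v * t for v, t in batches)
    return tiled / baseline - 1.0


# Hypothetical scene: 100k vertices; one 20k-vertex batch straddles the tile boundary.
coarse = [(20_000, 2), (80_000, 1)]
# Same geometry with the straddling batch split into four; only one piece still straddles.
fine = [(5_000, 2), (15_000, 1), (80_000, 1)]
```

Here the coarse batching costs about 20% extra vertex work and the finer batching about 5%, but the finer version issues more draw calls, and how much the duplicated vertices actually cost depends on vertex shader complexity. That is exactly the kind of tradeoff that has to be benchmarked rather than assumed.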