18-Oct-2005, 17:49 #1
Shifty Geezer
uber-Troll!
Join Date: Dec 2004
Location: Under my bridge
Posts: 29,060
What *exactly* is the cost of Xenos AA?

In the console games section this debate is back, and it seems people are still unclear exactly how much the 'free' 2xAA of Xenos costs. After all, it's called free, so surely it comes at no extra cost whatsoever to the render pipeline? The GPU can render the same scene with 2xAA without expending any extra effort or taking any longer than without AA? That's not actually the case, is it? As I understand it, the pixel shaders are applied once per pixel, but there are 2 vertex samples per pixel, plus a 5% (or whatever it is) tiling overlap. The actual sample averaging can be considered 'free', as that's handled by the eDRAM logic. So the actual cost of 2xMSAA on Xenos over normal non-AA rendering is twice the vertex transform work + '5%' for the tiling, as an AA'd buffer doesn't fit neatly in eDRAM (FP10 HDR being considered as the color data type). The term 'free' is only true in the sense of bandwidth consumption. On a typical GPU the same AA processes are involved, but the hit on BW due to blending is where the price is really paid, and where the eDRAM provides its unique benefit. Right?

Can some really knowledgeable person post up a definitive coverage of what's involved, in simple terms that people can hear, understand, and not forget? I'm ashamed that this long after hearing all about Xenos I'm still hazy as to what 'free' AA really means, and I don't think I'm alone here.
18-Oct-2005, 18:18 #2
Xenus
Senior Member
Join Date: Nov 2004
Location: Ohio
Posts: 1,316

Wouldn't this mean that if a game became vertex-transform limited, Xenos's AA could no longer be used? Though with its unified shader architecture (USA) I don't see this happening much.
18-Oct-2005, 18:35 #3
lip2lip
Join Date: May 2005
Posts: 76

IMHO, I would measure the cost to the Xbox 360 in this case in ns rather than in chip computation costs, mostly because most of the work is done off chip, and partly because cycle counting is mostly used for multicore environments, as any programmer can tell you.

I believe total latency should be low, because of the low-latency nature of embedded memory.

All hail 360!
18-Oct-2005, 18:48 #4
ERP
Moderator
Join Date: Feb 2002
Location: Redmond, WA
Posts: 3,647

There have been so many posts on this and several actual answers.

There is no way to generalise X% overhead.

It requires that you use tiling for HD resolutions; this in turn requires that the entire display list is available to the graphics chip, which may or may not have significant memory implications depending on how you do it.

Any batch of primitives that straddles multiple tiles will be transformed for every tile it's in, although there will be no extra pixel shading cost, and there will be one extra sync of the graphics pipe for every tile rendered.
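To make "transformed for every tile it's in" concrete, here is a rough cost model in Python. It assumes each batch is described by a screen-space bounding rectangle and a vertex count (both hypothetical inputs, not a real API), and that a batch is re-transformed once per tile its rectangle touches. It ignores the per-tile pipeline sync mentioned above.

```python
from typing import List, Tuple

# (x0, y0, x1, y1) in pixels, half-open: x0 <= x < x1, y0 <= y < y1
Rect = Tuple[int, int, int, int]


def overlaps(a: Rect, b: Rect) -> bool:
    """True if two half-open rectangles share at least one pixel."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def horizontal_tiles(width: int, height: int, count: int) -> List[Rect]:
    """Split the screen into `count` full-width strips stacked top to bottom."""
    edges = [i * height // count for i in range(count + 1)]
    return [(0, edges[i], width, edges[i + 1]) for i in range(count)]


def vertices_transformed(batches: List[Tuple[Rect, int]], tiles: List[Rect]) -> int:
    """Vertex work when every batch is replayed once for each tile it touches."""
    return sum(vcount * sum(overlaps(bbox, t) for t in tiles)
               for bbox, vcount in batches)


tiles = horizontal_tiles(1280, 720, 3)   # three 1280x240 tiles
batches = [
    ((0, 0, 100, 100), 1000),            # fits inside the top tile
    ((0, 200, 100, 300), 500),           # straddles the boundary at y = 240
]
print(vertices_transformed(batches, tiles))  # 1000*1 + 500*2 = 2000, vs 1500 untiled
```

How close the total stays to the untiled figure depends entirely on how much geometry crosses tile boundaries, which is why no fixed X% overhead can be quoted.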

You can likely construct artificial cases that are close to 0 overhead and artificial cases that have large overhead; the actual penalty is going to be very game dependent. It's going to be a lot cheaper in almost any case than on current PC architectures.
18-Oct-2005, 19:00 #5
Rolf N
Recurring Membmare
Join Date: Aug 2003
Location: yes
Posts: 2,494

Disclaimer: I base this solely on the public published information that's available for Xenos, most notably this B3D article. I have no developer manual and certainly no dev kit.

In terms of computational resources, Xenos' AA really is absolutely and honestly free.
Per "pixel", Xenos computes one color, a depth gradient, and a subpixel coverage mask. The mask is just four bits. The depth gradient is sufficient because all potentially covered subpixels, while they have variable depth values, are from the same triangle. Hence the connection to the daughter die doesn't need much bandwidth (actually less bandwidth than an equivalent PC part's path to memory, because blending is also "free", even without any AA).

Inside the daughter die, the color and z values are replicated to all covered samples according to the mask bits, z test is done and blending is done. So in the worst case, for one incoming pixel the daughter die needs to read/modify/write four subpixel depth values (for a subpixel-precise depth test) and read/modify/write four subpixels' colors (for blending). The eDRAM daughter die has very high internal bandwidth and can cope with this all just fine, and this is exactly the reason why this is deemed "free".
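A toy Python model of the daughter-die data flow described above: one shaded color per pixel, a coverage mask, and a per-sample depth test that decides where that color is replicated. The names are hypothetical, per-sample depths are passed in directly instead of being reconstructed from the depth gradient, and blending is left out, so this only illustrates the idea, not how Xenos is actually wired.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

SAMPLES = 4                       # 4xMSAA, as in the post
Color = Tuple[float, float, float]


@dataclass
class Pixel:
    """Multisample storage for one pixel, as kept in the (modelled) eDRAM."""
    color: List[Color] = field(default_factory=lambda: [(0.0, 0.0, 0.0)] * SAMPLES)
    depth: List[float] = field(default_factory=lambda: [1.0] * SAMPLES)


def write_fragment(px: Pixel, color: Color, sample_z: List[float], coverage: int) -> None:
    """Replicate one shaded color to each covered sample that passes the depth test.

    color:    the single shader result computed for this pixel
    sample_z: per-sample depth (hardware would derive this from a depth gradient)
    coverage: SAMPLES-bit mask from the rasterizer, bit s set = sample s covered
    """
    for s in range(SAMPLES):
        if coverage & (1 << s) and sample_z[s] < px.depth[s]:  # per-sample "less" test
            px.depth[s] = sample_z[s]
            px.color[s] = color                                 # no blending in this toy


def resolve(px: Pixel) -> Color:
    """Box-filter the samples down to the single displayed color."""
    return tuple(sum(c[i] for c in px.color) / SAMPLES for i in range(3))


px = Pixel()
write_fragment(px, (1.0, 0.0, 0.0), [0.5] * SAMPLES, 0b0011)  # red edge covers samples 0-1
write_fragment(px, (0.0, 1.0, 0.0), [0.7] * SAMPLES, 0b1111)  # green behind it, full cover
print(resolve(px))  # (0.5, 0.5, 0.0): red keeps samples 0-1, green fills 2-3
```

The shader ran only twice here, once per fragment; the four-way depth test and replication is the part the eDRAM logic absorbs.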

The catch:
The eDRAM daughter die, while having that very high bandwidth, only has limited storage space. Rendering in high resolutions with AA will exhaust this space. Doing 4xMSAA requires four times as much space to be set aside as rendering without AA. If you don't have that space, you can't hold the whole backbuffer at once.

The proposed solution is to split the scene up into tiles that do fit the eDRAM space limits.
E.g. instead of rendering a complete 1280x720 frame with 4xMSAA (which you can't), the rendering process can be split up into three 1280x240 partitions (roughly 9.8 MB each) which are rendered sequentially. You flush the finished partitions out to system memory to make room for the next one. Once all three partitions are in system memory the frame is done: you can point the RAMDAC there to scan it out and start building up the next frame in the same way.
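The partition arithmetic can be checked with a few lines of Python. This sketch assumes 32 bits of color (e.g. FP10 or 8-bit RGBA) plus 32 bits of depth/stencil per sample, and treats "10 MB" as 10 MiB; it also ignores any alignment the hardware may impose on tile dimensions, so the tile count is a lower bound.

```python
import math

EDRAM_BYTES = 10 * 1024 * 1024   # assumed: 10 MiB of daughter-die eDRAM


def framebuffer_bytes(width: int, height: int, samples: int,
                      color_bytes: int = 4, depth_bytes: int = 4) -> int:
    """Color plus depth/stencil storage for every sample of a width x height target."""
    return width * height * samples * (color_bytes + depth_bytes)


def tiles_needed(width: int, height: int, samples: int) -> int:
    """Lower bound on the number of partitions so each one fits in eDRAM."""
    return math.ceil(framebuffer_bytes(width, height, samples) / EDRAM_BYTES)


for aa in (1, 2, 4):
    print(f"{aa}xAA: {framebuffer_bytes(1280, 720, aa):,} bytes -> "
          f"{tiles_needed(1280, 720, aa)} tile(s)")
# 1xAA: 7,372,800 bytes -> 1 tile(s)
# 2xAA: 14,745,600 bytes -> 2 tile(s)
# 4xAA: 29,491,200 bytes -> 3 tile(s)
```

Under these assumptions one 1280x240 partition at 4xMSAA needs 9,830,400 bytes, which matches the "roughly 9.8 MB" figure and fits under the eDRAM size.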

But this is not free. Rendering the whole scene in one go is more efficient. Now let me base the explanation on a regular PC GPU (IMR) for the moment:
You'll have to resubmit, and hence retransform, at least some geometry. You should not need to render a triangle in partition 1 if you know it will only be visible in partition 2, but a triangle that overlaps two partitions will have to be rasterized twice for correct results. "Knowing" is a problem, though. Determining resubmission at the triangle level would need impractical amounts of (CPU) preprocessing, so you'll end up resubmitting huge gobs of geometry, if not your entire scene geometry, three times for three partitions. I.e. you do three times the vertex processing work and three times the triangle setup work as opposed to non-partitioned rendering. As the submission isn't free either, you'll also pay three times the geometry bandwidth costs (shared system memory in the case of Xenos) and three times the CPU costs associated with traversing the scene graph, setting up render states and queuing up the draw calls to the hardware. Overall, the only cost that doesn't triple is fillrate (because each partition has fewer pixels than a whole frame).

There are multiple conceivable ways in which Xenos could assist this partitioned rendering process at the hardware level.
1) Xenos might well support nothing more than a reconfigurable viewport, i.e. nothing worth talking about. The same costs as set forth in the above PC-based explanation apply.
2) Xenos might buffer up the entire untransformed scene description. This would remove repeated scene graph traversal from the equation but costs storage space for that buffer. Replaying such a display list can be made slightly more efficient than "talking to the driver" again from the application side.
3) Xenos might do the same as #2 but with transformed geometry. Costs for retransforming geometry are eliminated.
4) Xenos might do #3 and additionally sort the triangles in the display list to eliminate, or at least reduce, the number of "useless for the current partition" triangles.
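One way to read options 1-4 side by side is as per-frame multipliers relative to unpartitioned rendering. The Python sketch below is only an interpretation of the list, not documented Xenos behaviour; the `overlap` figure (average number of partitions a triangle lands in) is an arbitrary illustrative input, and replaying a buffered display list in options 2-4 is counted as a single submission.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionCost:
    """Work per frame, relative to rendering the frame without partitions (1.0)."""
    cpu_submission: float      # scene traversal, state setup, draw-call queuing
    vertex_transform: float
    triangle_setup: float
    fill: float = 1.0          # every pixel is still shaded once in all options


def partition_cost(option: int, tiles: int, overlap: float) -> PartitionCost:
    """Rough multipliers for hardware-assist options 1-4.

    tiles:   number of partitions the frame is split into
    overlap: average partitions touched per triangle (1 <= overlap <= tiles);
             only option 4, which sorts triangles, benefits from it
    """
    if option == 1:   # reconfigurable viewport only: the app resubmits everything
        return PartitionCost(tiles, tiles, tiles)
    if option == 2:   # replay an untransformed display list: no repeated traversal
        return PartitionCost(1, tiles, tiles)
    if option == 3:   # replay transformed geometry: no repeated vertex work
        return PartitionCost(1, 1, tiles)
    if option == 4:   # transformed and sorted: setup only where triangles land
        return PartitionCost(1, 1, overlap)
    raise ValueError("option must be 1-4")


for opt in range(1, 5):
    print(opt, partition_cost(opt, tiles=3, overlap=1.2))
```

Fill stays at 1.0 throughout, which is the sense in which the AA itself remains "free"; what varies between the options is how much geometry-side work gets repeated.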

I don't know what's really going on with Xenos here, but either way this should explain why performance is going to be lost if you use AA at higher resolutions, even though from a different point of view it truly is free.

Joke:
5) Xenos might be a TBDR, and as such it would do #4 but get even more bang out of the work done during the sorting process.
(this is nonsense because if it were true Xenos wouldn't need 10MB of eDRAM -- it could make do with some kilobytes of on-chip tile storage)
18-Oct-2005, 19:02 #6
RobHT
Junior Member
Join Date: Sep 2005
Posts: 40

ERP, as always, thanks for the info.

Quote:
. . .It's going to be a lot cheaper in almost any case than current PC architectures.
This just raises the question: why isn't AA on all X360 games when it can be so easily applied to PC games running at similar resolutions? I guess the answer is that devs are opting to use available resources for other effects.
ERP,
Generally speaking, do you anticipate future X360 games will incorporate more AA?
18-Oct-2005, 19:14 #7
ERP
Moderator
Join Date: Feb 2002
Location: Redmond, WA
Posts: 3,647

Quote:
Originally Posted by zeckensack
[...]
There are multiple conceivable ways in which Xenos could assist this partitioned rendering process at the hardware level.
[...]
I don't know what's really going on with Xenos here, but either way this should explain why performance is going to be lost if you use AA at higher resolutions, even though from a different point of view it truly is free.
[...]

It's public information how this works; look for predicated tiling in Dave's article.

Basically, though, it can mark a primitive as interesting/not interesting during an initial Z rendering pass, then simply jump over the primitive if it won't contribute to the scene.
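The control flow of predicated tiling as described here (mark once, replay per tile, skip unmarked primitives) can be sketched in Python. Assumptions: screen-space bounding rectangles stand in for whatever the initial Z pass actually records, and `draw` is a hypothetical callback. On the real hardware this skipping happens in the GPU's command processing, not in application code.

```python
from typing import Callable, List, Tuple

Rect = Tuple[int, int, int, int]   # (x0, y0, x1, y1), half-open pixel bounds


def overlaps(a: Rect, b: Rect) -> bool:
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def build_predicates(prims: List[Rect], tiles: List[Rect]) -> List[int]:
    """Initial pass: one bit per tile, set if the primitive can touch that tile."""
    masks = []
    for bounds in prims:
        mask = 0
        for i, tile in enumerate(tiles):
            if overlaps(bounds, tile):
                mask |= 1 << i
        masks.append(mask)
    return masks


def render_tiled(prims: List[Rect], tiles: List[Rect],
                 draw: Callable[[int, int], None]) -> None:
    """Replay the same command list once per tile, skipping predicated-off primitives."""
    masks = build_predicates(prims, tiles)
    for t in range(len(tiles)):
        # (clear the eDRAM tile here)
        for p, mask in enumerate(masks):
            if mask & (1 << t):        # predicate: jump over uninteresting primitives
                draw(t, p)
        # (resolve the finished tile out to system memory here)


tiles = [(0, 0, 1280, 240), (0, 240, 1280, 480), (0, 480, 1280, 720)]
prims = [(0, 0, 100, 100), (0, 200, 100, 300), (0, -100, 100, -10)]  # top, straddling, off-screen
calls = []
render_tiled(prims, tiles, lambda t, p: calls.append((t, p)))
print(calls)  # [(0, 0), (0, 1), (1, 1)]
```

The straddling primitive is still drawn twice, but nothing is sent to a tile it cannot affect, and the off-screen one is never drawn at all.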
18-Oct-2005, 19:16 #8
ERP
Moderator
Join Date: Feb 2002
Location: Redmond, WA
Posts: 3,647

Quote:
Originally Posted by RobHT
ERP, as always, thanks for the info.

This just raises the question: why isn't AA on all X360 games when it can be so easily applied to PC games running at similar resolutions? I guess the answer is that devs are opting to use available resources for other effects.
ERP,
Generally speaking, do you anticipate future X360 games will incorporate more AA?
I obviously can't comment on every game, I just don't have real information.

But on a PC you just turn it on, it's transparent, on Xenos you have to render your scene in a particular way to make it work.
18-Oct-2005, 19:51 #9
Dr. Nick
Member
Join Date: Jul 2005
Posts: 676

Quote:
Originally Posted by ERP
I obviously can't comment on every game, I just don't have real information.

But on a PC you just turn it on, it's transparent, on Xenos you have to render your scene in a particular way to make it work.
I don't think you guys can get a much better answer than that.
Thank you ERP.
18-Oct-2005, 20:31 #10
Nemo80
Naughty Boy!
Join Date: Sep 2005
Posts: 128

To keep it simple, the cost seems to be very high; otherwise developers would use higher AA levels, or AA at all, which is hardly used right now. The reason might be the tiling performance loss, or simply running out of memory.
18-Oct-2005, 20:38 #11
NavNucST3
Senior Member
Join Date: Jun 2005
Location: Chicago, IL
Posts: 1,590

Quote:
Originally Posted by ERP
I obviously can't comment on every game, I just don't have real information.

But on a PC you just turn it on, it's transparent, on Xenos you have to render your scene in a particular way to make it work.
ERP,

What is the coding "cost" on moving from alpha(9800pro)-->beta-->final(xenos); serious re-writes (weeks/months)? I know you have stated before that since there is no comparable PC part that most weren't willing to take a chance on the tiling. But, is there a chance that even the "just after launch window" titles would be willing to code specifically for the Xenos?
18-Oct-2005, 21:00 #12
Megadrive1988
Senior Member
Join Date: May 2002
Posts: 4,354

sorry to go slightly off-topic here (hardly) but I think that the original Xbox and Xbox games should've been set with 4X FSAA mandatory. Then Xbox2 / Xbox360 and games should've had the FSAA at either 4X or 8X with 4X being mandatory and 8X being optional.


2x or 4x FSAA is barely acceptable IMO, but I will take what I can get and have every intention of supporting Xbox360 (I also own an Xbox).

I know it comes down to cost of silicon, bandwidth and fillrate limitations, as far as how much FSAA we can have. I'm just disappointed in the level of anti-aliasing on consoles. That doesn't mean Xbox360 games won't look great. I realize that 720p with 4X FSAA is enough to eliminate most of the jagged edges, unless you look carefully.
18-Oct-2005, 21:22 #13
Laa-Yosh
member
Join Date: Feb 2002
Posts: 8,157

Quote:
Originally Posted by zeckensack
E.g. instead of rendering a complete 1280x720 w 4xMSAA frame (which you can't), the rendering process can be split up into three 1280x240 partitions (roughly ~9,8MB each) which are rendered sequentially.
Note: depending on the type of the game, you may wish to go with vertical tiles instead. That's because stuff like monsters in an FPS, high buildings, trees, etc. would usually get into all of the horizontal tiles, requiring them to be sent to the GPU 3 times. With vertical tiles, you have a better chance that something will only reside in one or two of the tiles instead, especially with a 16:9 aspect ratio.
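The horizontal-versus-vertical trade-off can be illustrated in Python by counting how many tiles an object's bounds touch, i.e. how many times its geometry has to be sent. The two objects are hypothetical screen rectangles chosen to show both extremes. Under the same 32-bit color + 32-bit depth per sample assumption used for the 9.8 MB figure, a 427x720 vertical strip at 4xMSAA is 9,838,080 bytes, so three vertical strips fit the same eDRAM budget.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]   # (x0, y0, x1, y1), half-open pixel bounds


def overlaps(a: Rect, b: Rect) -> bool:
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def strips(width: int, height: int, count: int, vertical: bool) -> List[Rect]:
    """Split the screen into `count` strips, side by side if vertical, else stacked."""
    if vertical:
        xs = [i * width // count for i in range(count + 1)]
        return [(xs[i], 0, xs[i + 1], height) for i in range(count)]
    ys = [i * height // count for i in range(count + 1)]
    return [(0, ys[i], width, ys[i + 1]) for i in range(count)]


def submissions(obj: Rect, tiles: List[Rect]) -> int:
    """How many times an object's geometry is sent: one per tile it touches."""
    return sum(overlaps(obj, t) for t in tiles)


tree = (600, 0, 700, 720)      # tall and narrow: a tree, a building, a monster up close
floor = (0, 500, 1280, 720)    # wide and short: ground plane near the bottom of the view
for vertical in (False, True):
    tiles = strips(1280, 720, 3, vertical)
    print("vertical" if vertical else "horizontal",
          submissions(tree, tiles), submissions(floor, tiles))
# horizontal 3 1
# vertical 1 3
```

Neither orientation wins everywhere; which one is cheaper depends on what dominates the view, and that can change from frame to frame as the camera moves.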

Of course, with the player looking around, the efficiency could change just between a few frames as well...
18-Oct-2005, 21:24 #14
Laa-Yosh
member
Join Date: Feb 2002
Posts: 8,157

Quote:
Originally Posted by Nemo80
To keep it simple, the cost (seems to be) is very high, otherwise developers would use higher AA levels, or AA at all, which is hardly used right now. The reason might be the tiling performance loss, or a general run out of memory.
This is bulls**t, simply and seriously.

We already heard it several times: launch games are not using it because there wasn't enough time to test the implementation.
18-Oct-2005, 21:25 #15
ERP
Moderator
Join Date: Feb 2002
Location: Redmond, WA
Posts: 3,647

Quote:
Originally Posted by Megadrive1988
sorry to go slightly off-topic here (hardly) but I think that the original Xbox and Xbox games should've been set with 4X FSAA mandatory. Then Xbox2 / Xbox360 and games should've had the FSAA at either 4X or 8X with 4X being mandatory and 8X being optional.
[...]
This is obviously just my opinion but I'm on the exact opposite side of the fence......

Developers should be free to decide what features they support, including resolutions, AA, and texture filtering. And that's not to say I wouldn't, given the choice.

If a developer wants to ship a game at 160x100 with no AA and point sampled textures, he should be able to do that and have the market place decide if that's a good thing.

TRCs are there to protect the consumer to a point, ensuring a consistent experience on things like memory cards; I'm not sure I agree with their extension to include things like AA and mandatory HD... If HD or AA become strong selling points, developers will adopt them. The cool thing about consoles is watching what developers can eke out of the fixed resources; the more constraints you put in place, the less eking there is.
18-Oct-2005, 21:30 #16
doob
Member
Join Date: May 2005
Posts: 380

Well... it costs *exactly* around 450 bucks, but you also get loads of other stuff...
18-Oct-2005, 22:18 #17
AlNets
A bit netty
Join Date: Feb 2004
Location: warp
Posts: 14,801

I'm a little curious as to why their "temporal AA" was not included or featured. Is it a high cost in transistors? Supposing the developer was shooting for a constant 60fps, they could get a decent approximation of 4xMSAA with 2xTMSAA.
18-Oct-2005, 23:41 #18
ERP
Moderator
Join Date: Feb 2002
Location: Redmond, WA
Posts: 3,647

Quote:
Originally Posted by Alstrong
I'm a little curious as to why their "temporal AA" was not included or featured. Is it a high cost in transistors? Supposing the developer was shooting for a constant 60fps, they could get a decent approximation of 4xMSAA with 2xTMSAA.
Pretty much any dev could implement temporal AA on top of the library if they thought it would be worthwhile.
At a previous job we messed around with the idea on an Xbox game a long time before ATI "invented" the feature.
Our opinion was that at TV resolutions the artifacts were too irritating to make it worthwhile so we didn't proceed with it.
19-Oct-2005, 00:31 #19
AlNets
A bit netty
Join Date: Feb 2004
Location: warp
Posts: 14,801

ah... interesting.

How would it have been done on the Xbox? Through shaders?
19-Oct-2005, 00:57 #20
Inane_Dork
Rebmem Roines
Join Date: Sep 2004
Posts: 1,987

Quote:
Originally Posted by Alstrong
How would it have been done on the Xbox? Through shaders?
A sub-pixel width camera jitter would do that, I think.


Anyway, an MS employee responded to this issue here: http://forum.teamxbox.com/showpost.p...3&postcount=71 It illuminated a few things for me.
19-Oct-2005, 01:03 #21
ERP
Moderator
Join Date: Feb 2002
Location: Redmond, WA
Posts: 3,647

Quote:
Originally Posted by Alstrong
ah... interesting.

How would it have been done on the Xbox? Through shaders?
Just changing the subpixel offset frame to frame.
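A small Python sketch of the idea behind temporal AA as discussed in this exchange: alternate two 2x sample patterns on even and odd frames so that, over two frames, four distinct positions are covered, and (for the camera-jitter variant) convert a subpixel offset into NDC units. The two patterns are hypothetical, the conversion assumes the usual -1..1 NDC range, and how the offset is applied (e.g. folded into the projection's x/y translation) is left to the engine.

```python
from typing import List, Tuple

Offset = Tuple[float, float]   # subpixel offset from the pixel centre, in pixels

# Two hypothetical 2x sample patterns, alternated on even/odd frames.
PATTERN_EVEN: List[Offset] = [(-0.25, -0.25), (0.25, 0.25)]
PATTERN_ODD: List[Offset] = [(0.25, -0.25), (-0.25, 0.25)]


def sample_pattern(frame: int) -> List[Offset]:
    """Sample offsets used on a given frame."""
    return PATTERN_EVEN if frame % 2 == 0 else PATTERN_ODD


def ndc_jitter(offset: Offset, width: int, height: int) -> Tuple[float, float]:
    """Convert a subpixel camera offset into NDC units (one pixel spans 2/width)."""
    return (2.0 * offset[0] / width, 2.0 * offset[1] / height)


# Over two consecutive frames four distinct sample positions are used,
# which is why 2x temporal AA at a steady 60fps is pitched as "roughly 4x".
seen = set(sample_pattern(0)) | set(sample_pattern(1))
print(len(seen))                           # 4
print(ndc_jitter((0.5, 0.0), 1280, 720))   # (0.00078125, 0.0)
```

The averaging happens in the viewer's eye rather than in hardware, so any frame-rate drop or motion turns the alternation into visible shimmer, consistent with the artifacts described above at TV resolutions.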
19-Oct-2005, 01:55 #22
Tap In
Regular
Join Date: Jun 2005
Location: Gravity Always Wins
Posts: 6,370

Quote:
Originally Posted by Inane_Dork
....

Anyway, an MS employee responded to this issue here: http://forum.teamxbox.com/showpost.p...3&postcount=71 It illuminated a few things for me.
nice find, thanks
19-Oct-2005, 02:22 #23
Brimstone
B3D Shockwave Rider
Join Date: Feb 2002
Posts: 1,835

I find it almost impossible to believe that RARE, a first-party studio of Microsoft which probably had the earliest possible developer kits with Xenos (Alpha), has failed to design their graphics engine for Perfect Dark Zero to take advantage of tiling. Now they may be struggling to get a bug-free build with it working, but I'm confident that by the time Perfect Dark Zero goes gold, they will have tiling up and running.
19-Oct-2005, 02:37 #24
AlNets
A bit netty
Join Date: Feb 2004
Location: warp
Posts: 14,801

Thanks again, ERP.

Quote:
Originally Posted by Inane_Dork
Anyway, an MS employee responded to this issue here: http://forum.teamxbox.com/showpost.p...3&postcount=71 It illuminated a few things for me.
Quote:
The Unreal3 engine, for example, was not engineered to use predicated tiling. You very well might not see any FSAA in Gears of War, or other 360 titles that use it, unless Epic works to incorporate predicated tiling efficiently. I hope they do.
I hope so too...otherwise, that's a kick in the balls.


Quote:
Originally Posted by Brimstone
I find it almost impossible to believe that RARE, a first-party studio of Microsoft which probably had the earliest possible developer kits with Xenos (Alpha), has failed to design their graphics engine for Perfect Dark Zero to take advantage of tiling. Now they may be struggling to get a bug-free build with it working, but I'm confident that by the time Perfect Dark Zero goes gold, they will have tiling up and running.

I wonder if the lead programmer and others are sick of learning new architectures every year (first GCN, then Xbox, then Alpha kits, then Beta Kits).
19-Oct-2005, 03:20 #25
scooby_dooby
Regular
Join Date: May 2005
Location: E-town, Alberta
Posts: 8,506

Quote:
Originally Posted by Inane_Dork

Anyway, an MS employee responded to this issue here: http://forum.teamxbox.com/showpost.p...3&postcount=71 It illuminated a few things for me.
well that about settles that!

Quote:
To preface, anyone who's mentioned that FSAA is virtually 'free' when using predicated tiling is absolutely correct. MS has pushed developers to use predicated tiling for their titles, because if implemented early on and planned for, rendering several different tiles can actually improve performance overall (in some cases), and in all others would either have virtually no effect on performance or only incur a 1-3% perf hit... but again, that's assuming the engine was built with tiled rendering in place.

Adding predicated tiling to any 3D graphics engine is pretty trivial, but when the engine isn't architected from the ground up to incorporate predicated tiling, the perf hit can be anywhere from 1-10%, which isn't terrible, but it can also be very noticeable.

The Unreal3 engine, for example, was not engineered to use predicated tiling. You very well might not see any FSAA in Gears of War, or other 360 titles that use it, unless Epic works to incorporate predicated tiling efficiently. I hope they do.

In any case, it is a requirement that Xbox 360 games do not have blatant or obvious 'aliasing' artifacts. Unlike what 'The Gamemaster' said above, it is NOT required to have 2xFSAA. That is the recommended solution games use to get through certification, but it's not a requirement. The requirement goes on to indicate that games can use motion blur, depth of field, and other effects, which are acceptable if they (cooperatively) eliminate 'jaggies' from the game.

All times are GMT +1.