Tiling on the Xbox 360: Status and Should Future Consoles Use eDRAM?

Acert93

Thread prompted by Mintmaster in another thread:

One thing I've wondered is why devs don't implement tiling more universally. Am I mistaken in thinking that nearly all games implement bounding volumes and object level frustum culling? It seems so ridiculously easy to simply use a narrower frustum for each tile and traverse the scene graph multiple times.

So, devs, sound off. What is the state of tiling? More and more titles are using it, but some big games aren't. Why? What do you have to say about performance? Is it technically difficult beyond a surface examination? Are the APIs up to snuff?

I believe Joker's game is using 4xMSAA with a lot of geometry. NBA Street offers 1080p with MSAA, as does VT3 I believe. Sports games seem to have gotten it down.

Bonus points for developers giving feedback on embedded memory on the next consoles. Should they go the tiling route? How much is necessary? 64MB? 128MB? How much precision will be needed for FP blending? Will the approach used by KZ2 become popular... and not be very compatible with embedded memory?
 
It would also be interesting to know what MS said during Gamefest 2007.
Here's a thread about it, but the slides are still unavailable:
http://forum.beyond3d.com/showthread.php?t=44450
And here's the summary of the event:
http://www.xnagamefest.com/talk_abstracts.htm#GRAPHICS

There seems to be a lot of interesting stuff, but NDAs will probably prevent our dear devs from informing us...

Here's a quote relevant to this topic:
Advanced Xbox 360 Graphics Techniques Using Command Buffers and Predicated Tiling

Speaker: Matt Lee

Two holidays after the launch of Xbox 360, title developers are still squeezing graphics performance out of the Xbox 360 CPU and GPU using the powerful and often daunting predicated tiling and command buffer Direct3D APIs. In this talk, we examine the inner workings of commonly misunderstood APIs so you can make best use of their functionality. Come see tricks from real game titles, such as how advanced state inheritance can allow render target flexibility with command buffers.
 
A separate die like Xbox 360's eDRAM chip, but with a large pool of RAM, could potentially be fantastic for next-gen.

mmmm, 128 MB. Higher precision, a greater level of low-cost AA (notice I didn't say "free"), particle effects, 1080p, framebuffer tricks, etc. would eat up that much and beg for more.
 
I think this thread could also serve as a reference:
http://forum.beyond3d.com/showthread.php?t=39096&page=6

If I remember properly, Fran explained that some exotic uses of Xenos don't fare well with tiling.

Anyway, it would be interesting to learn about MS's progress on the tools/methodology front.

(And slides from Gamefest would be welcome, even if I feel we will never see them... NDA.)
 
The way I see it is that tiling on Xenos takes the best of TBDR without the memory or bandwidth cost of binning polygons. I think this is very important because RAM is a very precious resource, and IMHO is the limiting factor for a lot of the stuff we find technically impressive in games. Basically, instead of figuring out which polygons will be rendered into each tile and storing the results, you figure out which objects could render into each tile, possibly without any storage at all because scene graph traversal is fairly cheap.

There are orders of magnitude fewer objects to bin than polygons, but because objects are bigger, you need a much bigger tile to keep most objects within one tile and avoid processing their geometry twice. Thus eDRAM is necessary.
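Just to make that concrete, here's a minimal sketch of object-level binning, assuming each object carries a world-space bounding sphere and that one frustum has already been built per tile (at most 32 tiles); all the types and names here are illustrative, not any actual console API:

```cpp
// Sketch of object-level tile binning (illustrative, not the 360 API).
// Assumes each object has a world-space bounding sphere and that we have
// already built one frustum (6 planes) per screen tile, with <= 32 tiles.

#include <cstdint>
#include <vector>

struct Plane   { float nx, ny, nz, d; };   // nx*x + ny*y + nz*z + d >= 0 is "inside"
struct Frustum { Plane planes[6]; };
struct Sphere  { float x, y, z, radius; };

struct Object {
    Sphere   bounds;
    uint32_t tileMask = 0;                 // bit i set -> object touches tile i
};

static bool SphereIntersectsFrustum(const Sphere& s, const Frustum& f)
{
    for (const Plane& p : f.planes) {
        float dist = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
        if (dist < -s.radius)              // completely behind one plane -> outside
            return false;
    }
    return true;                           // touches or is inside every plane
}

// One pass over the scene: visibility culling and tile binning in one shot.
void BinObjects(std::vector<Object>& objects,
                const std::vector<Frustum>& tileFrusta)
{
    for (Object& obj : objects) {
        obj.tileMask = 0;
        for (uint32_t t = 0; t < tileFrusta.size(); ++t)
            if (SphereIntersectsFrustum(obj.bounds, tileFrusta[t]))
                obj.tileMask |= 1u << t;
        // tileMask == 0 means the object is off screen entirely.
    }
}

// When rendering tile t, submit only objects with (tileMask & (1u << t)) != 0.
```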

For next gen, 64MB is enough, IMO, to avoid tiling altogether. 32bpp gives you enough range, and going beyond 4xAA at 1080p gives very diminishing returns for 99% of TV setups, especially if you do something clever like CSAA which doesn't need more space.

However, if it turns out that tiling is pretty easy to work with by 2011, we may not need 64MB. 16MB may be enough, and at that point it should be a small enough fraction of total console cost to be a no-brainer.

I'm still interested in what devs have to say about my original comment quoted in Joshua's post.
 
First, great topic. :)

This earlier thread by Shifty might come in handy with respect to Xenos & tiling:
http://forum.beyond3d.com/showthread.php?t=40456

Bonus points for developers giving feedback on embedded memory on the next consoles. Should they go the tiling route? How much is necessary? 64MB? 128MB? How much precision will be needed for FP blending? Will the approach used by KZ2 become popular... and not be very compatible with embedded memory?


(Disclaimer: I’m not a developer ;))


By the time the next-gen comes, we should be at 32nm. With eDRAM memory cells, that *should* give close to the theoretical 8 times increase in transistor density (~70-80MB in the same space as Xenos' current 90nm configuration). Of course, that doesn’t include the logic that makes the daughter die what it really is (ROPs, Z, etc), and that doesn’t include any enhancements they’ll certainly implement should they keep the same fundamental architecture.

In determining how much eDRAM is needed, I’d like to first limit the scope of the framebuffer and the precision. As we are already seeing in the PC space, there are more efficient AA methods at the high end with CSAA or CFAA. From what I've seen they have quite good quality (CSAA at any rate, but I guess that’s a different argument entirely :p). Maybe it isn't perfect, but on the average consumer television? Hence, in the console space, I question if there is really a need to go beyond 4xMSAA in terms of storage.

Taking a few notes from the HDR discussion regarding Halo 3, FP16 is good enough that anything higher would be ridiculous! Yes yes… D3D10.1 specifies FP32, but… come on, there are better things to focus on. The bandwidth requirements would be 4 times that of FP10!

That doesn’t preclude the use of clever tricks that may be developed along the lifetime of Xenos or RSX, but for the sake of an upper limit for the back buffer… I think it’s safe to say that FP16 (with alpha blending) will be the highest in use. Why go further?

With that in mind, let’s consider either 8:8:8:8 (RGBA) or 10:10:10:2 (FP10) for color, plus a 32-bit Z-buffer. The backbuffer should then take:

63.3 MB for 1080p and 4xMSAA
28.1 MB for 720p and 4xMSAA

and with 16:16:16:16 (FP16), the figures double.
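A quick sanity check of those figures (my own arithmetic, assuming 4 bytes of color plus 4 bytes of depth/stencil per sample, and 1 MB = 1024×1024 bytes):

```cpp
// Back-of-envelope check of the framebuffer figures above.
// Assumes 4 bytes of color + 4 bytes of depth/stencil per MSAA sample.
#include <cstdio>

int main()
{
    const double bytesPerSample = 4.0 + 4.0;      // 32-bit color + 32-bit Z/stencil
    const double mb             = 1024.0 * 1024.0;

    std::printf("1080p, 4xMSAA: %.1f MB\n", 1920 * 1080 * 4 * bytesPerSample / mb); // ~63.3
    std::printf(" 720p, 4xMSAA: %.1f MB\n", 1280 *  720 * 4 * bytesPerSample / mb); // ~28.1
    return 0;
}
```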

So by next-gen we should have FP16 blending support and a much higher transistor density. If MS wanted to make things really easy for developers, they’d go with the 128MB of eDRAM. It shouldn’t take a leap of faith to think that won’t happen. They sure didn’t make it easy to even do 720p & 2xMSAA this gen. At this point, I would expect 60MB in order to fit 720p, 4xMSAA, FP16.

But with the whole idea of tiling and “forcing” developers to think about implementing it now, MS could easily be hardware-cost-conscious and go with only 30MB at the very least.

They’ll probably implement those variations on FP10 that were discussed briefly in the “HDR the Bungie Way” presentation (i.e. 6e4, 6e5 versus the current 7e3 for FP10), making FP16 seem moot. They may target those instead, and in that way, 30MB seems an “ok” compromise.

It’s a question of diminishing returns. Do we really need to support universal 1080p? Some developers are getting used to tiling now… So perhaps 1080p, 2xMSAA (including CSAA/CFAA, which do not take extra storage), FP10 would be a better storage target (~32MB). But that’s only considering a single RT.

There is no doubt D3D10.1+ will bring a whole new host of abilities to help out with multiple render targets, AA, deferred shading/lighting etc, and maybe we should look to those methods as becoming a standard *cough* UE3.0/4.0*cough*. 64MB will be good there. (Bungie can go and do their weird LDR/MDR RTs @ 1080p 2xMSAA FP10 ;)).

And 64MB brings me back to hardware manufacturing considerations and the numerous methods to fit the framebuffer in there.. :)


edit: and ah...looks like I agree fully with Mintmaster. :LOL:

Though maybe 16MB seems too little... 32MB seems quite reasonable.
 
So, devs, sound off. What is the state of tiling? More and more titles are using it, but some big games aren't. Why? What do you have to say about performance? Is it technically difficult beyond a surface examination? Are the APIs up to snuff?

Not sure why others aren't tiling. Calculating your tile effectively also tells you if the object in question is visible, so you can do visibility culling/tile calculations all in one shot. Write it in VMX and you can even calculate tile #'s 4 at a time. It's not that CPU costly; I'm doing it 30,000+ times every frame for crowd members at 60fps, and there is plenty of CPU power to do it.
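For anyone curious what that per-object tile calculation might look like, here's a scalar sketch under the assumption that the screen is split into horizontal bands; the names and layout are illustrative (this is not the 360 API), and a VMX version would process four objects at a time instead:

```cpp
// Scalar sketch of per-object tile classification. Tiles are assumed to be
// horizontal screen bands; names and layout are illustrative only.

#include <cstdint>

struct ScreenRect { float minY, maxY; };   // projected bounds of the object, in pixels

// tileTopY has numTiles + 1 entries; band i spans [tileTopY[i], tileTopY[i+1]),
// and tileTopY[numTiles] == screenHeight.
uint32_t ClassifyTiles(const ScreenRect& r, const float* tileTopY, int numTiles,
                       float screenHeight)
{
    if (r.maxY < 0.0f || r.minY >= screenHeight)
        return 0;                          // off screen: culled, touches no tiles

    uint32_t mask = 0;
    for (int i = 0; i < numTiles; ++i)
        if (r.maxY >= tileTopY[i] && r.minY < tileTopY[i + 1])
            mask |= 1u << i;               // object overlaps band i
    return mask;                           // 0 also means "not visible"
}
```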

The only reasons I could think of for people not doing it are:

1) Pixel side costs. It's been billed as being 'free' on the pixel side, but that's not totally true. If you have a lot of tiny triangles that take up little area, like if you have a poor LOD system and hence say a 200 poly object that takes up 10 or so pixels on screen, then there is a pixel side cost. This one may have bitten some people by surprise. Because the 360 is so good at dealing with vertices, it's easy to get sloppy with excess geometry and limited LOD calculations. But you then see a hit when you try to tile.

2) Most people don't notice or care. We love our AA here, but going by anecdotal evidence, most non-tech people I talk with don't seem to care anywhere near as much as us about AA. So, maybe devs just skip AA and save themselves some time. I encountered this very recently when I mentioned to a group of people at lunch that Halo 3 was rendering at 640p with no AA. My statement was met with 5 seconds of silence; not only did they not know, they didn't even care, it still looked good to them. So, maybe tiling-related tasks are getting relegated to "non important" or "if we have time" status when budgeting people's tasks, since there are other things that the common user responds to more than anything gained from tiling.

3) They just ran out of time.


Anyways, those are my theories. An alternative option is to not tile everything. Aliasing can be seen on some types of objects more than others, so you can always render those more susceptible objects first with tiling/MSAA, resolve that out and resubmit it as the new background, and continue rendering everything else with no MSAA/tiling.
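A rough sketch of that pass ordering; every type and function below is a hypothetical stand-in for engine/Direct3D work rather than a real API, since only the ordering is the point:

```cpp
// Pass-ordering sketch of the "only tile what needs it" idea above.
// All helpers here are hypothetical stubs, not real D3D calls.

struct Texture {};
struct ObjectList {};

void    BeginPass(int msaaSamples, bool tiled) { /* set up render targets in eDRAM */ }
void    Draw(const ObjectList&)                { /* submit draw calls */ }
Texture ResolveToMainMemory()                  { return {}; /* copy eDRAM out to RAM */ }
void    LoadColorBuffer(const Texture&)        { /* reload resolved image as background */ }
void    EndPass()                              {}

void RenderFrame(const ObjectList& aliasingProne, const ObjectList& everythingElse)
{
    // 1) Aliasing-prone geometry: tiled, 4xMSAA, resolved out to main memory.
    BeginPass(4, /*tiled=*/true);
    Draw(aliasingProne);
    Texture background = ResolveToMainMemory();
    EndPass();

    // 2) Everything else: single pass, no MSAA/tiling, composited on top of
    //    the resolved background.
    BeginPass(1, /*tiled=*/false);
    LoadColorBuffer(background);
    Draw(everythingElse);
    EndPass();
}
```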


Sports games seem to have gotten it down.

One thing with sports games is that we can't hide behind explosions and other special effects. It's like at a magic show where there is always a hot girl on stage to distract people from what's really going on. Sports games don't get to have the hot girl on stage ;( So we're a bit more susceptible to scrutiny in some ways, if you know what I mean.
 
1) Pixel side costs. It's been billed as being 'free' on the pixel side, but that's not totally true. If you have a lot of tiny triangles that take up little area, like if you have a poor LOD system and hence say a 200 poly object that takes up 10 or so pixels on screen, then there is a pixel side cost. This one may have bitten some people by surprise. Because the 360 is so good at dealing with vertices, it's easy to get sloppy with excess geometry and limited LOD calculations. But you then see a hit when you try to tile.

hm... I wonder if this was another problem in Halo 3 due to the longer view distances in some levels. I didn't particularly notice the same aggressive LOD that they've used in the first two games... hmm...

Crytek must have a crazy good LOD system in-place...

Anyways, those are my theories. An alternative option is to not tile everything. Aliasing can be seen on some types of objects more than others, so you can always render those more susceptible objects first with tiling/MSAA, resolve that out and resubmit it as the new background, and continue rendering everything else with no MSAA/tiling.
Curious... could this be how Epic did their AA in Gears of War? The use of AA on static objects sounds remarkably similar to what you describe:!:
 
Can someone start a quick pros and cons list of tiling/eDRAM? Doesn't it limit the use of procedurally generated data, force a z-prepass to be effective, add geometry overhead for objects crossing tile boundaries, and increase complexity when working with MRTs? It seems to allow for more predictable bandwidth management, remove fillrate bottlenecks, and provide inexpensive AA and blending, tiling caveats aside.

I wonder, if it were done again, whether eDRAM would be chosen over a 256-bit interface + ROP compression features for the Xbox 360.
 
I don't think that you can say for everyone that the impact of tiling is all that cheap.

The GPU-side of the equation is all Microsoft will ever talk about, and that can be fairly trivial much of the time. Going through the scene graph multiple times isn't something you can take that lightly. It also means throwing away a lot of things you might have done before because they might have helped without tiling, but they can potentially be harmful with it. Another factor is simply the fact that if you're working on multi-platform, it can be harder to hide tiling behind a veil since it's something that only appears on one platform (which wouldn't be a problem if the tiling were actually transparent, but that's so very much not the case). That said, though, using tiling on 360 and using SPU-culling on PS3 can at least mean you make room for a veil on both platforms.

As joker mentioned, though, there is the "lots of people don't care" part, and the fact that the "AA" requirement is vague by nature. If you can hide aliasing with all manner of bloom and blur and whatnot, then it further lessens the need for something as destructive as tiling. This, I would say, also links to the "not enough time" factor -- not so much because it takes an enormous expanse of time to do, but because too many things are far more important (and should be).

I can also think of at least 4 small independent studios who work on 360, but are largely pervaded by a "Microsoft-hate" culture, so I don't know what to tell you there. A minority, but they do exist, and lots of people move in and out of such studios and get infected.
 
Sports games seem to have gotten it down.

I wonder if this is easier for sports games because of "static" geometry. In games like Gears of War and Halo 3, environment geometry is always changing as you move through a level and new data is streamed, but in a sports game you are always in the same environment with the same geometry. I don't know if this makes any difference, but it's one idea.
 
2) Most people don't notice or care. We love our AA here, but going by anecdotal evidence, most non-tech people I talk with don't seem to care anywhere near as much as us about AA. So, maybe devs just skip AA and save themselves some time. I encountered this very recently when I mentioned to a group of people at lunch that Halo 3 was rendering at 640p with no AA. My statement was met with 5 seconds of silence; not only did they not know, they didn't even care, it still looked good to them. So, maybe tiling-related tasks are getting relegated to "non important" or "if we have time" status when budgeting people's tasks, since there are other things that the common user responds to more than anything gained from tiling.
Thanks for your insights, really interesting as usual.

I think you're right about how average consumers feel in regard to AA.

But there's something that bothers me even more than the lack of AA: the lack of AF.
From your experience, would you give the same answer about AF?
(A lot of people here are starting to think that Xenos has an issue in this regard.)
 
So, devs, sound off. What is the state of tiling? More and more titles are using it, but some big games aren't. Why? What do you have to say about performance? Is it technically difficult beyond a surface examination? Are the APIs up to snuff?

Personally I consider the tiling problem "solved", in the sense that there are a couple of good implementations which are practically negligible CPU-side and give good per-mesh tiling rejection, making 4x practical. I can't live without 4xAA; it's not something I want to drop to get some performance back.

I haven't had any issue with tiling for the last couple of months, but before then it was hell.

Implementing, testing and profiling a solution for tiling takes (a lot of) development time that not every team can afford for the sake of MSAA.

Now that the API is stable, the problem is known and shipped solutions exist, I think we'll see more games using tiling.

Bonus points for developers giving feedback on embedded memory on the next consoles. Should they go the tiling route? How much is necessary? 64MB? 128MB? How much precision will be needed for FP blending? Will the approach used by KZ2 become popular... and not be very compatible with embedded memory?

Mixed feelings here: one day I love EDRAM, the next day I would just get rid of it altogether. As I wrote before, EDRAM makes some things extremely fast and some problems just disappear (bandwidth to the framebuffer can be considered for all practical purposes infinite). But EDRAM gives you less freedom and fewer architectural choices are available: some ideas just don't play well with it. You have some kind of barriers in the engine pipeline that you can't cross and you have to work around them.

A simple example: all offscreen rendering, shadow buffers for example, must be done before rendering the main scene (with tiling); you just don't have enough EDRAM to fit everything in there and reuse, for example, a shadow buffer for multiple lights.

In practice, once you have an architecture that works, it's very fast, but you won't really be able to move away from it easily.

Maybe if I had lots of EDRAM in the next-gen I would love it again.
 
Great thread and great post by everyone. Sometimes it's good to bring a subject back every once in a while. There always seems to be info that most of us who aren't working on the machines weren't aware of before.
 
Not sure why others aren't tiling. Calculating your tile effectively also tells you if the object in question is visible, so you can do visibility culling/tile calculations all in one shot. Write it in VMX and you can even calculate tile #'s 4 at a time. It's not that CPU costly; I'm doing it 30,000+ times every frame for crowd members at 60fps, and there is plenty of CPU power to do it.
That's what I figured. I mean, unless you're bypassing frustum culling entirely and just throwing tons of offscreen polys at the GPU, the code modification seems minimal. It's very straightforward math that's easy to parallelize, too, so I can't imagine it taking more than a few percent of additional CPU time in the worst case.

Is it hard making an art pipeline that chops large objects like environment/terrain or vehicles into pieces that minimize vertex load duplication?
The only reasons I could think of for people not doing it are:

1) Pixel side costs. It's been billed as being 'free' on the pixel side, but that's not totally true. If you have a lot of tiny triangles that take up little area, like if you have a poor LOD system and hence say a 200 poly object that takes up 10 or so pixels on screen, then there is a pixel side cost. This one may have bitten some people by surprise. Because the 360 is so good at dealing with vertices, it's easy to get sloppy with excess geometry and limited LOD calculations. But you then see a hit when you try to tile.
That's surprising. I'd figure that if you had that many tiny polygons then you'd surely be setup limited. Thinking about it further, though, a polygon hitting just one sample will occupy an entire quad, and a shader with over 4 texture fetches is enough to put the bottleneck back on the pixel side. Still, it seems like you'd need a lot of tiny vertices to substantially impact framerate.

2) Most people don't notice or care. We love our AA here, but going by anecdotal evidence, most non-tech people I talk with don't seem to care anywhere near as much as us about AA. So, maybe devs just skip AA and save themselves some time.
That's what I figured, and I guessed this was a big reason in several posts before. I assume the same applies for AF, right? I just don't see the point of HD rendering without AF. Even 480p with AF can have more detail in large areas than 1080p without AF.

Anyways, those are my theories. An alternative option is to not tile everything. Aliasing can be seen on some types of objects more than others, so you can always render those more susceptible objects first with tiling/MSAA, resolve that out and resubmit it as the new background, and continue rendering everything else with no MSAA/tiling.
That seems like a lot more trouble than it's worth. You can't even use the same Z-buffer, can you? Or can you resolve it and load it back in somehow? If not, it would have to be simple compositing.
 
I don't think that you can say for everyone that the impact of tiling is all that cheap.

The GPU-side of the equation is all Microsoft will ever talk about, and that can be fairly trivial much of the time. Going through the scene graph multiple times isn't something you can take that lightly.
How so? What's so intensive about going through the scene graph multiple times? Even if it was, why can't you just keep a buffer of pointers to each object to be drawn with a flag for each tile, so that you only have to traverse it once? Even with hundreds of thousands of objects on screen, it'll be under 1MB.
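For what it's worth, that once-per-frame buffer could be as simple as something like this (types and names are illustrative, under the assumption of at most 32 tiles):

```cpp
// Sketch of the single-traversal idea: walk the scene graph once, record a
// pointer plus a per-tile bitmask for everything visible, then replay that
// flat list once per tile.

#include <cstdint>
#include <vector>

struct Object;                                    // whatever the engine uses

struct DrawRecord {
    const Object* object;
    uint32_t      tileMask;                       // bit i set -> draw in tile i
};

// Filled once per frame during the normal cull pass (see the earlier sketches
// for one way tileMask can be computed).
std::vector<DrawRecord> gDrawList;

void SubmitTile(int tile, void (*drawObject)(const Object*))
{
    const uint32_t bit = 1u << tile;
    for (const DrawRecord& rec : gDrawList)
        if (rec.tileMask & bit)
            drawObject(rec.object);               // only objects touching this tile
}
```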
It also means throwing away a lot of things you might have done before because they might have helped without tiling, but they can potentially be harmful with it.
Any examples?
Another factor is simply the fact that if you're working on multi-platform, it can be harder to hide tiling behind a veil since it's something that only appears on one platform (which wouldn't be a problem if the tiling were actually transparent, but that's so very much not the case).
The funny thing is that lack of tiling seems to be even more prevalent on XB360 exclusives.



Personally I consider the tiling problem "solved", in the sense that there are a couple of good implementations which are practically negligible CPU-side and give good per-mesh tiling rejection, making 4x practical. I can't live without 4xAA; it's not something I want to drop to get some performance back.

I haven't had any issue with tiling for the last couple of months, but before then it was hell.

Implementing, testing and profiling a solution for tiling takes (a lot of) development time that not every team can afford for the sake of MSAA.

Now that the API is stable, the problem is known and shipped solutions exist, I think we'll see more games using tiling.
Thanks for the info, and I agree that 4xAA is totally worth it even if you have to reduce resolution to half the pixels (which would rarely be the case). What kind of things gave you hell, if you don't mind me asking?
Mixed feelings here: one day I love EDRAM, the next day I would just get rid of it altogether. As I wrote before, EDRAM makes some things extremely fast and some problems just disappear (bandwidth to the framebuffer can be considered for all practical purposes infinite). But EDRAM gives you less freedom and fewer architectural choices are available: some ideas just don't play well with it. You have some kind of barriers in the engine pipeline that you can't cross and you have to work around them.

A simple example: all offscreen rendering, shadow buffers for example, must be done before rendering the main scene (with tiling); you just don't have enough EDRAM to fit everything in there and reuse, for example, a shadow buffer for multiple lights.
Good example, especially pertinent for deferred rendering with many local lights (like KZ2). More EDRAM would partially solve this case, but a future algorithm may need even more, so it's hard to draw a line and determine the right amount.

I guess the best solution is to have a fallback of traditional rendering straight to RAM. Since BW of RAM seems to increase at a much slower pace than transistor density, there's a good chance that by next gen the ROP cost to make this possible would be almost negligible. I think this also means that EDRAM will become a necessity to get all you can from the billion-transistor chips we'll see next gen.
 
One thing with sports games is that we can't hide behind explosions and other special effects. It's like at a magic show where there is always a hot girl on stage to distract people from what's really going on. Sports games don't get to have the hot girl on stage ;( So we're a bit more susceptible to scrutiny in some ways, if you know what I mean.
You do have the advantage of being able to confine the camera to the stadium and running a fixed number of actors, though. That must help a lot with budgeting memory, CPU cycles, and draw ops.
 
I don't know much about technology or, in this case, the eDRAM, so I find threads like this a never-ending source of fascination (too bad some brilliant posts with interesting opinions have vanished off the face of the earth) :( and also this is where I get a bit confused, to be honest.

I'd like to be able to play my favourite games for years if the nostalgia thing kicks in, and, pretty much like PlayStation backwards compatibility (a successful achievement by Sony generation after generation), I'd love a feature like that in order to play my AAA games on the next Xbox.

The thing is, I wouldn't say Fran is (sometimes) as happy as a lark with this kind of RAM, and ShootMyMonkey doesn't feel great joy either. The real question is: is there any way to emulate the eDRAM without having to depend on it at all? As I understand it, it's impossible to emulate because of its "special" logic, but like I said before, my knowledge is very limited... not limited, probably nil. Just curious...
 
eDRAM seems to have a lot of pros and cons ;)

At the time MS launched the 360, I can't remember which kind of GDDR was available.
MS (which I think must have thought about it quite a lot) figured that in the long run it would be cheaper to use eDRAM + a 128-bit bus instead of a 256-bit bus + faster memory (maybe the same memory could have been fine with some hardware tricks, like Quincunx from Nvidia).

We will see if they were right; the 360 revision which supposedly will include a 65nm GPU is rumored to be out around the middle of 2008, and we will see then whether the engineers have managed to put Xenos and the daughter chip (smart eDRAM) together.
And RAM seems really cheap these days.

From my uneducated point of view I still think that eDRAM has some advantages, but the more PS3 games I see allowing for some AA, the more I think MS has maybe made an error.
AA can be achieved with less bandwidth (compression, tricks, selective AA, Quincunx from Nvidia, etc.); worse, the design seems to put quite a few hurdles in the way of exotic kinds of rendering and proper use of Xenos' exclusive features (compared to RSX or most DX9-compliant GPUs).

MS could have offered a really flexible design to our dear devs!

I don't intend to start a flame war ;) The 360 has been here for almost 2 years now and lots of games look great, so I hope nobody will come in with "trollish" comments...
And again, I think MS gave the question quite some time, since they designed the 360 with future cost reductions in mind.
 