Xenos/C1 and Deferred Rendering (G-Buffer)

Arnold Beckenbauer

Hi all!
I have a small question:
Is deferred rendering possible on the Xbox 360? I mean real deferred rendering, like GRAW 1&2 (PC) or STALKER.

One part of deferred rendering is the creation of the G-buffer. For example, STALKER's G-buffer at 1280x1024 is 30 MB (three A16B16G16R16 textures), and KZ2's G-buffer is 36 MB (720p with 2xAA). Such heavy G-buffers are far too big for Xenos' 10 MB of eDRAM.
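To put numbers on that, a quick back-of-the-envelope check (plain Python; raw, uncompressed targets with no padding or alignment assumed):

```python
# G-buffer size arithmetic for the STALKER figure quoted above.
# Assumes raw targets: width * height * bytes per pixel.

def target_bytes(width, height, bytes_per_pixel):
    """Size of one render target in bytes."""
    return width * height * bytes_per_pixel

# Three A16B16G16R16 targets (8 bytes/pixel) at 1280x1024:
stalker = 3 * target_bytes(1280, 1024, 8)
print(stalker / 2**20)   # -> 30.0 (MB), versus 10 MB of eDRAM
```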

The first idea: multi-pass. Xenos generates the G-buffer one texture at a time, each small enough to fit in its eDRAM.

The second idea: MRT, which reduces the number of passes. Unfortunately, Xenos' eDRAM is too small for this approach. But is it possible to generate "MRT tiles"? Something like this: http://www.beyond3d.com/content/articles/4/5
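For what it's worth, the capacity problem bites even the multi-pass idea: at STALKER's resolution and format, a single target already fills the eDRAM on its own (a rough Python check, ignoring alignment):

```python
EDRAM = 10 * 2**20                # Xenos daughter-die capacity

target = 1280 * 1024 * 8          # one A16B16G16R16 target
depth  = 1280 * 1024 * 4          # a 32-bit depth buffer, needed in every pass

print(target == EDRAM)            # True: the colour target alone fills eDRAM
print(target + depth > EDRAM)     # True: with depth, even one target needs tiling
```

So even "one texture after another" would have to be tiled at that resolution.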

The third idea may be silly: is it possible to generate a G-buffer directly in main memory (without eDRAM)?

The next question is: are there Xbox 360 titles that use deferred rendering? Doesn't GRAW's multiplayer part use it? (It was developed by Red Storm Entertainment with its own engine.)
 
I'm no expert, but I believe the eDRAM is just there as an optimisation. It's supposed to provide free AA on a render target that can fit in it. It doesn't mean that render targets can only reside in eDRAM. The problem with deferred rendering, I think, comes from the fact that it just doesn't make sense on the Xbox 360. It seems as if you can achieve better results with less work from a regular, well-written forward renderer. And you don't sacrifice AA.
 
I'm no expert, but I believe the eDRAM is just there as an optimisation. It's supposed to provide free AA on a render target that can fit in it. It doesn't mean that render targets can only reside in eDRAM. The problem with deferred rendering, I think, comes from the fact that it just doesn't make sense on the Xbox 360. It seems as if you can achieve better results with less work from a regular, well-written forward renderer. And you don't sacrifice AA.


It's not free AA; it's not free at all.

And the ROPs are on the eDRAM die, so the eDRAM can't be bypassed; the GPU has to render through it.
 
The only way to render is via the ROPs into eDRAM; if there were another way, they'd have used it instead of emulating higher resolutions via several passes. It's similar to the driver tricks used for stereo rendering.

Another problem is that even if you could fit the whole G-buffer into eDRAM, you could not use it in the deferred shading pass; you need to move it to main memory first.

The best way would be to use MRT; then you could dump the tiles, immediately shade them, and discard each G-buffer tile. But I'm not sure whether there is any support on the API side, as this tiling is transparent to the engine (apart from some resource restrictions, for sure).
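That per-tile flow might look something like this (a Python sketch with placeholder callbacks; none of these names are a real Xbox 360 API):

```python
def render_deferred_tiled(tiles, rasterize, resolve, shade):
    """Generate each MRT tile in eDRAM, resolve it to main memory,
    shade it immediately, then reuse the eDRAM for the next tile."""
    shaded = []
    for tile in tiles:
        gbuffer_tile = rasterize(tile)     # geometry pass into eDRAM
        resolved = resolve(gbuffer_tile)   # eDRAM -> main-memory copy
        shaded.append(shade(resolved))     # lighting pass samples it as a texture
        # the eDRAM tile can now be discarded and overwritten
    return shaded

# Trivial stand-ins just to show the ordering:
print(render_deferred_tiled([0, 1], lambda t: t, lambda g: g, lambda r: r))  # -> [0, 1]
```

The point of the structure is that only one tile's worth of G-buffer ever lives in eDRAM at a time, while the resolved copies accumulate in main memory.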



Btw., KZ2's actual G-buffer is 31.6 MB; it grows to 36 MB due to alignment etc. I think on X360 you could avoid that overhead because you copy the render targets into a tightly packed area.
 
It seems as if you can achieve better results with less work with just a regular, well-written forward renderer. And you don't sacrifice AA.

Good luck handling that many lights with forward rendering...
 
rapso said:
Another problem is that even if you could fit the whole G-buffer into eDRAM, you could not use it in the deferred shading pass; you need to move it to main memory first.
Why would you call that a problem? It's standard procedure for any sort of offscreen/render-to-texture rendering. Every single game on 360 has to do that multiple times per frame for things like shadow maps, reflection maps, HDR resolves, post-process resolves, etc.

If you want to work without dumping eDRAM contents to main memory more than once per frame, you shouldn't work on Xbox 360 (or Wii, for that matter :p).
 
Why would you call that a problem? It's standard procedure for any sort of offscreen/render-to-texture rendering. Every single game on 360 has to do that multiple times per frame for things like shadow maps, reflection maps, HDR resolves, post-process resolves, etc.

If you want to work without dumping eDRAM contents to main memory more than once per frame, you shouldn't work on Xbox 360 (or Wii, for that matter :p).

I was just replying to
The second idea: MRT, which reduces the number of passes. Unfortunately Xenos' eDRAM is too small for this way.
that there is no use in fitting the MRTs into eDRAM (except to reduce passes), because, like you said, they need to be dumped to main memory anyway.
 
Deferred rendering simply uses a larger backbuffer, like 4xAA with forward rendering at 1080p, which is certainly possible on Xenos. You just have to tile it.

Tiling isn't really the same thing as multipassing. You can clip out a lot of the geometry when tiling if you have decent space partitioning (which helps reduce the geometry load on any console). Scene graph traversal could be a bit of a burden on the CPU if your data is disorganized, though.

There's no doubt that Xenos' design speeds up forward rendering more than deferred rendering, though, when compared to older GPU designs. You can't texture from the eDRAM, so the additional bandwidth required for DR isn't alleviated much. Xenos can hide AF cost with math ops (like other current GPUs), but DR has little to no math going on at that point.
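As a rough comparison of the two tiling budgets discussed above (Python; this ignores eDRAM tile-shape and alignment constraints, and the deferred layout is a hypothetical four-target format, not any shipped game's):

```python
import math

EDRAM = 10 * 2**20

def tiles(width, height, bytes_per_pixel_total, samples):
    """How many screen tiles are needed so one tile's targets fit in eDRAM."""
    return math.ceil(width * height * bytes_per_pixel_total * samples / EDRAM)

# Forward: 1080p, RGBA8 colour + 32-bit depth (8 bytes/pixel), 4xMSAA.
print(tiles(1920, 1080, 8, 4))    # -> 7

# Deferred geometry pass: 720p, four 32-bit targets + depth (20 bytes), no MSAA.
print(tiles(1280, 720, 20, 1))    # -> 2
```

Either way you tile, but the per-tile geometry resubmission cost scales with the tile count.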
 
Good luck handling that many lights with forward rendering...
I think the benefits go beyond just more lights - the compositing flexibility allows all sorts of potential tricks. The persistent smoke in KZ is something far more readily accomplished with a particle layer than by rendering zillions of alpha particles. Though we shouldn't side-track the topic of 'can you...' with 'would you want to...'
 
The third idea could be silly: Is it possible to generate a G-buffer directly in the main memory (without eDRAM)?

Hmm, I haven't heard of much research going into this, but depending upon the constructs available, it may be possible to manually create a pseudo G-buffer during an attribute pass using memexport. It would be cool if a shader could direct the backend to export certain components to memory, based on the result of a z-test against the eDRAM primary render target. Seems doubtful, though.
 
The persistent smoke in KZ is something far more readily accomplished with a particle layer than by rendering zillions of alpha particles.

On the other hand, some GPUs don't mind rendering zillions of alpha particles, unlike others :p

Hmm, I haven't heard of much research going into this, but depending upon the constructs available, it may be possible to manually create a pseudo G-buffer during an attribute pass using memexport.

Theoretically yes, but you'd lose the ability to rasterize, essentially using the GPU as a stream processor. I don't think it would be fast enough for anything useful.
 
Huh? First, I don't believe what I'm suggesting is actually possible. But consider a system in which you don't use MRTs to construct a G-buffer (since they can only exist in eDRAM), but instead still use a typical color and depth buffer that leverages eDRAM. During the attribute pass, your shader calculates the attributes you want in your G-buffer (like position, intensity, etc.) and stores them in output registers for memory export. Once color and depth are written to eDRAM, if the z-test for that particular pixel passes, you then write the stored attribute values directly to main memory using memexport, to positions that correspond to that pixel's location.
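In other words, something like this toy model (plain Python; the z-test-gated export is the speculative part, and real memexport on Xenos does not work this way):

```python
def attribute_pass(fragments, width, height):
    """Speculative z-test-gated export: only fragments that pass the
    depth test get their attributes written to the main-memory G-buffer."""
    depth   = [[float("inf")] * width for _ in range(height)]  # "eDRAM" depth
    gbuffer = [[None] * width for _ in range(height)]          # "main memory"
    for x, y, z, attrs in fragments:
        if z < depth[y][x]:            # z-test against the eDRAM depth buffer
            depth[y][x] = z
            gbuffer[y][x] = attrs      # mem-export to the pixel's address
    return gbuffer

gb = attribute_pass([(0, 0, 0.5, "near"), (0, 0, 0.9, "far")], 2, 2)
print(gb[0][0])   # -> near (the occluded fragment never exports)
```

One wrinkle the sketch glosses over: a far fragment drawn first would export and then be overwritten by a nearer one, so exports aren't write-once per pixel.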
 
Thanks to all!
Deferred rendering simply uses a larger backbuffer, like 4xAA with forward rendering at 1080p, which is certainly possible on Xenos. You just have to tile it.

Tiling isn't really the same thing as multipassing. You can clip out a lot of the geometry when tiling if you have decent space partitioning (which helps reduce the geometry load on any console). Scene graph traversal could be a bit of a burden on the CPU if your data is disorganized, though.

There's no doubt that Xenos' design speeds up forward rendering more than deferred rendering, though, when compared to older GPU designs. You can't texture from the eDRAM, so the additional bandwidth required for DR isn't alleviated much. Xenos can hide AF cost with math ops (like other current GPUs), but DR has little to no math going on at that point.

Deferred rendering is possible, but there are no advantages to using it?
 
The point here is:
Can you do deferred rendering on 360? Yes.
Is it worthwhile? That's more complicated; it likely depends heavily on what you're trying to do, what your bottlenecks are, and whether you can live with the costs.

On a side note, I think a lot of the titles that have shipped with deferred renderers did so because the engineers thought it was a cool idea, rather than because they wanted to solve a particular problem.
 
*cough* There is a world in between pure forward and pure deferred rendering. :)

I want to ask: what's the difference between the deferred rendering in, say, Killzone 2 and GTA4?

Do deferred shading and deferred rendering mean the same thing?

Also, what's the difference between how PowerVR implements its deferred rendering and something like Killzone 2's? What did PowerVR do differently in its implementation to get away with a lower memory footprint and bandwidth requirement (the Dreamcast argument) compared to the downsides of a deferred renderer like Killzone 2's?

Do the advantages of the PowerVR implementation remain the same in this shader era?
 
It's not free AA; it's not free at all.

And the ROPs are on the eDRAM die, so the eDRAM can't be bypassed; the GPU has to render through it.
Sc4freak is correct. It should be nearly free AA if the render target fits in eDRAM, i.e. 640x480 with 4xAA.

And yes, you have to render into eDRAM if you need the ROPs. Otherwise, memexport will work.
 
Deferred rendering is possible, but there are no advantages to using it?
There definitely are advantages. I'm just saying that the bandwidth-saving attributes of Xenos' eDRAM have a greater effect on forward rendering than on deferred rendering.

eDRAM can be fantastic for deferred rendering, but you need to be able to texture from it, like the PS2 can. MS/ATI didn't have access to a fab that could economically integrate eDRAM alongside a few hundred million logic transistors, so they moved it to a daughter die of relatively low complexity. Now that TSMC has eDRAM capability, we may see a more complete solution next time around.

On a side note, I think a lot of the titles that have shipped with deferred renderers did so because the engineers thought it was a cool idea, rather than because they wanted to solve a particular problem.
I agree with you, and it's one of the reasons I haven't been so bullish on DR. There are some future-looking techniques that could make it more of a necessity, though.
 