Xenos/C1 and Deferred Rendering (G-Buffer)

I think there is a little problem that everything beyond forward rendering is called deferred rendering (I do it as well), but there are a lot of differences.

I want to ask: what's the difference between deferred rendering in, say, Killzone 2 and GTA4?
KZ2 does all the shading etc. from the G-buffers. On 360 there is no real advantage to pure deferred rendering (even a disadvantage, I'd say), so I'd guess GTA4 is just doing part of its rendering deferred: calculating that dithered shadow mask (like Crysis does), rendering the shadow-casting light sources forward, then additionally rendering all other light sources deferred, either saving G-buffer normals or (faster but inaccurate) generating normals from the depth buffer.
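To make the "generating normals from the depth buffer" option concrete, here is a minimal sketch of the general idea, not anything GTA4 is confirmed to do. It assumes a linear view-space depth buffer, a pinhole camera looking down -Z, and hypothetical helper names (`view_pos`, `normal_from_depth`). Each pixel is unprojected to a view-space position and the normal is the cross product of the differences to its right and lower neighbours. Across depth discontinuities those differences span two different surfaces, which is where the inaccuracy comes from.

```python
import math

def view_pos(x, y, depth, width, height, tan_half_fov):
    """Unproject pixel (x, y) with linear depth to a view-space position."""
    aspect = width / height
    ndc_x = (x + 0.5) / width * 2.0 - 1.0
    ndc_y = 1.0 - (y + 0.5) / height * 2.0   # screen y grows downwards
    return (ndc_x * tan_half_fov * aspect * depth,
            ndc_y * tan_half_fov * depth,
            -depth)                           # camera looks down -Z (assumption)

def normal_from_depth(depth, x, y, tan_half_fov=1.0):
    """Approximate the view-space normal at (x, y) from neighbouring depths."""
    height, width = len(depth), len(depth[0])
    p = view_pos(x, y, depth[y][x], width, height, tan_half_fov)
    right = view_pos(x + 1, y, depth[y][x + 1], width, height, tan_half_fov)
    down = view_pos(x, y + 1, depth[y + 1][x], width, height, tan_half_fov)
    dx = tuple(a - b for a, b in zip(right, p))
    dy = tuple(a - b for a, b in zip(down, p))
    # cross(dy, dx) so that the normal points back towards the camera (+Z)
    n = (dy[1] * dx[2] - dy[2] * dx[1],
         dy[2] * dx[0] - dy[0] * dx[2],
         dy[0] * dx[1] - dy[1] * dx[0])
    length = math.sqrt(sum(c * c for c in n))
    return tuple(c / length for c in n)

# A wall at constant depth should face straight back at the camera.
flat = [[5.0] * 4 for _ in range(4)]
print(normal_from_depth(flat, 1, 1))
```

In a real renderer this would run in a pixel shader on the depth texture; the plain lists here only stand in for that.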

Do deferred shading and deferred rendering mean the same thing?
deferred shading is one technique out of the deferred rendering 'magic box'.

Also, what's the difference between how PowerVR implements deferred rendering and something like Killzone 2? What did PowerVR do differently in its implementation to get away with a lower memory footprint and bandwidth requirement (the Dreamcast argument) compared to the minuses of a deferred renderer like Killzone 2?
the memory footprint was not really lower, but the bandwidth cost was.
PowerVR is doing deferred rasterization. they transform and roughly sort triangles into tiles (like 32x32 pixels). on 'flip' they then go through the tiles and rasterize the triangles. that way they can save for each pixel which tri was the nearest and in the end shade just this one fragment. even sorting individual fragments is possible for proper alpha blending.
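The "roughly sort triangles into tiles" step can be sketched as below. This is only an illustration of binning by screen-space bounding box, not PowerVR's actual hardware scheme; the function name `bin_triangles` and the data layout are made up for the example, and the per-tile rasterization and nearest-fragment pass are left out.

```python
import math
from collections import defaultdict

TILE = 32  # tile edge in pixels, following the "like 32x32" figure above

def bin_triangles(triangles, width, height, tile=TILE):
    """Return {(tile_x, tile_y): [triangle indices]}.

    Conservative binning: a triangle is listed in every tile its
    screen-space bounding box touches, even if it only clips a corner.
    """
    bins = defaultdict(list)
    max_tx = (width - 1) // tile
    max_ty = (height - 1) // tile
    for index, tri in enumerate(triangles):
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        # Skip triangles that are entirely off screen.
        if max(xs) < 0 or max(ys) < 0 or min(xs) >= width or min(ys) >= height:
            continue
        tx0 = max(0, math.floor(min(xs)) // tile)
        tx1 = min(max_tx, math.floor(max(xs)) // tile)
        ty0 = max(0, math.floor(min(ys)) // tile)
        ty1 = min(max_ty, math.floor(max(ys)) // tile)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(index)
    return bins

# Triangle 0 sits inside one tile, triangle 1 straddles four tiles.
tris = [((2, 2), (10, 2), (2, 10)),
        ((20, 20), (40, 20), (20, 40))]
bins = bin_triangles(tris, 64, 64)
print(sorted(bins.items()))
```

The per-tile lists are the "pointers to the tris" whose total size grows with scene complexity rather than with resolution.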

Do the PowerVR implementation's advantages remain the same in this shader era?
some do, others not really. saving all the tris is expensive; the pointers to the tris saved in the tiles might nowadays outnumber the pixels, which might result in buffers bigger than the actual framebuffers. adding a z prepass might cull as effectively as deferred rasterization would for the final fragment shading work.
one big advantage was proper alpha sorting, and that's really expensive to do right in forward rasterization.
 
Sc4freak is correct. It should be nearly free AA if the render target fits in edram, i.e. 640x480x4.

And yes, you have to render to edram if you need the rops. Otherwise memexport will work.

What about the extra rendering from overlapping tiles?
 
What about the extra rendering from overlapping tiles?
Tiles don't overlap. Geometry can lie in multiple tiles, but 360 is very fast at chewing through it. Moreover, some of the per-polygon load in 3D rendering, like wasted pixel pipes along polygon edges, isn't duplicated.
 
Developer developer developer :) and multi-platform. (i'd sent it via pm, but i can't see any contact information in ur profile).

Thanks for the reply Rapso.:smile:

From reading your comments, it would seem the PS3 benefits more from a deferred rendering engine setup, and Xbox 360 more along the lines of forward rendering. Both rendering styles/engines have benefits and drawbacks.

So my question would be…

What in particular (hardware, API, etc.) makes PS3 better suited to deferred rendering, and Xbox 360 to forward rendering?

And if you had a choice between the two (deferred / forward) which one would you select, and why?
 
From reading your comments, it would seem the PS3 benefits more from a deferred rendering engine setup, and Xbox 360 more along the lines of forward rendering. Both rendering styles/engines have benefits and drawbacks.
Not quite. You can implement a deferred renderer on any platform, but XB360 is set up for a traditional renderer with the eDRAM choice. A deferred renderer would be a huge gobbler of RAM BW, of which XB360 has about half of PS3's, something that isn't a problem when much of the BW requirement is moved onto the eDRAM and logic.

And if you had a choice between the two (deferred / forward) which one would you select, and why?
It depends entirely on the game and what you want to create! You may as well ask 'if you had the choice between using a seven-seater family wagon and a sports coupe, which would you select and why?' The choice of vehicle, or graphics engine, is determined by the work you need it to do.
 
It depends entirely on the game and what you want to create! You may as well ask 'if you had the choice between using a seven-seater family wagon and a sports coupe, which would you select and why?' The choice of vehicle, or graphics engine, is determined by the work you need it to do.

What about using both? :) I read an interview with someone from GG and he said they were using a forward and deferred rendering engine in KZ2.
 
Not quite. You can implement a deferred renderer on any platform, but XB360 is set up for a traditional renderer with the eDRAM choice. A deferred renderer would be a huge gobbler of RAM BW, of which XB360 has about half of PS3's, something that isn't a problem when much of the BW requirement is moved onto the eDRAM and logic.
I think you're making the BW issue bigger than it really is. Remember that EDRAM makes all those random screenwrites become one efficient block copy per tile. A 30MB backbuffer will only take 4% of your frame time to put into RAM, assuming 30 fps.
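The 4% figure can be checked with a quick back-of-the-envelope calculation. The 22.4 GB/s value below is the commonly quoted peak main-memory bandwidth for the 360 and is an assumption brought in for this sketch, not a number from the post; it also assumes the resolve runs at peak rate with nothing else using the bus.

```python
def copy_share_of_frame(bytes_copied, fps, bus_bytes_per_second):
    """Fraction of one frame's time spent copying bytes_copied over the bus."""
    frame_seconds = 1.0 / fps
    copy_seconds = bytes_copied / bus_bytes_per_second
    return copy_seconds / frame_seconds

# 30 MB of render targets per frame, 30 fps, assumed 22.4 GB/s peak bandwidth
share = copy_share_of_frame(30e6, 30, 22.4e9)
print(f"{share:.1%}")  # prints 4.0%
```

Real transfers will not hit peak bandwidth, so the true share is somewhat higher, but it stays in the same ballpark.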

IMO, the real reason forward rendering looks more appealing on 360 is AA and tiling. If your geometry and visibility system characteristics make you decide 3 tiles is your limit, you can do 720p 4xAA on 360 using a forward renderer, but only 650p 2xAA when deferred (assuming Z + 4x32-bit per sample).
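The tile counts behind that comparison can be reproduced roughly as follows. This sketch assumes 10 MiB of eDRAM, 32-bit colour plus 32-bit Z per sample for the forward case, a 1156x650 target for "650p", and no tile alignment restrictions; none of those values come from the post itself, only the "Z + 4x32-bit per sample" layout does.

```python
import math

EDRAM_BYTES = 10 * 1024 * 1024  # assumed eDRAM size

def tiles_needed(width, height, samples, bytes_per_sample):
    """Tiles required if every sample of the target must sit in eDRAM."""
    total = width * height * samples * bytes_per_sample
    return math.ceil(total / EDRAM_BYTES)

FORWARD_BPS = 4 + 4        # 32-bit colour + 32-bit Z (assumption)
DEFERRED_BPS = 4 + 4 * 4   # Z + 4x32-bit G-buffer targets, as in the post

print(tiles_needed(1280, 720, 4, FORWARD_BPS))   # forward 720p 4xAA   -> 3
print(tiles_needed(1280, 720, 2, DEFERRED_BPS))  # deferred 720p 2xAA  -> 4
print(tiles_needed(1156, 650, 2, DEFERRED_BPS))  # deferred 650p 2xAA  -> 3
```

Under these assumptions deferred 720p already needs a fourth tile at 2xAA, which is why the resolution has to drop to stay within three.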
 
I think you're making the BW issue bigger than it really is. Remember that EDRAM makes all those random screenwrites become one efficient block copy per tile. A 30MB backbuffer will only take 4% of your frame time to put into RAM, assuming 30 fps.

And how about all the memory reads while you're painting the lights in? That's where the memory bandwidth shouldn't be enough...
 
And how about all the memory reads while you're painting the lights in? That's where the memory bandwidth shouldn't be enough...
Sure, but PS3 doesn't have much of an advantage here aside from having another bus for the CPU. It also needs to read and write the framebuffer to add the light's contribution, so that's another 2x32b on top of the 5x32b being read in.

Both platforms will have pros and cons of going deferred, and both have the con of more BW needed per pixel. Basically what I'm saying is that IMO the main difference in making this decision (i.e. choosing forward or deferred) is that on 360 you have the additional con of needing more tiles.
 
The following is from the link provided in my earlier post:

Edit: I guess I read your post a bit too fast for my own good :D, sorry.

http://www.guerrilla-games.com/publications/dr_kz2_rsx_dev07.pdf

pag. 7

pag. 40

Forward Rendering Pass
‣ Used for transparent geometry
‣ Single pass solution
‣ Shader has four uberlights
‣ No shadows
‣ Per-vertex lighting version for particles
‣ Lower resolution rendering available
‣ Fill-rate intensive effects
‣ Half and quarter screen size rendering
‣ Half resolution rendering using MSAA HW
 
Sure, but PS3 doesn't have much of an advantage here aside from having another bus for the CPU. It also needs to read and write the framebuffer to add the light's contribution, so that's another 2x32b on top of the 5x32b being read in.
having twice the bandwidth is usually a big win, most of deferred rendering should be limited by this. also some stencil-/z-culling might be beneficial for deferred rendering. so you need to save the zbuffer to have an advantage. using tiled rendering doesn't make that simple. preserving the zbuffer on the edram might give ya good benefits in the deferred pass, but can also raise the pressure on memory to manage the passes.

it also depends on where you're limited, if you're texture-sampling limited, the unified shaders of x360 might be a big win, if you're memory bandwidth limited, the memory bandwidth of the ps3 might be a win.
 
having twice the bandwidth is usually a big win, most of deferred rendering should be limited by this.
During G-buffer creation, even though you write many bytes per pixel, you aren't BW limited. The reason is that they're opaque pixels and the ROP speed is set to match the BW in that situation. The more render targets you have, the fewer pixels per clock can be output.

In fact, G-buffer creation uses less BW per clock than a simple single or dual textured forward rendering pass, because Z-buffer and texturing BW are amortized over multiple ROP cycles.

Also, remember my comment of PS3 needing more BW in the lighting passes to accumulate the lights with alpha blending.

also some stencil-/z-culling might be beneficial for deferred rendering. so you need to save the zbuffer to have an advantage. using tiled rendering doesn't make that simple. preserving the zbuffer on the edram might give ya good benefits in the deferred pass, but can also raise the pressure on memory to manage the passes.
Stencil and Z culling are just as good on 360, if not faster. Also, you don't need to keep the Z-buffer in the EDRAM, so your premise is flawed. If you're doing a Z-only prepass (object level sorting should be faster overall, IMO, for DR), then you can copy the whole Z-buffer out and transfer in parts as needed later very quickly. If not doing a prepass, it's just like regular rendering on a smaller target for each tile.

it also depends on where you're limited, if you're texture-sampling limited, the unified shaders of x360 might be a big win, if you're memory bandwidth limited, the memory bandwidth of the ps3 might be a win.
It depends on how fast RSX is at rendering some parts of the G-buffer directly to XDR.
 
During G-buffer creation, even though you write many bytes per pixel, you aren't BW limited. The reason is that they're opaque pixels and the ROP speed is set to match the BW in that situation. The more render targets you have, the fewer pixels per clock can be output.

In fact, G-buffer creation uses less BW per clock than a simple single or dual textured forward rendering pass, because Z-buffer and texturing BW are amortized over multiple ROP cycles.
G-Buffer creation is just one part of deferred lighting. writing is not the limit, that's why you have far fewer ROPs than TUs; usually you read more data than you write, and reading the G-Buffer is the expensive part.
assuming you have four 64-bit buffers and the shaders in the deferred pass are not that complex, the load shifts towards read bandwidth, compared to normal forward rendering with mostly compressed textures.
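As a rough illustration of that read-bandwidth point, the raw bytes fetched per shaded pixel can be compared like this. The forward case assumes four DXT1 textures at 4 bits per texel, one texel per fetch, and no texture cache or filtering; those are simplifications chosen for the example, not figures from the thread.

```python
def bytes_read_per_pixel(layers, bits_per_texel):
    """Raw bytes fetched per pixel for `layers` point-sampled surfaces."""
    return layers * bits_per_texel / 8

gbuffer_read = bytes_read_per_pixel(4, 64)  # four 64-bit G-buffer targets
forward_read = bytes_read_per_pixel(4, 4)   # four DXT1 textures (assumption)

print(gbuffer_read, forward_read)           # 32.0 2.0
# One full-screen light at 1280x720 then reads this many bytes of G-buffer:
print(int(1280 * 720 * gbuffer_read))       # 29491200
```

Caches, filtering and light volumes smaller than the screen all change the real numbers, but the order-of-magnitude gap is the point being made.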

Stencil and Z culling are just as good on 360, if not faster. Also, you don't need to keep the Z-buffer in the EDRAM, so your premise is flawed. If you're doing a Z-only prepass (object level sorting should be faster overall, IMO, for DR), then you can copy the whole Z-buffer out and transfer in parts as needed later very quickly. If not doing a prepass, it's just like regular rendering on a smaller target for each tile.
the flaw in your idea is that you cannot transfer out any zbuffer optimization the hardware has built in. the stencil-/zbuffer without those is not a performance win, as you probably know.


It depends on how fast RSX is at rendering some parts of the G-buffer directly to XDR.
...and how well you can split all the data needed for a frame, that's also true for forward rendering.
...and don't forget the RSX is working like a gfx card with dedicated VRAM, while the X360 is working more like a system with onboard gfx (i'm not saying it's 100% the same!)
 