Why is Deferred Rendering so important?

AA was still a problem with deferred renderers on DX9. With DX10 we should be able to read back each subsample belonging to a pixel, so we can build an accurate stencil mask that marks edge pixels. Early stencil rejection should then let us shade only one subsample per pixel over the vast majority of the screen, while the remaining edge pixels are directly supersampled and resolved with a custom shader (with the advantage of a rotated or sparse sampling grid).
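Roughly, the edge-detection step could look like this (a CPU-side sketch, not actual D3D10 code; the 4x sample count, the stored attributes and the thresholds are all assumptions):

Code:
#include <array>
#include <cmath>

struct Sample { float depth; std::array<float, 3> normal; };

// Flag a pixel as an "edge" pixel when its subsamples disagree; the stencil
// mask built from this lets early stencil rejection skip the expensive
// per-subsample path everywhere else.
bool isEdgePixel(const std::array<Sample, 4>& subsamples,
                 float depthEps = 1e-3f, float normalEps = 0.02f)
{
    for (int i = 1; i < 4; ++i) {
        if (std::fabs(subsamples[i].depth - subsamples[0].depth) > depthEps)
            return true;
        for (int c = 0; c < 3; ++c)
            if (std::fabs(subsamples[i].normal[c] - subsamples[0].normal[c]) > normalEps)
                return true;
    }
    return false; // interior pixel: shade one subsample and reuse it for all four
}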
 
You assume deferred shading has no additional costs compared to non-deferred shading; unfortunately this is not true, as writing and reading back the G-buffer(s) is not free from a performance and memory standpoint.

Does that mean that R600's memory interface can give it an advantage in games that heavily use deferred shading? And will deferred shading put less stress on shaders but more stress on memory bandwidth?

Or is memory bandwidth not the issue, and it's mostly a memory size issue?
 
R600 is not out yet and I don't know its real specs... anyway, having a lot of bandwidth should certainly help if you're rendering to a few floating-point render targets + AA at the same time :)
 
I'd bet $3.50 that Crackdown is also DR'ing.

Advantages would be that an increased light count only really hits your ability to throw pixels onto the screen. The CPU work required is low, and there is basically no geometry work. Occlusion testing is fast and 'automatic' too.
Furthermore, you can do some fancy things by processing your G-buffer, such as geometry and normal warping effects. For example, a bullet hole can actually modify the stored geometry in the G-buffer so it looks like a 3D hole; you could distort normals under water for fake caustic effects, etc. Things like parallax occlusion mapping also become more viable from a performance standpoint, as there isn't a per-light hit.
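The bullet-hole trick could look something like this toy CPU sketch (the G-buffer layout and all names are made up; a real version would run in a pixel shader against the live G-buffer):

Code:
#include <algorithm>
#include <cmath>

struct GBufferTexel { float depth; float nx, ny, nz; };

// Hypothetical decal pass: push the stored depth inward and bend the stored
// normals outward so that later lighting passes shade an apparent 3D hole.
void stampBulletHole(GBufferTexel* gbuf, int width, int height,
                     int cx, int cy, int radius, float holeDepth)
{
    for (int y = cy - radius; y <= cy + radius; ++y)
        for (int x = cx - radius; x <= cx + radius; ++x) {
            if (x < 0 || y < 0 || x >= width || y >= height) continue;
            float dx = float(x - cx), dy = float(y - cy);
            float d = std::sqrt(dx * dx + dy * dy);
            if (d > float(radius)) continue;
            float t = 1.0f - d / float(radius);        // 1 at the centre, 0 at the rim
            GBufferTexel& px = gbuf[y * width + x];
            px.depth += holeDepth * t;                 // carve the hole into stored depth
            px.nx = dx / float(radius);                // fake crater normals
            px.ny = dy / float(radius);
            px.nz = std::sqrt(std::max(0.0f, 1.0f - px.nx * px.nx - px.ny * px.ny));
        }
}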

I was recently mucking about with some DR ideas in XNA. Here is a pic... That's running on my X1800 XL at 60 fps with 111 lights, about 15-20 of which are fullscreen. So DR can be really very fast, but it's tricky. The big problem is that on normal hardware you are so pixel/texture limited that you have little breathing room to be fancy.
Although I won't be making a game of this, I am going to use the stuff as a basis for a different project.
 
Apparently the new beta nVidia G80 drivers have enabled AA in Rainbow Six: Vegas (Unreal Engine 3).

According to a friend who tested it last night, VRAM usage goes from ~500 MB to ~658 MB when applying 2xAA at max settings @ 1600x1200. He went on to say:

Without AA at 16x12 I get around 60 fps; when I enable 2xAA it goes down to approx. 40 fps; with 4xAA it goes down to the late teens/low 20s.

Is this the expected performance trade-off for using AA in DX9 games that use Deferred Rendering?

I used 8x as well and the hit was the same as 4xAA from what I could make out, and it was definitely applying more AA.

Why would this be?
 
Is it certain that UE3 is a deferred renderer?
It's easier now than it was 6+ months ago. The X360 APIs are still evolving to some extent, and a lot of the tools to make a good engine that exploits tiling well are around now, although my understanding is that even the current XDK doesn't expose all of the functionality.

The TRC is extremely non-specific; technically the minimum res is 720p and you have to provide something to address some of the aliasing. Like all TRCs they're negotiable to some extent, and MS will give exemptions for the right titles, although I suspect they will be harder to get as excuses like "launch game" start to disappear. It'd be pretty hard for me to justify anything less than 720p with 2x hardware AA to anyone I work for.

The only reasonable excuse I've heard for lack of AA, other than not enough time, is the one Epic uses to justify it in Unreal Engine 3 games. Basically, their shadowing algorithm projects screen-space pixels back into light space, so they would have to do 2x or 4x the work for shadowing if AA was enabled; but that's true regardless of the AA implementation.
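To see why the cost scales with sample count, here's a toy CPU sketch of that kind of back-projection (the matrices, names and single shadow map are assumptions, not Epic's actual implementation); it has to run once per stored sample, so 4xAA means 4x the shadow lookups:

Code:
#include <array>

using Mat4 = std::array<std::array<float, 4>, 4>;
struct Vec4 { float x, y, z, w; };

static Vec4 mul(const Mat4& m, const Vec4& v) {
    return { m[0][0]*v.x + m[0][1]*v.y + m[0][2]*v.z + m[0][3]*v.w,
             m[1][0]*v.x + m[1][1]*v.y + m[1][2]*v.z + m[1][3]*v.w,
             m[2][0]*v.x + m[2][1]*v.y + m[2][2]*v.z + m[2][3]*v.w,
             m[3][0]*v.x + m[3][1]*v.y + m[3][2]*v.z + m[3][3]*v.w };
}

// Project one screen-space sample back into light space and test it against
// the shadow map; with 4xAA this runs per subsample, i.e. 4x per pixel.
float shadowTerm(float screenX, float screenY, float sampleDepth,
                 const Mat4& screenToWorld, const Mat4& worldToLight,
                 const float* shadowMap, int smSize)
{
    Vec4 w = mul(screenToWorld, { screenX, screenY, sampleDepth, 1.0f });
    w.x /= w.w; w.y /= w.w; w.z /= w.w; w.w = 1.0f;    // homogeneous divide
    Vec4 ls = mul(worldToLight, w);
    ls.x /= ls.w; ls.y /= ls.w; ls.z /= ls.w;          // light-space NDC
    int u = int((ls.x * 0.5f + 0.5f) * (smSize - 1));  // NDC -> texel
    int v = int((ls.y * 0.5f + 0.5f) * (smSize - 1));
    if (u < 0 || v < 0 || u >= smSize || v >= smSize) return 1.0f; // outside: lit
    return (ls.z <= shadowMap[v * smSize + u]) ? 1.0f : 0.0f;
}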
 
In DX9, AA runs in R6 Vegas, but with shadow problems that may or may not be visible: many shadows disappear. Many people don't notice this and think it works correctly.
 
http://www.nvnews.net/vbulletin/showpost.php?p=1231918&postcount=604

AA and shadows work in R6

(screenshot: r6vegasgame200704200918wc5.png)



P.S.: but on my system I don't get soft shadows with the 153.19 drivers and AA in Windows XP...
 
PowerVR Deferred Shading + Deferred Pixel Processing

Not really true at all. You need a lot of extra bandwidth to write the normals and position and then read them back. Even using Z and calculating position is a notable cost. Then there's the lack of AA as well.
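For reference, the "use Z and calculate position" option amounts to something like this (a minimal sketch assuming a standard symmetric perspective projection; names are made up); it trades G-buffer storage and bandwidth for a little ALU per lighting sample:

Code:
#include <cmath>

struct Vec3 { float x, y, z; };

// Reconstruct view-space position from the stored depth and the pixel's NDC
// coordinates; saves writing position to the G-buffer at the cost of ALU.
Vec3 viewPosFromDepth(float ndcX, float ndcY,     // both in [-1, 1]
                      float linearViewZ,          // depth fetched from the G-buffer
                      float vfovRadians, float aspect)
{
    float tanHalf = std::tan(vfovRadians * 0.5f);
    return { ndcX * tanHalf * aspect * linearViewZ,
             ndcY * tanHalf * linearViewZ,
             linearViewZ };
}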

In an interesting combination of the two unrelated concepts mixed into this thread, PowerVR SGX has on-chip MRTs, so it should be possible to avoid the bandwidth hit you mention with some care, since that traffic needn't go out to memory. Presumably they still have the 1990s vintage deferred pixel shading as well. Kudos to Imagination Tech.
 
In an interesting combination of the two unrelated concepts mixed into this thread, PowerVR SGX has on-chip MRTs, so it should be possible to avoid the bandwidth hit you mention with some care, since that traffic needn't go out to memory. Presumably they still have the 1990s vintage deferred pixel shading as well. Kudos to Imagination Tech.
On-chip MRT can help you during the geometry pass, but it really can't do much in the lighting pass.
 
On-chip MRT can help you during the geometry pass, but it really can't do much in the lighting pass.

It begs for a tile-aware extension.
Code:
Per tile:
   Render scene, culling to tile, instead of whole-screen
   Do lighting + misc image space ops on tiles directly in on chip mem.
   Dump tile to framebuffer

Needs new software & probably hardware support but might be interesting.
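As a rough illustration of the "culling to tile" step, a minimal CPU sketch (the screen-space circle-of-influence light model and all names are assumptions) of gathering the lights a single tile would need while it sits in on-chip memory:

Code:
#include <algorithm>
#include <vector>

struct Light { float x, y, radius; };   // screen-space position + influence radius

// Gather only the lights whose circle of influence touches a given tile; the
// tile's pixels can then be shaded against this short list while the tile
// still sits in on-chip memory.
std::vector<int> lightsForTile(const std::vector<Light>& lights,
                               int tileX, int tileY, int tileSize)
{
    float x0 = float(tileX * tileSize), y0 = float(tileY * tileSize);
    float x1 = x0 + float(tileSize),    y1 = y0 + float(tileSize);
    std::vector<int> result;
    for (int i = 0; i < int(lights.size()); ++i) {
        // closest point on the tile rectangle to the light centre
        float cx = std::max(x0, std::min(lights[i].x, x1));
        float cy = std::max(y0, std::min(lights[i].y, y1));
        float dx = lights[i].x - cx, dy = lights[i].y - cy;
        if (dx * dx + dy * dy <= lights[i].radius * lights[i].radius)
            result.push_back(i);
    }
    return result;
}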
 
On-chip MRT can help you during the geometry pass, but it really can't do much in the lighting pass.

Not sure how you've reached this conclusion? With on-chip MRT the G-Buffer data that needs to be fetched during the lighting passes is immediately accessible (no trip to memory) thus it will help bandwidth as well. Depending on how many lighting passes are performed I would say it may help even more than the geometry pass.
 
Not sure how you've reached this conclusion? With on-chip MRT the G-Buffer data that needs to be fetched during the lighting passes is immediately accessible (no trip to memory) thus it will help bandwidth as well. Depending on how many lighting passes are performed I would say it may help even more than the geometry pass.
It's not so easy:
1) Do PVR architectures support using the on-chip memory as a texture? It's not trivial to do.
2) What if in the lighting pass I sample pixels around my pixel? How can the chip know in advance that I'm going to need pixels that have not been rendered yet?
 
1) Do PVR architectures support using the on-chip memory as a texture? It's not trivial to do.
Maybe not trivial; this would probably require some form of API extension where you tell the graphics API that you want to keep a texture render target on chip and that you'll be sampling from it at a 1:1 mapping ratio. On-chip MRT would be pretty useless if you couldn't specify where and when you want to use it, though.

2) What if in the lighting pass I sample pixels around my pixel? How can the chip know in advance that I'm going to need pixels that have not been rendered yet?
In Deferred Shading the lighting passes are typically performed at a 1:1 mapping ratio, i.e. shade each pixel using the properties stored in the G-Buffer for this pixel. In the post-processing passes you may want to sample pixels around though, which indeed means the texture would need to be dumped to memory at this point.
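To make the 1:1 point concrete, a toy CPU version of such a lighting pass (the G-buffer layout and light model are assumptions): every read is at the pixel's own index, which is exactly the access pattern on-chip storage can serve:

Code:
#include <algorithm>
#include <cmath>
#include <vector>

struct GBufferTexel { float px, py, pz, nx, ny, nz, albedo; };
struct PointLight   { float x, y, z, intensity; };

// 1:1 deferred lighting: each output pixel reads only its own G-buffer texel,
// so the whole pass is served by same-coordinate fetches.
void lightingPass(const std::vector<GBufferTexel>& gbuf,
                  const PointLight& light, std::vector<float>& out)
{
    for (size_t i = 0; i < gbuf.size(); ++i) {
        const GBufferTexel& g = gbuf[i];      // same index in, same index out
        float lx = light.x - g.px, ly = light.y - g.py, lz = light.z - g.pz;
        float len = std::sqrt(lx * lx + ly * ly + lz * lz) + 1e-6f;
        float ndotl = (g.nx * lx + g.ny * ly + g.nz * lz) / len;
        out[i] += g.albedo * light.intensity * std::max(0.0f, ndotl) / (len * len);
    }
}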
 
In Deferred Shading the lighting passes are typically performed at a 1:1 mapping ratio, i.e. shade each pixel using the properties stored in the G-Buffer for this pixel. In the post-processing passes you may want to sample pixels around though, which indeed means the texture would need to be dumped to memory at this point.
What's typical doesn't stay that way forever; for example, if I want to do some dynamic ambient occlusion computation in screen space, I need to read, per pixel, the Z values of the neighbouring pixels.
How am I going to do that if those Z values have not been computed yet?
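Even a crude screen-space AO term like the sketch below (purely illustrative; the tap pattern and names are made up) needs the neighbours' Z, which breaks the 1:1 assumption:

Code:
#include <vector>

// Crude screen-space occlusion: count neighbours closer to the camera than
// the centre pixel. Every tap reads a *different* coordinate, which a purely
// on-chip, 1:1 G-buffer cannot serve if those pixels aren't rendered yet.
float crudeSSAO(const std::vector<float>& depth, int width, int height,
                int x, int y, float bias = 0.01f)
{
    static const int offs[4][2] = { {1, 0}, {-1, 0}, {0, 1}, {0, -1} };
    float centre = depth[y * width + x];
    int occluders = 0;
    for (const auto& o : offs) {
        int nx = x + o[0], ny = y + o[1];
        if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
        if (depth[ny * width + nx] < centre - bias) ++occluders;
    }
    return 1.0f - 0.25f * float(occluders);   // 1 = fully open, lower = occluded
}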
 
What's typical doesn't stay that way forever; for example, if I want to do some dynamic ambient occlusion computation in screen space, I need to read, per pixel, the Z values of the neighbouring pixels.
How am I going to do that if those Z values have not been computed yet?
Sure, the case you speak of couldn't benefit from on-chip MRTs, since you'd need to sample adjacent pixels from the G-buffer. However, I still think this kind of operation is atypical of lighting passes. To take your example, calculating ambient occlusion in screen space will yield incorrect results when, e.g., dynamic opaque objects are rendered in front of static geometry. All of a sudden, pixels in the static background will sample adjacent pixels belonging to the object in front, which will change the ambient occlusion calculation (when it shouldn't, especially if the object happens to be far from the static background). You might be able to help matters by looking at object IDs and such, but in all cases you'll still lose the G-buffer data of the adjacent pixels in your static background (since they're covered by the object in front), which means your ambient occlusion becomes variable. This is likely to show up as a "halo effect" on the silhouettes of dynamic objects.
I think we agree on the principles: on-chip MRTs are only useful if you don't start sampling adjacent pixels (which you will need to do in certain cases anyway, like post-processing or some future screen-space lighting effect).
 
I was always under the impression it was only deferred shadow rendering, and that it wasn't a full deferred renderer.
That's a myth and not possible. You need the shadows when you do the lighting calculation. You can't add shadows in another pass.
 
Sure, the case you speak of couldn't benefit from on-chip MRTs, since you'd need to sample adjacent pixels from the G-buffer. However, I still think this kind of operation is atypical of lighting passes. To take your example, calculating ambient occlusion in screen space will yield incorrect results when, e.g., dynamic opaque objects are rendered in front of static geometry. All of a sudden, pixels in the static background will sample adjacent pixels belonging to the object in front, which will change the ambient occlusion calculation (when it shouldn't, especially if the object happens to be far from the static background). You might be able to help matters by looking at object IDs and such, but in all cases you'll still lose the G-buffer data of the adjacent pixels in your static background (since they're covered by the object in front), which means your ambient occlusion becomes variable. This is likely to show up as a "halo effect" on the silhouettes of dynamic objects.
I think we agree on the principles: on-chip MRTs are only useful if you don't start sampling adjacent pixels (which you will need to do in certain cases anyway, like post-processing or some future screen-space lighting effect).
Computer graphics, especially realtime computer graphics, is the art of faking things, and believe me when I say no one is going to notice a halo in the ambient occlusion component.
 
That's a myth and not possible. You need the shadows when you do the lighting calculation. You can't add shadows in another pass.
It's not a myth, and it's completely possible; many games do it.
Shadows are computed before lighting, so that when you're lighting your scene you already have a shadowing term available (typically sampled from a texture).
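A toy sketch of that ordering (all names assumed): the shadow term is resolved into a screen-sized mask first, and the lighting pass just multiplies it in:

Code:
#include <vector>

// Pass 1: resolve the shadow test into a screen-sized mask (1 = lit, 0 = shadowed).
void buildShadowMask(const std::vector<float>& lightSpaceDepth,   // per pixel
                     const std::vector<float>& shadowMapDepth,    // per pixel, already projected
                     std::vector<float>& mask)
{
    for (size_t i = 0; i < mask.size(); ++i)
        mask[i] = (lightSpaceDepth[i] <= shadowMapDepth[i]) ? 1.0f : 0.0f;
}

// Pass 2: the lighting (deferred or not) just multiplies the term in,
// sampled like any other texture.
void applyShadowMask(const std::vector<float>& mask, std::vector<float>& litColor)
{
    for (size_t i = 0; i < litColor.size(); ++i)
        litColor[i] *= mask[i];
}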
 