Why Deferred Rendering is so important?


frameavenger
18-Dec-2006, 17:20
Hi Masters of 3D!! :grin:

Why Deferred Rendering is so important for the games?

Examples: GRAW and UE3.0?


Thanks a lot!!
:smile:

ShootMyMonkey
18-Dec-2006, 18:50
It's not really "important" per se. It's just another method that proves useful in various situations. The main thing with deferred rendering is that you fill in info in a giant buffer about every point until you've gotten through all the geometry. Then you just render a full-screen flat polygon using that buffer in the pixel shader so it can get info about every point.

The biggest advantage for a lot of games is how well it scales with lighting complexity, especially since you never light a pixel that isn't visible.

Biggest disadvantages include transparency rendering (which is really a pain, but not impossible), and inability to use hardware AA (and even software AA cheats don't work out so smoothly).
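
As a rough illustration of the two passes described above, here is a minimal CPU-side sketch in Python. It is not real GPU code, and the names (`Fragment`, `geometry_pass`, `lighting_pass`) are made up for this example. It assumes a single-channel albedo and simple directional lights with N·L shading.

```python
from dataclasses import dataclass

@dataclass
class Fragment:
    x: int
    y: int
    depth: float    # smaller means closer to the camera
    normal: tuple   # unit surface normal
    albedo: float   # single-channel surface colour, for brevity

def geometry_pass(fragments, width, height):
    """Fill the 'giant buffer': keep only the nearest fragment per pixel."""
    gbuffer = [[None] * width for _ in range(height)]
    for f in fragments:
        current = gbuffer[f.y][f.x]
        if current is None or f.depth < current.depth:
            gbuffer[f.y][f.x] = f
    return gbuffer

def lighting_pass(gbuffer, light_dirs):
    """Stand-in for the full-screen polygon: shade each visible pixel once per light."""
    image = []
    for row in gbuffer:
        out_row = []
        for texel in row:
            if texel is None:
                out_row.append(0.0)  # background, nothing to light
                continue
            lit = 0.0
            for light in light_dirs:
                n_dot_l = sum(n * d for n, d in zip(texel.normal, light))
                lit += max(n_dot_l, 0.0)
            out_row.append(texel.albedo * lit)
        image.append(out_row)
    return image
```

Hidden fragments are overwritten in `geometry_pass` before any lighting runs, which is the "never light a pixel that isn't visible" property in this toy form.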

SuperCow
18-Dec-2006, 18:55
Biggest disadvantages include transparency rendering (which is really a pain, but not impossible), and inability to use hardware AA (and even software AA cheats don't work out so smoothly).
Inability to use hardware AA with the DirectX9/OpenGL APIs, yes. In DX10 nothing prevents you from rendering your G-Buffer in MSAA mode and using a custom resolve shader to process each individual sample before resolving.

nAo
19-Dec-2006, 09:39
Examples: GRAW and UE3.0?

Dunno about GRAW, but UE3.0 graphics engine is not a deferred renderer, only shadows (maps) computations are deferred (or better decoupled) from the renderer.

Marco

ShootMyMonkey
19-Dec-2006, 18:22
I don't think either of them is fully or even *majority* deferred rendering. It's more or less reserved for specific functions. I don't know of any commercial game off the top of my head that uses it, for instance, for all lighting. Deferred shaderized AA cheats, I've seen dozens of places, but I'd call that post-processing more so than "deferred rendering".

frameavenger
19-Dec-2006, 19:25
Dunno about GRAW, but UE3.0 graphics engine is not a deferred renderer, only shadows (maps) computations are deferred (or better decoupled) from the renderer.

Marco

RoboBlitz uses UE3.0 and lacks an Anti-Aliasing option!!! :(

Why isn't there a menu option to turn on/off Anti-Aliasing (AA) in RoboBlitz?

http://www.roboblitz.com/HTML_SITE/support/support.shtml#system_requirements

* RoboBlitz uses delayed rendering that renders the scene once per visible light, which doesn't allow for realtime AA on today's graphics cards. There might be optimizations made in the future allowing for AA on certain lighting passes, but they're not in place yet, so that means no AA for now (even if you turn it on in your graphics card control panel). You can still turn on Anisotropic Filtering (AF) in your control panel and have that improve the texture filtering.

NeARAZ
21-Dec-2006, 14:48
That does not mean deferred rendering. Most likely they just want to say "we render into a floating point buffer, one pass per light to get HDR". AA and floating point buffers are not supported on the majority of hardware, but some of the later HW can do it.

nobond
22-Dec-2006, 00:03
Hi,
can you explain where this giant buffer is "physically" located?

Is it in the CPU memory space or the GPU memory space?
It's not really "important" per se. It's just another method that proves useful in various situations. The main thing with deferred rendering is that you fill in info in a giant buffer about every point until you've gotten through all the geometry. Then you just render a full-screen flat polygon using that buffer in the pixel shader so it can get info about every point.

The biggest advantage for a lot of games is how well it scales with lighting complexity, especially since you never light a pixel that isn't visible.

Biggest disadvantages include transparency rendering (which is really a pain, but not impossible), and inability to use hardware AA (and even software AA cheats don't work out so smoothly).

ShootMyMonkey
22-Dec-2006, 04:58
It's in GPU memory space in any major case I could possibly bring up. The "giant" buffer is a render-to-texture target, so it gets used in a later pass as a texture from which to access data and actually draw stuff based on what you find in the buffer. On the PC, it will be at least another 2 decades or longer (I actually believe it will be never) before the bandwidth to/from main memory and the bandwidth across the video card's bus link are capable of keeping up with the data throughput rate necessary to move that much data as fast as the GPU would like.

If some sort of CPU-side processing is necessary for whatever reason, it'll usually be copied after the fact, but the preference is to do all that work on the GPU if you can.

Consoles are another ball of wax -- e.g. Xbox360 having one memory pool, or PS3 having not too bad observed bandwidth from the GPU to the main memory pool compared to its observed bandwidth to its own local pool. That said, I can't say I've ever heard of any specific counterexample.

nobond
18-Feb-2007, 13:46
I'm still a little bit confused.

To my knowledge, even in the non-deferred mode (immediate mode) you still need to buffer something and then kick off a render. What is the point of making a distinction between immediate mode and deferred mode (maybe just the difference in the buffer...)? :roll:

Simon F
19-Feb-2007, 07:07
If you are referring to PowerVR's deferred rendering, then it also does some re-ordering of the polygons as it puts data into the pre-rasterisation buffer. In fact, that buffer is not just a single big FIFO but consists of multiple "bins".

Cal
19-Feb-2007, 10:20
GRAW X360 did not use deferred shading, but the PC version did. The online part of SCDA used deferred shading.

As for the advantages of DS: in the current graphics pipeline, pixel shading is always done before the ROP (including the z-test), which means every pixel of a piece of geometry will be shaded regardless of its visibility. So if DS is applied, by generating a g-buffer we can manage to do the z-test before the actual shading. Even the latest hardware featuring early z-test via hierarchical-z is not as efficient as DS. Another good aspect of DS is that you can add a lot of light sources. Just project the light volume into screen space and render it with the g-buffer; no pixel will be wasted.

nAo
19-Feb-2007, 10:25
Even the latest hardware featuring early z-test via hierarchical-z is not as efficient as DS.
Umh..this is not really true, in the general case.

Rodéric
19-Feb-2007, 11:47
As for the advantages of DS: in the current graphics pipeline, pixel shading is always done before the ROP (including the z-test), which means every pixel of a piece of geometry will be shaded regardless of its visibility.

AFAIR, it's not true, as long as the fragment shader doesn't change the depth or texkill, the depth test is performed before shading, that's the early-z reject test.

ShootMyMonkey
19-Feb-2007, 18:40
To my knowledge, even in the non-deferred mode (immediate mode) you still need to buffer something and then kick off a render. What is the point of making a distinction between immediate mode and deferred mode (maybe just the difference in the buffer...)?
I think you're confusing things like the vertex buffer and index buffer with the g-buffer in a deferred shader. The buffering of geometry before sending it off is grouping some sections or groups of geometry into a single batch sent off at once. This you typically group by renderstates (including things like transformations, which shaders, and which textures need be applied). That's something you do anyway. That's buffering the *input* to the GPU. In a deferred shader, on the "main" pass, you have a giant buffer that is the *output* of that pass.

Deferred shading means the actual "shading" part is not done immediately on that main pass. Rather, there is a pass that renders information about how to shade the pixels. Information on a per-pixel level is buffered off and then there's a "deferred" pass that actually does the shading.

Comparatively, with an immediate mode renderer, the output you get in the primary pass is the actual shaded pixels (i.e., they're shaded "immediately").

I'm kind of ignoring other passes you make like shadow map passes and what not, because those are really independent of how you actually perform your shading. Although since the word "deferred" has the meaning of postponing something, there are certain other cases where it's used in a different context. For instance, PSP has both immediate mode and deferred mode rendering, but what's deferred is actually the issuing of geometry.

Cal
19-Feb-2007, 19:25
AFAIR, it's not true, as long as the fragment shader doesn't change the depth or texkill, the depth test is performed before shading, that's the early-z reject test.

Then you need a pre-z pass to use hier-z. Alpha-masked objects like foliage will cause trouble there when you try to exploit the z-only double-speed output.

Umh..this is not really true, in the general case.
Well, I haven't tested deferred shading thoroughly. But according to my experience, the vertex cost is still there when the early z-reject is enabled in the regular rendering mode.

ERP
19-Feb-2007, 19:33
The PowerVR version of deferred shading needed to do the vertex work as well, as would any solution that allowed the vertex shader to do non-standard transforms.

You could strip all the unneeded crap out, and with stream out, you could even do the transform work only once.

Mintmaster
19-Feb-2007, 21:27
Even the latest hardware featuring early z-test via hierarchical-z is not as efficient as DS. Another good aspect of DS is that you can add a lot of light sources. Just project the light volume into screen space and render it with the g-buffer; no pixel will be wasted.
Not really true at all. You need a lot of extra bandwidth to write the normals and position then read them back. Even using Z and calculating position is a notable cost. Then there's lack of AA as well.
Then you need a pre-z pass to use hier-z. Alpha-masked objects like foliage will cause trouble there when you try to exploit the z-only double-speed output.
Early z-test doesn't have to use a Z prepass. Rough front-to-back sorting will take care of most of the overdraw. Even if you did a prepass, lack of double Z under certain circumstances is almost irrelevant nowadays because the base pixel fill rate is so high. 10 GPix/s will fill a 1080p screen with 5x overdraw in about a millisecond. The setup cost of sending twice the geometry is a bigger issue.
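
The fill-rate figure above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the helper name `fill_time_ms` is made up, and it assumes the quoted peak fill rate is actually sustained.

```python
def fill_time_ms(width, height, overdraw, fill_rate_pixels_per_s):
    """Time to write width * height * overdraw pixels at a given fill rate."""
    pixels_written = width * height * overdraw
    return pixels_written / fill_rate_pixels_per_s * 1000.0

# 1920 x 1080 with 5x overdraw at 10 GPix/s: a little over one millisecond.
print(fill_time_ms(1920, 1080, 5, 10e9))
```

That is 10,368,000 pixel writes, so roughly 1.04 ms at the assumed rate.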

nAo
19-Feb-2007, 21:27
Well, I haven't tested deferred shading thoroughly. But according to my experience, the vertex cost is still there when the early z-reject is enabled in the regular rendering mode.
You assume deferred shading has no additional costs compared to non-deferred shading; unfortunately this is not true, as writing and reading back the G-buffer(s) is not free from a performance and memory standpoint.
It can be a win in the end, but as I said, not in the general case.
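
To put rough numbers on the G-buffer cost, here is a small estimate. Every figure in it is an assumption chosen for illustration (1280x720, four 64-bit render targets for 32 bytes per pixel, 60 fps, four full-screen lighting passes), and it ignores overdraw, caches and any compression.

```python
def gbuffer_megabytes(width, height, bytes_per_pixel):
    """Memory footprint of the G-buffer in MiB."""
    return width * height * bytes_per_pixel / (1024 * 1024)

def gbuffer_traffic_gb_per_s(width, height, bytes_per_pixel, fps, reads_per_frame):
    """One full write plus reads_per_frame full reads of the G-buffer, per frame."""
    bytes_per_frame = width * height * bytes_per_pixel * (1 + reads_per_frame)
    return bytes_per_frame * fps / 1e9

print(gbuffer_megabytes(1280, 720, 32))                # footprint in MiB
print(gbuffer_traffic_gb_per_s(1280, 720, 32, 60, 4))  # traffic in GB/s
```

With these assumed inputs the footprint is about 28 MiB and the traffic is a bit under 9 GB/s, before any other rendering work touches memory.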

Andrew Lauritzen
20-Feb-2007, 00:15
The largest advantage of deferred shading is the decoupling of lighting and surface computations, which simplifies engine management, improves batching, improves asymptotic complexity and solves the light contribution problem cheaply per-pixel.
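
The asymptotic complexity point can be sketched with a toy cost model. This is a deliberately crude worst case, not a measurement: it counts abstract "shading evaluations", treats a G-buffer write as costing the same as one light evaluation, and assumes a forward renderer with no early-z and no per-object light lists. Both function names are hypothetical.

```python
def forward_shading_cost(rasterized_fragments, num_lights):
    """Worst case: every rasterized fragment, visible or not, evaluates every light."""
    return rasterized_fragments * num_lights

def deferred_shading_cost(rasterized_fragments, pixels_covered_per_light):
    """One G-buffer write per fragment, then each light shades only the pixels it covers."""
    return rasterized_fragments + sum(pixels_covered_per_light)

# 3M fragments and 100 small lights covering 10k pixels each (made-up numbers).
print(forward_shading_cost(3_000_000, 100))
print(deferred_shading_cost(3_000_000, [10_000] * 100))
```

The forward cost is a product of scene and light counts, while the deferred cost is a sum, which is the decoupling described above. How large the gap is in practice depends on the overheads discussed elsewhere in this thread.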

It's also often useful to have various attributes (such as depth) available in the shader. Still this is a secondary benefit IMHO.

I wouldn't be surprised if more games start using deferred shading in the future. Sure, it has its share of overhead, but much of that is becoming less important as technology progresses. In particular, the cost of coherent and predictable memory access like that of deferred shading can be totally hidden (in the long run, if not already); it is random access that will pose a significant problem for all architectures (and for algorithms in general).

Having written both a forward and deferred renderer, I personally like the latter much more :) It's simple, elegant and efficient. It also runs plenty fast on modern hardware.

nAo
20-Feb-2007, 00:41
AA was still a problem with deferred renderers on DX9. With DX10 we should be able to read back each subsample belonging to a pixel, so that we can build an accurate stencil mask which marks edge pixels. Early stencil rejection should then help us shade only one subsample per pixel on the vast majority of the screen area, while we can directly supersample and resolve all the other pixels with a custom shader (with the added advantage of having a rotated or sparse sampling grid).
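
A CPU-side sketch of that idea, with a plain boolean mask standing in for the stencil buffer. The function names, the depth-only edge test and the threshold are assumptions made for this example; a real implementation might also compare normals or other attributes.

```python
def edge_mask(depth_samples, threshold=0.01):
    """depth_samples[y][x] is the list of subsample depths of one pixel.
    A pixel is flagged as an edge when its subsamples disagree."""
    return [[(max(s) - min(s)) > threshold for s in row] for row in depth_samples]

def shade_with_mask(gbuffer_samples, mask, shade):
    """Shade one subsample for interior pixels; shade and average all of them on edges."""
    image = []
    for samples_row, mask_row in zip(gbuffer_samples, mask):
        out_row = []
        for samples, is_edge in zip(samples_row, mask_row):
            if is_edge:
                out_row.append(sum(shade(s) for s in samples) / len(samples))
            else:
                out_row.append(shade(samples[0]))
        image.append(out_row)
    return image
```

Only flagged pixels pay the per-subsample cost, so the number of `shade` calls stays close to one per pixel when edges are a small part of the screen.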

NIB
20-Feb-2007, 01:19
You assume deferred shading has no additional costs compared to non-deferred shading; unfortunately this is not true, as writing and reading back the G-buffer(s) is not free from a performance and memory standpoint.

Does that mean that r600's memory interface can give it an advantage on games that heavily use deferred shading? And will deferred shading put less stress on shaders but more stress on memory bandwidth?

Or is memory bandwidth not the issue, and it is mostly a memory size issue?

nAo
20-Feb-2007, 01:28
R600 is not out yet and I don't know its real specs.. anyway having a lot of bandwidth should certainly help if you're rendering to a few floating point render targets + AA at the same time :)

Graham
20-Feb-2007, 02:22
I'd bet $3.50 that Crackdown is also DR'in.

Advantages would be that an increased light count only really hits your ability to throw pixels onto the screen. The cpu work required is low, and there is basically no geometry work. Occlusion testing is fast and 'automatic' too.
Furthermore you can do some fancy things with processing your 'gbuffer'. So you can have geometry and normal warping effects. For example, a bullet hole can actually modify the stored geometry in the gbuffer, so it actually looks like a 3d hole - you could distort normals under water for fake caustic effects, etc. Things like occluded parallax mapping also become more viable from a performance standpoint as there isn't a per-light hit.

I was recently mucking about with some DR ideas in XNA. Here is a pic (http://www.hungryspoon.com/random/cr1.jpg). That's running on my X1800XL: 60fps, 111 lights, about 15-20 of which are fullscreen. So DR can be really very fast, but it's tricky. The big problem is that on normal hardware you are so pixel/texture limited that you have little breathing room to be fancy.
Although I won't be making a game of this, I am going to use the stuff as a basis for a different project.

bigtabs
17-Apr-2007, 16:21
Apparently the new beta nVidia G80 drivers have enabled AA in Rainbow Six : Vegas (Unreal Engine 3).

According to a friend who tested it last night, VRAM usage goes from ~500 to ~658 by applying 2xAA in max settings @ 1600x1200. He went on to say :

Without AA at 16x12 I get around 60fps; when I enable 2xAA it goes down to approx 40fps, and with 4xAA it goes down to the late teens/low 20s.

Is this the expected performance trade-off for using AA in DX9 games that use Deferred Rendering?

I used 8x as well and the hit was the same as 4xAA from what I could make out, and it was definitely applying more AA.


Why would this be?

Arnold Beckenbauer
17-Apr-2007, 19:49
Is it so sure, that UE3 is a deferred renderer?
It's easier now than it was 6+ months ago. The X360 APIs are still evolving to some extent and a lot of the tools to make a good engine that exploits tiling well are around now, although my understanding is that even the current XDK doesn't expose all of the functionality.

The TRC is extremely non-specific; technically the minimum res is 720P and you have to provide something to address some of the aliasing. Like all TRCs they're negotiable to some extent and MS will give exemptions for the right titles, although I suspect they will be harder to get as excuses like "launch game" start to disappear. It'd be pretty hard for me to justify anything less than 720P with 2x hardware AA to anyone I work for.

The only reasonable excuse I've heard for lack of AA, other than not enough time, is the one Epic uses to justify it in the Unreal3 engine games. Basically their shadowing algorithm projects screen space pixels back into light space, so they would have to do 2x or 4x the work for shadowing if AA was enabled, but that's true regardless of the AA implementation.

Galduta
19-Apr-2007, 18:59
In DX9, AA runs in R6 Vegas, but with more or less visible problems in the shadows: many shadows disappear. Many people do not notice this and think that it works correctly.

AlStrong
20-Apr-2007, 04:28
Is it so sure, that UE3 is a deferred renderer?

I was always under the impression it was only deferred shadow rendering, that it wasn't a deferred renderer.

Galduta
20-Apr-2007, 14:15
http://www.nvnews.net/vbulletin/showpost.php?p=1231918&postcount=604

AA and shadows work in R6

http://img177.imageshack.us/img177/2843/r6vegasgame200704200918wc5.png


PS: but on my system I don't get soft shadows with the 153.19 drivers and AA in Windows XP...

acox
18-Aug-2007, 15:41
Not really true at all. You need a lot of extra bandwidth to write the normals and position then read them back. Even using Z and calculating position is a notable cost. Then there's lack of AA as well.

In an interesting combination of the two unrelated concepts mixed into this thread, PowerVR SGX has on-chip MRTs, so it should be possible to avoid the bandwidth hit you mention with some care, since that traffic needn't go out to memory. Presumably they still have the 1990s vintage deferred pixel shading as well. Kudos to Imagination Tech.

nAo
18-Aug-2007, 15:44
In an interesting combination of the two unrelated concepts mixed into this thread, PowerVR SGX has on-chip MRTs, so it should be possible to avoid the bandwidth hit you mention with some care, since that traffic needn't go out to memory. Presumably they still have the 1990s vintage deferred pixel shading as well. Kudos to Imagination Tech.
on chip MRT can help you during the geometry pass, but it really can't do much in the lighting pass

acox
18-Aug-2007, 15:51
on chip MRT can help you during the geometry pass, but it really can't do much in the lighting pass

It begs for a tile-aware extension.

Per tile:
Render scene, culling to tile, instead of whole-screen
Do lighting + misc image space ops on tiles directly in on chip mem.
Dump tile to framebuffer


Needs new software & probably hardware support but might be interesting.
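
The per-tile loop proposed above could look roughly like this. It is a sketch only: `draw_scene_into_tile` and `light_tile` are hypothetical callbacks standing in for the hardware's per-tile geometry and lighting work, and a small Python list plays the role of on-chip memory.

```python
def render_tiled(width, height, tile_size, draw_scene_into_tile, light_tile):
    """Render, light and write out one screen tile at a time."""
    framebuffer = [[0.0] * width for _ in range(height)]
    for ty in range(0, height, tile_size):
        for tx in range(0, width, tile_size):
            tw = min(tile_size, width - tx)    # clamp tiles at the right edge
            th = min(tile_size, height - ty)   # clamp tiles at the bottom edge
            # 1. Render the scene culled to this tile into a small "on-chip" G-buffer.
            tile_gbuffer = draw_scene_into_tile(tx, ty, tw, th)
            # 2. Do lighting and other image-space ops on the tile while it is resident.
            tile_colour = light_tile(tile_gbuffer)
            # 3. Dump the finished tile to the framebuffer in memory.
            for y in range(th):
                for x in range(tw):
                    framebuffer[ty + y][tx + x] = tile_colour[y][x]
    return framebuffer
```

Note that `light_tile` only ever sees one tile, so anything that samples neighbouring pixels across a tile boundary does not fit this scheme as written.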

SuperCow
23-Aug-2007, 08:43
on chip MRT can help you during the geometry pass, but it really can't do much in the lighting pass

Not sure how you've reached this conclusion? With on-chip MRT the G-Buffer data that needs to be fetched during the lighting passes is immediately accessible (no trip to memory) thus it will help bandwidth as well. Depending on how many lighting passes are performed I would say it may help even more than the geometry pass.

nAo
23-Aug-2007, 09:00
Not sure how you've reached this conclusion? With on-chip MRT the G-Buffer data that needs to be fetched during the lighting passes is immediately accessible (no trip to memory) thus it will help bandwidth as well. Depending on how many lighting passes are performed I would say it may help even more than the geometry pass.
It's not so easy:
1) Do PVR architectures support using the on-chip memory as a texture? It's not trivial to do.
2) What if in the lighting pass I sample pixels around my pixels? How can the chip know in advance that I'm going to need pixels that have not been rendered yet?

SuperCow
23-Aug-2007, 09:35
1) Do PVR architectures support using the on-chip memory as a texture? It's not trivial to do.

Maybe not trivial; this would probably require some form of API extension where you tell the graphic API that you want to keep a texture render target on chip and that you'll be sampling from it at a 1:1 mapping ratio. On-chip MRT would be pretty useless if you couldn't specify where and when you want to use it though.


2) What if in the lighting pass I sample pixels around my pixels? How can the chip know in advance that I'm going to need pixels that have not been rendered yet?

In Deferred Shading the lighting passes are typically performed at a 1:1 mapping ratio, i.e. shade each pixel using the properties stored in the G-Buffer for this pixel. In the post-processing passes you may want to sample pixels around though, which indeed means the texture would need to be dumped to memory at this point.

nAo
24-Aug-2007, 04:46
In Deferred Shading the lighting passes are typically performed at a 1:1 mapping ratio, i.e. shade each pixel using the properties stored in the G-Buffer for this pixel. In the post-processing passes you may want to sample pixels around though, which indeed means the texture would need to be dumped to memory at this point.
What's typical doesn't stay that way forever; for example, if I want to do some dynamic ambient occlusion computation in screen space I need to read, per pixel, the z values of the neighbouring pixels.
How am I going to do that if those z values have not been computed yet?

SuperCow
24-Aug-2007, 11:14
What's typical doesn't stay that way forever; for example, if I want to do some dynamic ambient occlusion computation in screen space I need to read, per pixel, the z values of the neighbouring pixels.
How am I going to do that if those z values have not been computed yet?
Sure, the case you speak of couldn't benefit from on-chip MRTs since you'd need to sample adjacent pixels from the G-Buffer. However I still think this kind of operation is atypical of lighting passes. To take your example, calculating ambient occlusion in screen space will yield incorrect results when e.g. dynamic opaque objects are rendered in front of static geometry. All of a sudden pixels in the static background will sample adjacent pixels belonging to the object in front which will obviously change the ambient occlusion calculation (which shouldn't be the case, especially if the object happens to be far from this static background). You might be able to help by looking at objects ids and such, but in all cases you'll still lose the GBuffer data of the adjacent pixels in your static background (since covered by your object in front), which means your ambient occlusion becomes variable. This is likely to be observed with a "halo effect" on the silhouette of dynamic objects.
I think we agree on the principles; on-chip MRTs are only useful if you don't start sampling adjacent pixels (which you will need to do in certain cases anyway, like post-processing or some future screen-space lighting pass effect).

Novum
24-Aug-2007, 11:21
I was always under the impression it was only deferred shadow rendering, that it wasn't a deferred renderer.
That's a myth and not possible. You need the shadows when you do the lighting calculation. You can't add shadows in another pass.

nAo
24-Aug-2007, 15:15
Sure, the case you speak of couldn't benefit from on-chip MRTs since you'd need to sample adjacent pixels from the G-Buffer. However I still think this kind of operation is atypical of lighting passes. To take your example, calculating ambient occlusion in screen space will yield incorrect results when e.g. dynamic opaque objects are rendered in front of static geometry. All of a sudden pixels in the static background will sample adjacent pixels belonging to the object in front which will obviously change the ambient occlusion calculation (which shouldn't be the case, especially if the object happens to be far from this static background). You might be able to help by looking at objects ids and such, but in all cases you'll still lose the GBuffer data of the adjacent pixels in your static background (since covered by your object in front), which means your ambient occlusion becomes variable. This is likely to be observed with a "halo effect" on the silhouette of dynamic objects.
I think we agree on the principles; on-chip MRTs are only useful if you don't start sampling adjacent pixels (which you will need to do in certain cases anyway, like post-processing or some future screen-space lighting pass effect).
Computer graphics, especially realtime computer graphics, is the art of faking things, and believe me when I say no one is going to notice a halo in the ambient occlusion component.

nAo
24-Aug-2007, 15:26
That's a myth and not possible. You need the shadows when you do the lighting calculation. You can't add shadows in another pass.
It's not a myth and it's completely possible as many games do it.
Shadows are computed before lighting, so that when you're lighting your scene you already have a shadowing term available (typically sampled from a texture).

Chris Lux
24-Aug-2007, 15:34
It's not a myth and it's completely possible as many games do it.
Shadows are computed before lighting, so that when you're lighting your scene you already have a shadowing term available (typically sampled from a texture).
and where are then the 'deferred shadows' in this?

nAo
24-Aug-2007, 15:57
and where are then the 'deferred shadows' in this?
In a canonical renderer shadows are computed in the color pass like all the other lighting terms: you are rendering a mesh with a single shader/pass and this shader takes care of everything.
In this deferred shadowing case shadows are computed BEFORE the lighting pass and stored in some texture(s).
It works like a deferred renderer, but just for shadows.

Novum
24-Aug-2007, 16:36
And what's the advantage of doing so? I don't believe this. If they would do it like that, then there would be no explanation for not supporting AA under Direct3D 9.

Arnold Beckenbauer
24-Aug-2007, 16:53
And what's the advantage of doing so? I don't believe this. If they would do it like that, then there would be no explanation for not supporting AA under Direct3D 9.

http://forum.beyond3d.com/showthread.php?p=744476#post744476

nAo
24-Aug-2007, 17:14
And what's the advantage of doing so?
Decoupling shadow maps filtering from geometry complexity, working around GPU inefficiencies (GPUs work on 2x2 quads, and often quads are not completely filled with fragments to shade), reducing shaders combinatorial explosion, enabling your engine to fetch a per-pixel dynamic number of samples from the shadow map without using dynamic branching, etc..etc..

I don't believe this.
I personally developed this tech for a PS3 title, so yeah, games do that.
A lot of developers discovered on their own, more or less in the same time frame, how useful this technique can be.

If they would do it like that, then there would be no explanation for not supporting AA under Direct3D 9.
Of course there is, and it's related to the fact that you can't read back the subsamples of your multisampled z buffer in DX9 (and DX10.0 as well). The only way to work around this problem on PC is to supersample your shadows, and this can be fairly slow.
On consoles we can do much better than that since we work closer to the metal.

Novum
24-Aug-2007, 17:16
That's a typical statement for standard deferred shading. You have to do the lighting calculations and stuff per subsample.

I just checked Medal of Honor Airborne with AA forced by the driver, and the results are typical deferred shading artifacts at edges (wrong vectors).

When I have my Bioshock copy I will do further investigations but until then I highly doubt the "only shadows are deferred" stuff. That makes no sense at all to me anyway.

Is there some paper of the technique you talk about? Seems interesting :???:

Chris Lux
24-Aug-2007, 18:53
In this deferred shadowing case shadows are computed BEFORE the lighting pass and stored in some texture(s).
It works like a deferred renderer, but just for shadows.
so i can call it traditional shadow mapping then ;)

nAo
24-Aug-2007, 19:02
so i can call it traditional shadow mapping then ;)
No, you can't, because I'm talking about shadow map sampling, not shadow map rendering.
Traditional shadow map implementations render shadow maps first and sample them later within the color pass. With a deferred shadowing approach there's a third phase in between the aforementioned passes that takes care of sampling the shadow maps.
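
The three phases can be sketched in a few lines of Python. Everything here is a simplification for illustration: the shadow map is one-dimensional, each screen pixel is assumed to already know its light-space texel and depth (in practice reconstructed from the depth buffer), and the function names and bias value are made up.

```python
def render_shadow_map(occluders, size):
    """Phase 1: store the nearest occluder depth per light-space texel."""
    shadow_map = [float("inf")] * size
    for texel, depth in occluders:
        shadow_map[texel] = min(shadow_map[texel], depth)
    return shadow_map

def shadow_term_pass(screen_pixels, shadow_map, bias=1e-3):
    """Phase 2 (the deferred part): sample the shadow map once per screen pixel
    and keep the result in a screen-space buffer."""
    return [1.0 if depth_from_light <= shadow_map[texel] + bias else 0.0
            for texel, depth_from_light in screen_pixels]

def colour_pass(unshadowed_lighting, shadow_terms):
    """Phase 3: the colour shader only fetches the precomputed shadowing term."""
    return [lit * term for lit, term in zip(unshadowed_lighting, shadow_terms)]
```

Filtering or taking more samples would all happen inside `shadow_term_pass`, which is why the colour shader stays short.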

Chris Lux
24-Aug-2007, 19:10
[...] a deferred shadowing approach there's a third phase in between
the aforementioned passes that takes care of sampling shadow maps.
ok, now i get the difference. now i also understand the problem with AA and this technique... thanks!

nAo
24-Aug-2007, 19:12
ok, now i get the difference. now i also understand the problem with AA and this technique... thanks!
Do I win the "best educational post - Beyond3D Awards 2007"? ;)

Andrew Lauritzen
24-Aug-2007, 19:13
I agree that doing deferred shadowing indeed has some benefits although I'm personally a proponent of fully deferred lighting, but that may be a few more years down the road...

Still, I'm interested in the particular benefits that you see:
Decoupling shadow maps filtering from geometry complexity
Do you mean just in terms of overdraw? Theoretically a pre-Z pass will accomplish the same thing (which is effectively what you're doing anyways), no?

working around GPU inefficiencies (GPUs work on 2x2 quads, and often quads are not completely filled with fragments to shade)
Fair enough, but do you actually see a real speed improvement from this? Furthermore it's precisely this step that forfeits being able to compute the shadow map coordinate derivatives and thus do "proper" filtering (unless you compute the derivatives analytically, which is actually pretty cheap in this case and maybe the way to go...).

reducing shaders combinatorial explosion
Definitely a very compelling reason, although fully deferred rendering does much more than just deferred shadowing to this end.

enabling your engine to fetch a per-pixel dynamic number of samples from the shadow map without using dynamic branching
Here I'm a bit confused... how is this accomplished? Predication? Stencil? Multiple passes to generate the screen-space shadow buffer?

Thanks in advance.

nAo
24-Aug-2007, 19:31
Do you mean just in terms of overdraw? Theoretically a pre-Z pass will accomplish the same thing (which is effectively what you're doing anyways), no?
I see this in terms of "my color pass shader is now shorter, I removed a cost that was also linked to geometric complexity in camera view".


Fair enough, but do you actually see a real speed improvement from this?
Oh yeah, big speed improvements! At least on one architecture.. :)


Furthermore it's precisely this step that forfeits being able to compute the shadow map coordinate derivatives and thus do "proper" filtering (unless you compute the derivatives analytically, which is actually pretty cheap in this case and maybe the way to go...).
Not having used VSM with cascaded shadow maps I obviously never had any issue as I couldn't use any hw filtering, but I see your point here :)


Here I'm a bit confused... how is this accomplished? Predication? Stencil? Multiple passes to generate the screen-space shadow buffer?
Multiple passes + early depth bounds test -> as fast as a single pass (overhead is really minimal), but I can change shader per pass, and thus I can change the number of samples (or other things) per shadow map split.
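
A toy version of that scheme, where a range check stands in for the hardware early depth bounds test. The function name, split ranges and sample counts are invented for the example; it only records how many shadow map samples each pixel would take rather than doing any filtering.

```python
def shadow_samples_by_split(pixel_depths, splits):
    """splits is a list of (near, far, sample_count), one entry per shadow map split.
    One pass is run per split, and each pass only touches pixels inside its depth range."""
    samples_taken = [0] * len(pixel_depths)
    for near, far, sample_count in splits:        # one full-screen pass per split
        for i, depth in enumerate(pixel_depths):
            if near <= depth < far:               # stand-in for the depth bounds test
                samples_taken[i] = sample_count   # this pass's shader and sample count
    return samples_taken

print(shadow_samples_by_split([0.5, 3.0, 20.0, 150.0],
                              [(0.0, 2.0, 16), (2.0, 10.0, 8), (10.0, 100.0, 4)]))
```

Because the ranges do not overlap, every pixel is shaded by at most one pass, which is why the total cost stays close to a single pass without any branching inside the shader.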

Andrew Lauritzen
24-Aug-2007, 19:36
Not having used VSM with cascaded shadow maps I obviously never had any issue as I couldn't use any hw filtering, but I see your point here :)
Well I'm actually starting to think that one should just compute the derivatives analytically anyways (it's just a few ALU ops), since this is necessary for shadow mapping with deferred shading as well. Furthermore even forward-rendered VSM+CSM requires either analytically computed derivatives or some pixel quad hacks (if you remember the PSVSM thread :)), so losing derivatives isn't really critical in this case.


Multiple passes + early depth bounds test -> as fast as a single pass (overhead is really minimal), but I can change shader per pass, and thus I can change the number of samples (or other things) per shadow map split.
Ah I was wondering if that's what you were doing. I'm not surprised that there's minimal overhead as early-Z pass is pretty fast nowadays... then again so is dynamic branching on "most" new graphics architectures ;)

Thanks for the info - definitely clears a few things up for me. I'm eagerly awaiting the release of your game :)

Arnold Beckenbauer
01-Sep-2007, 20:02
Why is it possible to force AA with the G80 series via the control panel (CP) in e.g. Bioshock or MoH: Airborne?

CarstenS
03-Sep-2007, 15:42
Why is it possible to force AA with the G80 series via the control panel (CP) in e.g. Bioshock or MoH: Airborne?

It's always possible - you just have to grab the right buffer to write to.

edit:
Of course, it's not that easy. Choosing the right buffer from the plethora floating around is one of the main reasons it's quite difficult to force FSAA nowadays. But since at least Bioshock uses a UE3 engine, chances are that the correct buffer(s) have already been singled out by devtech.

Laa-Yosh
10-Sep-2007, 16:43
That's a myth and not possible. You need the shadows when you do the lighting calculation. You can't add shadows in another pass.

The movie VFX industry has been doing exactly that for ages. I think a simple multiply operation is enough in most cases.
Obviously they also have to be used as masks for the appropriate specular passes, so it requires heavy multipassing... Edit: or as nAo has mentioned, you can pre-calculate shadow maps and use them in the shader, too. Big VFX houses render out shadow maps (even for every frame if lights/objects are animated) and store them on disk, so it's also a possible option.

Reason: ability to re-use shadows for many test renders, ability to blur/transform the shadows in screen space etc.
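
A minimal sketch of the multiply-style compositing described above, assuming one light and separate per-pixel ambient, diffuse, specular and shadow passes. The function name and the pass layout are assumptions for illustration, not a description of any particular studio's pipeline.

```python
def composite(ambient, diffuse, specular, shadow):
    """Combine per-pixel render passes for a single light.

    All arguments are equal-length lists of floats. shadow is 1.0
    where the light reaches the pixel and 0.0 where it is fully
    occluded; it masks both the diffuse and the specular pass.
    """
    return [
        a + s * (d + sp)
        for a, d, sp, s in zip(ambient, diffuse, specular, shadow)
    ]
```

Since the shadow pass is just another image, it can be re-rendered, blurred or transformed without touching the lighting passes, which is the re-use argument made in this post.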

Andrew Lauritzen
10-Sep-2007, 20:51
ability to blur/transform the shadows in screen space
Ewwww - stop encouraging "those people" ;)

nAo
10-Sep-2007, 20:59
You have no idea on how many games use that, hey, even Crysis does!

Andrew Lauritzen
10-Sep-2007, 21:55
You have no idea on how many games use that, hey, even Crysis does!
Oh I do have an idea, and IMO Crysis' shadows aren't exactly perfect... they use very high resolution shadow maps, but many of the screenshots reveal rather poor filtering. Furthermore they use a bizarre texture-space jittering scheme as well which I'm not entirely sure is a good idea... they display the results as if they are impressive, but they just look ugly to me ;)

I guess screen-space-blurred shadows aren't *that* much worse than screen-space DOF, but the artifacts are still very visible and distracting IMHO. It's too bad that people consider minor light bleeding a deal-breaker in some cases (admittedly it can get bad in some scenes) but tolerate screen-space blurs...

I believe World in Conflict uses some sort of screen-space blurring for their shadows and while it looks reasonable in screenshots it looks *terrible* ingame with moving cameras/objects/etc. I can't begin to describe the severity of the artifacts that appear (try the demo/beta... it's *very* obvious). While WiC may be worse than some implementations, there's just no real way to make it look good.

Anyways, that sort of extremely hacked "solution" is a particular pet-peeve of mine... maybe I don't want to work in the games industry after all ;)

Laa-Yosh
10-Sep-2007, 22:01
The thing with screen space blur is that it has a mask, and that's a simple gradient that's tracked to the real shadow in a compositing app, and so its strength is used to simulate real soft shadows - the further away from the shadow caster, the more the shadow gets blurred. Now that's a bit harder to reproduce in a realtime 3D environment, though I think it might be possible...

Andrew Lauritzen
10-Sep-2007, 22:14
The thing with screen space blur is that it has a mask, and that's a simple gradient that's tracked to the real shadow in a compositing app, and so its strength is used to simulate real soft shadows - the further away from the shadow caster, the more the shadow gets blurred.
Regardless of how the blur is being applied it should be done in *light space*, not screen space. The latter is just so terribly wrong...

nAo
10-Sep-2007, 22:23
With a mask it ends up working exactly as a DOF effect in image space, I guess we need some fast bilateral filtering implementations then :)
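
For reference, a minimal 1D sketch of a depth-aware (bilateral) blur over a screen-space shadow mask. The function name, the parameters and their default values are assumptions for illustration; a real implementation would run in 2D in a pixel shader.

```python
import math


def bilateral_blur_1d(shadow, depth, radius=2, sigma_space=1.5, sigma_depth=0.1):
    """Blur shadow values, down-weighting neighbours whose depth
    differs from the centre pixel so the blur does not leak across
    depth discontinuities."""
    result = []
    for i in range(len(shadow)):
        total = 0.0
        weight_sum = 0.0
        lo = max(0, i - radius)
        hi = min(len(shadow), i + radius + 1)
        for j in range(lo, hi):
            # Spatial falloff with distance in pixels.
            w_space = math.exp(-((j - i) ** 2) / (2.0 * sigma_space ** 2))
            # Range falloff with difference in depth.
            w_depth = math.exp(-((depth[j] - depth[i]) ** 2) / (2.0 * sigma_depth ** 2))
            weight = w_space * w_depth
            total += weight * shadow[j]
            weight_sum += weight
        # weight_sum is at least 1.0 because the centre pixel has weight 1.
        result.append(total / weight_sum)
    return result
```

The depth weight only limits haloing at silhouettes. The blur is still applied along screen axes rather than in light space.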

Andrew Lauritzen
11-Sep-2007, 01:49
With a mask it ends up working exactly as a DOF effect in image space, I guess we need some fast bilateral filtering implementations then :)
Right, and doing DOF in image space is also pretty wrong, but generally less objectionable IMO since at least in that case you're blurring in something close to the proper axes. With shadows not only do you have haloing problems at edges (even with fancier filtering), but your blur isn't related to the proper geometric arrangement of the light/occluder/receiver at all! Indeed moving the camera around will have a noticeable warping effect on the so-called "penumbra" of the shadow. It's wrong to the point that I wouldn't even bother scaling filter widths based on occluder/receiver ratios because what you get isn't even "plausible", let alone physically correct.

The fundamental problem is that the occlusion that is relevant for soft shadows comes from the light's point of view, not the camera's. So while DOF remains plausible as long as the blurs aren't too large (they're a reasonable approximation to what you can see anyways), the same is not true for lights. Moreover it's not just the distance ratios that are relevant (which can be "masked" certainly from the original geometric data), it's the actual geometric projection in light space.

I dunno, perhaps I'm being too hard on the technique, but the fact that "so many games use it", regardless of the existence of much more correct (and comparatively cheap) algorithms, makes me depressed and less motivated to continue researching as it probably won't be used anyways... :(

Frank
22-Oct-2007, 02:24
Well, for shadows you could set all surfaces to black, and treat the calculated color value as the light modifier. For soft borders and AA you would want an edge detect modifier for that as well.

Betanumerical
22-Oct-2007, 10:54
I have only read the first and last page, but I can't believe this hasn't been discussed more. How are GG achieving AA with DR in KZ2?

Arnold Beckenbauer
22-Oct-2007, 13:20
I have only read the first and last page, but I can't believe this hasn't been discussed more. How are GG achieving AA with DR in KZ2?

Start here: http://forum.beyond3d.com/showthread.php?p=1039398#post1039398

Galduta
31-Dec-2007, 18:29
The article about deferred shading and the STALKER engine, by Oles Shishkovtsov in GPU Gems 2, is linked from Oles' blog.

http://oles-rants.blogspot.com/

http://www.4a-games.com/209_gems2_ch09.pdf

9.7 Conclusion
Deferred shading, although not appropriate for every game, proved to be a great rendering
architecture for accomplishing our goals in S.T.A.L.K.E.R. It gave us a rendering
engine that leverages modern GPUs and has lower geometry-processing requirements,
lower pixel-processing requirements, and lower CPU overhead than a traditional forward
shading architecture. And it has cleaner and simpler scene management to boot.
Once we worked around the deficiencies inherent in a deferred shader, such as a potentially
restricted material system and the lack of antialiasing, the resulting architecture
was both flexible and fast, allowing for a wide range of effects. See Figure 9-8 for an
example. Of course, the proof is in the implementation. In S.T.A.L.K.E.R., our original
forward shading system, despite using significantly less complex and interesting
shaders, actually ran slower than our final deferred shading system in complex scenes
with a large number of dynamic lights. Such scenes are, of course, exactly the kind in
which you need the most performance!

The new engine with deferred rendering by Oles Shishkovtsov for METRO 2033

http://images.stage6.com/video_images/2012599t.jpg (http://www.stage6.com/user/Oritxupolitena/video/2012599/METRO-2033-engine)METRO 2033 engine (http://www.stage6.com/user/Oritxupolitena/video/2012599/METRO-2033-engine)

http://i4.photobucket.com/albums/y117/jonelo/capturas/uyityityi.jpg


What is deferred supersampling?

Frank
03-Jan-2008, 21:04
You render the scene multiple times, each with a slight, sub-pixel offset, and average.
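
A minimal sketch of that idea, assuming a hypothetical `render(dx, dy)` callback that returns a grayscale frame (a list of rows of floats) with the camera shifted by a sub-pixel offset given in pixel units:

```python
def supersample(render, grid):
    """Average grid * grid renders, each shifted by a sub-pixel offset."""
    accum = None
    for j in range(grid):
        for i in range(grid):
            # Offsets centred inside each cell of a regular sub-pixel grid.
            dx = (i + 0.5) / grid - 0.5
            dy = (j + 0.5) / grid - 0.5
            frame = render(dx, dy)
            if accum is None:
                accum = [[0.0] * len(row) for row in frame]
            for y, row in enumerate(frame):
                for x, value in enumerate(row):
                    accum[y][x] += value
    count = grid * grid
    return [[value / count for value in row] for row in accum]
```

With `grid=8` this means 64 full renders per displayed frame, which is why the cost grows so quickly.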

Novum
03-Jan-2008, 21:17
64x? That would be pretty expensive ;)

Frank
03-Jan-2008, 21:22
64x? That would be pretty expensive ;)
Well, it's probably rendered 4 times at 4 times the resolution in each direction, and scaled down.

But yes, still quite expensive.

Ilfirin
03-Jan-2008, 21:51
In a number of engineering programs I've written I've implemented super sampling by just rendering the screen to a higher resolution (multiples of 2 x width by 2 x height) back-buffer than was going to be displayed and, after rendering was complete, progressively down-sampling the target by 1/4x each time until it was the same size as the back buffer. Not sure what they meant by "deferred super sampling" but I think this .. somewhat.. fits the description.


Not the most elegant solution in the world, but it was actually surprisingly efficient both in terms of development and in execution. The whole process only required adding a couple lines of code and about 5 minutes of development, and you could drop this into pretty much any application the same way. As far as execution goes - you only have to draw the scene once and it works on all hardware with bilinear filtering (even DX7 and below cards), after that the downsampling is extremely cheap. Most of the graphics cards at the company were around Geforce2MX level, but were all still able to run everything at 64-256+ samples and still stay within the range of acceptable framerates for engineering concerns.
I still use this technique in any application that has small render windows (like, for instance, the level editor I use for my side projects) as I'd much rather be able to see a clear, crisp image of what I'm working with at 60fps than a jagged, unusable image at 400fps.
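
A minimal CPU-side sketch of the progressive downsampling described above, assuming a grayscale image stored as a list of rows and dimensions that are a power-of-two multiple of the target size. The function names are made up here; on hardware each halving step would be one bilinear-filtered blit to a half-size target.

```python
def downsample_2x(image):
    """Average each 2x2 block, giving an image with 1/4 the pixels."""
    height, width = len(image), len(image[0])
    return [
        [
            (image[y][x] + image[y][x + 1]
             + image[y + 1][x] + image[y + 1][x + 1]) / 4.0
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


def resolve_supersampled(image, target_width):
    """Halve the image repeatedly until it reaches the target width."""
    while len(image[0]) > target_width:
        image = downsample_2x(image)
    return image
```

Each halving step touches a quarter as many pixels as the previous one, so the whole chain costs only a little more than the first step.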

Galduta
04-Jan-2008, 02:53
It has been a few months since I last tried Stalker. This is with nHancer and the HDR + AA compatibility setting for Oblivion. I had tried this method in August/September with the Medal of Honor: Airborne demo, but it did not work in Stalker, neither MSAA nor SSAA nor any other method. Later I also used it in U3 and GOW in DX9.

MSAA 4X

http://i4.photobucket.com/albums/y117/jonelo/capturas%202/mhjgkjhg.jpg

NO AA

http://i4.photobucket.com/albums/y117/jonelo/capturas%202/pytit-copia.jpg

MSAA 4X

http://i4.photobucket.com/albums/y117/jonelo/capturas%202/XR_3DA2008-01-0400-57-12-72.jpg

NO AA

http://i4.photobucket.com/albums/y117/jonelo/capturas%202/XR_3DA2008-01-0400-56-25-99.jpg


MSAA 2X 1080P

http://i4.photobucket.com/albums/y117/jonelo/capturas%202/XR_3DA2008-01-0405-23-54-45.jpg

Galduta
06-Jan-2008, 18:34
The latest ForceWare has support for MSAA in Stalker with deferred shading, from the NV Control Panel.


My drivers are 169.25, but it has been available since 169.xx. It is simply the control panel AA setting, overriding any application setting.



http://forum.beyond3d.com/showpost.php?p=1113108&postcount=558


Click once to enlarge after opening the link. MSAA 2X 1080p

http://i4.photobucket.com/albums/y117/jonelo/capturas 2/XR_3DA2008-01-0601-46-56-38.jpg (http://i4.photobucket.com/albums/y117/jonelo/capturas%202/XR_3DA2008-01-0601-46-56-38.jpg)

http://i4.photobucket.com/albums/y117/jonelo/capturas 2/XR_3DA2008-01-0509-22-59-33.jpg (http://i4.photobucket.com/albums/y117/jonelo/capturas%202/XR_3DA2008-01-0509-22-59-33.jpg)


http://i4.photobucket.com/albums/y117/jonelo/capturas 2/XR_3DA2008-01-0509-22-24-03.jpg (http://i4.photobucket.com/albums/y117/jonelo/capturas%202/XR_3DA2008-01-0509-22-24-03.jpg)


http://i4.photobucket.com/albums/y117/jonelo/capturas 2/XR_3DA2008-01-0509-22-25-19.jpg (http://i4.photobucket.com/albums/y117/jonelo/capturas%202/XR_3DA2008-01-0509-22-25-19.jpg)

Galduta
27-Jan-2008, 04:14
"Deferred anti-aliasing (AA)" (http://oles-rants.blogspot.com/2008/01/deferred-anti-aliasing-aa.html) by Oles Shishkovtsov

in the engine of Metro 2033

Chris Lux
20-Feb-2008, 22:01
hi again,
just to clear some things up for me:
so if i understand this correctly you do your shadow map creation per light as before. but then in a second pass you render this shadow map into a screen size map so that you have the shadow information per pixel precalculated. this is done with a prerendered depth texture (depth to light space transformation and shadow compare).

you can pack 4 lights shadows like this into one texture. do i understand this correctly?
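
a minimal sketch of the scheme as described in this post, for illustration only. the `project` callbacks, the shadow map layout and the bias value are assumptions: each light gets one channel of a screen-size RGBA mask, filled by transforming the per-pixel position (reconstructed from the depth texture) into light space and doing the shadow compare.

```python
def shadow_mask(world_positions, lights, bias=0.01):
    """Return one 4-channel shadow value per pixel, one channel per light.

    world_positions: per-pixel positions reconstructed from the
        prerendered depth texture.
    lights: up to four (project, shadow_map) pairs, where
        project(pos) returns integer texel coordinates (u, v) and the
        pixel's depth in that light's shadow-map space.
    """
    mask = []
    for pos in world_positions:
        channels = [1.0, 1.0, 1.0, 1.0]  # 1.0 = lit, 0.0 = in shadow
        for index, (project, shadow_map) in enumerate(lights[:4]):
            u, v, depth = project(pos)
            occluder_depth = shadow_map[v][u]
            # Shadow compare: something nearer to the light blocks it.
            if depth > occluder_depth + bias:
                channels[index] = 0.0
        mask.append(tuple(channels))
    return mask
```

the lighting pass can then read all four shadow terms with a single texture fetch.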