Killzone 2 technology discussion thread (renamed)

To me it seems that it's the result of Guerrilla taking a long, thorough look at the compositing pipeline at Axis Animation, at what they've done to their raw renders for the E3 2005 trailer. Color correction can go a long way in the hands of some good artists, and it helps to get the look of separately textured assets more consistent. You can also use it to simulate fog and atmosphere.
I can check out what our compositors are doing in more detail on Monday if you're interested ;) but unfortunately no before/after images...
That would be really nice... hey, you're working for Axis, aren't you? :)
 
Did anyone notice on page 33 that it uses Phong lighting? I thought that was pretty high-level graphics stuff!!

Also, they use both deferred and forward rendering to create the scene!!
 
Did anyone notice on page 33 that it uses Phong lighting? I thought that was pretty high-level graphics stuff!!

Doom3 used Phong lighting too. It's the very basic shading model in computer graphics; there are more advanced ones, like Blinn, anisotropic, Cook-Torrance, and so on. Phong is usually associated with a plastic look for its simple specular term.

Phong was a scientist from Vietnam who invented this shader, by the way ;)
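
For anyone curious, here's a minimal sketch of the Phong model being discussed, in plain C++ (my own illustrative version, not anything from the KZ2 presentation; all vectors are assumed normalized):

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3  operator*(const Vec3& v, float s) { return { v.x * s, v.y * s, v.z * s }; }
Vec3  operator-(const Vec3& a, const Vec3& b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }

// Classic Phong: a diffuse N.L term plus a specular term built from the reflection
// vector. The single scalar exponent is what tends to give every surface the same
// sharp "plastic" highlight unless the material is tuned carefully.
float phong(const Vec3& N, const Vec3& L, const Vec3& V, float shininess)
{
    float diffuse  = std::max(dot(N, L), 0.0f);
    Vec3  R        = N * (2.0f * dot(N, L)) - L;                      // reflect L about N
    float specular = std::pow(std::max(dot(R, V), 0.0f), shininess);
    return diffuse + specular;   // per-channel material/light colors omitted for brevity
}
```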
 
Did anyone notice on page 33 that it uses Phong lighting? I thought that was pretty high-level graphics stuff!!

Also, they use both deferred and forward rendering to create the scene!!

DR and (semi-)transparent geometry don't like each other (#40).
Some may enjoy this doc on deferred rendering:

http://www.talula.demon.co.uk/DeferredShading.pdf
Go to Page #25.

Yep, one 32-bit Z-buffer and four RGBA8 color buffers, times 2, as they have 2x multisampling.
That's 40 bytes per pixel -> 36 megs at 720p resolution.
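
For reference, the arithmetic behind that figure, using exactly the layout described above (just a quick sanity-check sketch):

```cpp
#include <cstdio>

int main()
{
    const int width  = 1280, height = 720;
    const int samples = 2;                                      // 2x multisampling
    const int bytes_per_sample = 4 /* 32-bit Z */ + 4 * 4;      // + four RGBA8 targets = 20 bytes

    const long long total = (long long)width * height * samples * bytes_per_sample;
    std::printf("%lld bytes (~%.1f MB)\n", total, total / (1024.0 * 1024.0));
    // Prints 36864000 bytes (~35.2 MB), i.e. the "36 megs" figure above.
    return 0;
}
```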

For example: STALKER's G-Buffer has 3 textures (A16B16G16R16F).
 
Doom3 used Phong lighting too. It's the very basic shading model in computer graphics; there are more advanced ones, like Blinn, anisotropic, Cook-Torrance, and so on. Phong is usually associated with a plastic look for its simple specular term.

Phong was a scientist from Vietnam who invented this shader, by the way ;)

I thought it was bump mapping that gave things a plastic look?
 
Obviously got my wires crossed then (regarding bump mapping). I mistakenly believed that it was the bump mapping in the UT engine that gave GOW and UT that plastic look. I stand corrected, thanks.
 
I totally agree with you on this. I usually add, on top of good scene sorting, some kind of heuristic to choose a few very large occluders to draw in the automatic z-pass on a frame-by-frame basis: the cost of this is negligible and you can easily get pretty close to one fragment shaded per pixel.
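
Something along these lines, as a rough sketch (the names and the area/distance scoring below are purely my own illustration of the idea, not any engine's actual code):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Occluder {
    float projectedArea;   // rough screen-space area estimate for this frame
    float distance;        // distance from the camera
    int   meshId;
};

// Pick the few biggest/closest occluders and draw only those in a z-only pre-pass;
// the full scene is then rendered once with the usual z-test. Scoring by projected
// area over distance is just one plausible heuristic.
std::vector<int> selectZPrepassOccluders(std::vector<Occluder> candidates, std::size_t maxCount)
{
    std::sort(candidates.begin(), candidates.end(),
              [](const Occluder& a, const Occluder& b) {
                  return a.projectedArea / (a.distance + 1.0f) >
                         b.projectedArea / (b.distance + 1.0f);
              });

    std::vector<int> picked;
    for (std::size_t i = 0; i < candidates.size() && i < maxCount; ++i)
        picked.push_back(candidates[i].meshId);
    return picked;
}
```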

I'd say this is rarely a problem in a decent forward renderer.
Given this point of view, why are you such a huge proponent of DR? The number one thing everyone mentions with DR (even smart guys like DemoCoder) is the reduced number of pixels for lighting. You and I both believe that this is negligible (i.e. k from my last post is under, say, 1.2) so you must think that there are other huge benefits, right?

That is actually pretty hard to achieve, considering you need to read the G-buffer (or at least most of it) first before you can do anything in lighting. The KZ guys seem to spend quite some time reconstructing normal and position, and this, plus good cache coherency when reading from the G-buffer, should help to hide most of the latency.
Which means in reality my formulas are underestimating the work that DR has to do. Still, I think you can do a lot of the lighting math with just the normal map, and while doing that you can start to load all the other inputs.

I see the bandwidth going lower only when you go with single pass forward rendering (again, problems with keeping shadow maps for all lights in memory).
That's a good point that I never thought about before. However, how big are the local shadowmaps going to be? 3MB each for 512x512 cube maps? The G-buffer cost is going to negate any savings you get from that.
 
That's a good point that I never thought about before. However, how big are the local shadowmaps going to be? 3MB each for 512x512 cube maps? The G-buffer cost is going to negate any savings you get from that.
Well... if you have a lot of lights and you want all of them to cast shadows, that memory cost can go up quite dramatically.
 
When I speak about DR I take this as a given... and I'm sure KZ2 does that.
Yup, and looking at the presentation they do just that. I'm more interested in your opinion on how dynamic branching can avoid the need for that in situations beyond KZ2 on PS3. I think on R5xx/R6xx/G8x you should be able to use DB with FR to avoid lighting unnecessary pixels and get similar efficiency to DR.
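
For illustration, this is the kind of per-pixel early-out being suggested, written as plain C++ standing in for the shader logic (the names and the simple range test are my own assumptions, not anything from the presentation):

```cpp
struct Vec3 { float x, y, z; };

// Stand-in for the real per-light work (BRDF evaluation, shadow-map lookup, ...).
Vec3 expensiveLighting(const Vec3& /*pixelPos*/, const Vec3& /*lightPos*/)
{
    return { 0.1f, 0.1f, 0.1f };
}

float distSq(const Vec3& a, const Vec3& b)
{
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Per-pixel light loop of a forward shader, sketched on the CPU: the cheap range
// test is the branch that dynamic branching would execute on the GPU, so pixels a
// light can't reach pay only that test instead of the full lighting math.
Vec3 shadePixel(const Vec3& pixelPos, const Vec3* lightPos, const float* lightRange, int numLights)
{
    Vec3 result = { 0.0f, 0.0f, 0.0f };
    for (int i = 0; i < numLights; ++i) {
        if (distSq(pixelPos, lightPos[i]) > lightRange[i] * lightRange[i])
            continue;                                   // skipped: out of the light's range
        Vec3 c = expensiveLighting(pixelPos, lightPos[i]);
        result.x += c.x; result.y += c.y; result.z += c.z;
    }
    return result;
}
```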

Another thing that DR gets right is pixel shader utilization, as working on (decently sized) rectangular areas plus stencil masking lets you work mostly on fully covered 2x2 tiles.
Stencil masking means more than one pixel's worth of work on edges (IMO we really should aim for 4xAA). That's rather similar to the inefficiencies caused by quad-based rasterization on polygon edges. Also, remember that DR will have the same inefficiencies as FR when filling the G-buffer.

I'm not convinced that this is as big of an advantage for DR as you think it is (or how much I think you think it is ;)).
BTW, has anyone noticed that according to the presentation they are actually doing supersampling in the lighting pass? It's not going to give any particular advantage, as subsamples will mostly have the same values per pixel, but if they implement a cleverer scheme for the lighting pass they will likely see big speed improvements.
I think you may be overestimating the amount of work in the lighting pass. The shadow map samples are already split between the MSAA subsamples in KZ2 (which IMO is a great solution that Epic should look into for UE3). Math would be fairly cheap considering how much data needs to be loaded per pixel.

Remember that the stencil masking step isn't free. A depth buffer comparison between samples would be insufficient, so you probably have to fetch the normal map too. For the 2xAA in KZ2, I don't think you'll get much more than a 30% reduction in rendering time just for the lighting portion, and it could be much less. In the context of the whole rendering process -- G-buffer, shadow maps, lighting, post-processing -- it's probably not that significant.

Well... if you have a lot of lights and you want all of them to cast shadows, that memory cost can go up quite dramatically.
True, but you'd need a lot of lights. KZ2, according to the presentation, uses 2MB per non-sun shadow map, so you'd have to have 11 shadowed lights before you get a memory gain.

Looking at everything, it seems like this factor, along with RSX's poor DB for skipping lights with FR, was the primary reason that GG went for DR in KZ2. I have some ideas for SH neighbourhood transfer that need partial DR as well, but I still think that for most non-exotic rendering techniques FR is the way to go.
 
I think the key point Guerrilla makes is that for complex rendering, loaded to the gills with materials, lights and post-processing, DR is the way to go. Even with RSX's lowered render-state overheads (compared with DX) and the assistance of Cell, there would appear to be a complexity ceiling for FR, for the kind of visuals they want to deliver.

Jawed
 
I think the key point Guerrilla makes is that for complex rendering, loaded to the gills with materials, lights and post-processing, DR is the way to go. Even with RSX's lowered render-state overheads (compared with DX) and the assistance of Cell, there would appear to be a complexity ceiling for FR, for the kind of visuals they want to deliver.

Jawed
I don't buy the render-state stuff. If you are using DR, then you're using the same lighting model on all objects, meaning you don't have to change render state any less than with FR, unless you must use a different shader for each number of lights. There's really only one material for all opaque objects, but it has a couple of parameters. If you want really good material variety (e.g. fur, anisotropic lighting, antialiased parallax mapping, etc.) then DR either falls flat on its face or degenerates to FR with extra overhead.

IMO it's only the need for many local lights coupled with poor dynamic branching that makes DR look attractive. It's pretty coherent branching too, so even RSX may be fine with it.
 
Yup, and looking at the presentation they do just that. I'm more interested in your opinion on how dynamic branching can avoid the need for that in situations beyond KZ2 on PS3. I think on R5xx/R6xx/G8x you should be able to use DB with FR to avoid lighting unnecessary pixels and get similar efficiency to DR.
I don't know; it's not easy to give an answer from a purely theoretical point of view. To be honest, if I were working right now on a next-gen engine I'd try both approaches.
DB granularity is pretty good on some GPUs, but we should also remember that every time we introduce a dynamic branch we also insert a barrier in the code that limits register re-utilization, so a high number of branches can affect performance even in a perfectly coherent branching scenario.

Stencil masking means more than one pixel's worth of work on edges (IMO we really should aim for 4xAA). That's rather similar to the inefficiencies caused by quad-based rasterization on polygon edges.
I thought about this, but stencil masking is going to introduce far simpler edges/silhouettes than real-world mesh rendering; efficiency should increase accordingly, especially for stuff that uses a lot of triangles.
Do you remember when I wrote (probably over a year ago) that going deferred with shadowing (this is what we did on HS) was a huge win, for the same reason we're debating now, and you were not that sure? Now it seems that every game around (Crysis, for example) is using the same technique, as many devs independently reached the same conclusion.
Also, remember that DR will have the same inefficiencies as FR when filling the G-buffer.
Yeah... and no. Inefficiencies are also a function of your shader length, and in the geometry pass your average shader length is going to be shorter on DR than on FR, though at the same time you have to pay some overhead in a DR to fetch and decode your geometry buffer in the lighting pass (and this cost is per light!).

I think you may be overestimating the amount of work in the lighting pass.
Well... I'm not saying that supersampling is going to automatically make the lighting pass run at half speed, but I would expect a 30%-40% impact. This could mean a 10%-15% performance loss over an entire frame; it would be nice to get at least part of that back :)

The shadow map samples are already split between the MSAA subsamples in KZ2 (which IMO is a great solution that Epic should look into for UE3). Math would be fairly cheap considering how much data needs to be loaded per pixel.
I agree on this, they had a brilliant idea! But remember that for each Z subsample you have to compute its projection into N shadow maps and select the right shadow map for your pixel with a dot product, and this is a constant cost that can't be split over the 2 samples.
Though there are games around (cough.. HS.. cough) that can dynamically select the right shadow map and the number of samples per pixel without using dynamic branching... ;)
(And I have to thank Carmack for that... god bless him for all the cool features he convinced Nvidia to put in its GPUs.)
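
A guess at the selection trick being referred to, in plain C++ (my own reconstruction from the description above, not the actual HS or KZ2 code): compare the view-space depth against the shadow map split distances and sum the results, which is the dot-product-with-ones way of picking an index without branching.

```cpp
// Compare view-space depth against the split distances of the cascaded shadow maps
// and sum the 0/1 comparison results; this is equivalent to a dot product with a
// vector of ones and needs no dynamic branching.
int selectCascade(float viewDepth, const float splits[3])
{
    int index = 0;
    index += viewDepth > splits[0];
    index += viewDepth > splits[1];
    index += viewDepth > splits[2];
    return index;   // 0..3: which of four shadow maps to sample for this subsample
}
```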

Remember that the stencil masking step isn't free
Way less than 1% of a 30 fps frame.. (Yes, I pulled this number out of my ass, but trust me on this :) )

A depth buffer comparison between samples would be insufficient, so you probably have to fetch the normal map too.
Not only would it be insufficient, it would be completely wrong, as depth is supersampled!
As I wrote in my post, I would check the albedo, but the normal is fine too.
Keep in mind that even checking all the geometry buffer attributes wouldn't give us the correct answer. I'd really like to be able to ask the GPU whether all the subsamples within a pixel are 'compressed' or not ;)
For the 2xAA in KZ2, I don't think you'll get much more than a 30% reduction in rendering time just for the lighting portion, and it could be much less.
I think 10%-15%, not more than that.

In the context of the whole rendering process -- G-buffer, shadow maps, lighting, post-processing -- it's probably not that significant.
I agree. I'm also thinking about a DR working with more than 2 samples per pixel.

Looking at everything, it seems like this factor, along with RSX's poor DB for skipping lights with FR, was the primary reason that GG went for DR in KZ2
Umh... I don't agree with that; I don't think this is the main reason. (Hierarchical) stencil rejection is going to outperform DB for the foreseeable future, when available.

I have some ideas for SH neighbourhood transfer that need partial DR as well, but I still think that for most non-exotic rendering techniques FR is the way to go.
I wouldn't be surprised if games more and more go towards a mixed approach.
Again, look at how many things Crysis does just by reusing a per-pixel depth value (shadows, dynamic ambient occlusion, fog/atmospheric scattering, post-processing effects...).

Marco
 
DB granularity is pretty good on some GPUs, but we should also remember that every time we introduce a dynamic branch we also insert a barrier in the code that limits register re-utilization, so a high number of branches can affect performance even in a perfectly coherent branching scenario.
Hmm, can you explain this a bit more? I would think that the use I'm talking about -- simply skipping lighting code for pixels that are too far away -- wouldn't be an issue here.

I thought about this, but stencil masking is going to introduce far simpler edges/silhouettes than real-world mesh rendering; efficiency should increase accordingly, especially for stuff that uses a lot of triangles.
Do you remember when I wrote (probably over a year ago) that going deferred with shadowing (this is what we did on HS) was a huge win, for the same reason we're debating now, and you were not that sure? Now it seems that every game around (Crysis, for example) is using the same technique, as many devs independently reached the same conclusion.
Remember that the early stencil reject that's going to use this mask works on a 2x2 basis too. The mask may be simpler if you used some heuristic for edge detection, but if you used a compressed-sample flag provided by the GPU, then the pixels marked for supersampling during the deferred lighting pass would match those inefficient FR pixels you're talking about.

Yup, I remember the discussion we had. If VSM picks up, though, then the per-pixel cost will go down dramatically enough to make this advantage go away.

By the way, how did you handle multisampling with deferred shadows in HS? The same way as in KZ2?

Way less than 1% of a 30 fps frame.. (Yes, I pulled this number out of my ass, but trust me on this :) )
Possibly, but I'm just comparing it to the cost of loading the supersampled G-buffer in the first place. If you need 8 bytes of access to build the stencil, then using it to selectively load 16 bytes instead of 32 may not save much time. Maybe there's so much math that this line of reasoning of mine is irrelevant, though.

I agree. I'm also thinking about a DR working with more than 2 samples per pixel.
Yeah, I was just talking about the benefits for KZ2, as per your post. With higher levels of AA I agree that you don't want to do DR without a mask. All the other stuff I'm debating with you is about DR+MSAA+mask vs. FR+MSAA.

Umh... I don't agree with that; I don't think this is the main reason. (Hierarchical) stencil rejection is going to outperform DB for the foreseeable future, when available.
Super-high rejection speeds are overrated. If lit pixels take 20 shader cycles per light, it doesn't really matter if your skipped code takes 1 cycle via DB or the equivalent of 0.01 cycles via stencil reject. I don't believe that there is any inherent need to submit geometry in huge chunks, so fine-grained pieces will have their number of lights significantly limited by the CPU, and I also don't think that after that stage we'll see pixels averaging 10 skipped lights for each processed one.
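
To put numbers on the first part of that argument (the per-light costs are the ones quoted above; the 50% coverage fraction is purely my own illustrative assumption):

```cpp
#include <cstdio>

int main()
{
    const double litCycles = 20.0;   // per-light shader cost for a pixel the light reaches
    const double dbSkip    = 1.0;    // cost to skip a light via dynamic branching
    const double stSkip    = 0.01;   // effective cost to skip via stencil reject
    const double litFrac   = 0.5;    // assumed fraction of pixels a typical light touches

    const double db = litFrac * litCycles + (1.0 - litFrac) * dbSkip;  // 10.5  cycles/pixel
    const double st = litFrac * litCycles + (1.0 - litFrac) * stSkip;  // ~10.0 cycles/pixel
    std::printf("DB: %.2f, stencil: %.3f cycles/pixel (%.1f%% apart)\n",
                db, st, 100.0 * (db - st) / st);   // roughly a 5% difference overall
    return 0;
}
```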
 
Hmm, can you explain this a bit more? I would think that the use I'm talking about -- simply skipping lighting code for pixels that are too far away -- wouldn't be an issue here.
If you skip the whole computation then it's not an issue, but since you were talking about an FR I thought you wanted to handle more than one light in one pass; in this case you could skip a light, but you might still need to handle the remaining lights.

The mask may be simpler if you used some heuristic for edge detection, but if you used a compressed sample flag provided by the GPU, then the pixels marked for supersampling during the deferred lighting pass would match those inefficient FR pixels you're talking about.
Right, that's why I only predict a not too shabby improvement of 10%-15% in performance :)

Yup, I remember the discussion we had. If VSM picks up, though, then the per-pixel cost will go down dramatically enough to make this advantage go away.
VSM is already being used in a lot of titles and more and more will use it; even Crysis uses VSM. Hopefully some other techniques will also be used in the future... :)

By the way, how did you handle multisampling with deferred shadows in HS? The same way as in KZ2?
No, it's completely different, but unfortunately I can't explain how it works (a bit too close to the metal)

Super-high rejection speeds are overrated. If lit pixels take 20 shader cycles per light, it doesn't really matter if your skipped code takes 1 cycle via DB or the equivalent of 0.01 cycles via stencil reject.
Umh, how can DB be so cheap? Let's say you have a light in a room and you can see some lit pixels (a very thin and long strip of pixels) only through an almost-closed door. With DB at a decent granularity (say 64 pixels) you'd probably run your shader over tens of thousands of pixels, while with a stencil mask you might end up shading an order of magnitude fewer pixels.
Now, this is a worst-case scenario, but if you have A LOT of lights on screen (KZ easily has over 100 lights per frame) in the end it adds up. With stencil masks you don't have to worry about DB granularity; it just works very, very fast on every GPU out there.
All imho of course ;)
 