> There are games out there that do 'contact shadows'.

Crysis does something similar: they compute dynamic ambient occlusion in screen space using per-pixel depth.
> To me it seems that it's the result of Guerrilla taking a long, thorough look at the compositing pipeline at Axis Animation, and at what they've done to their raw renders for the E3 2005 trailer. Color correction can go a long way in the hands of some good artists, and it helps to get the look of separately textured assets more consistent. You can also use it to simulate fog and atmosphere. I can check out what our compositors are doing in more detail on Monday if you're interested, but unfortunately no before/after images...

That would be really nice.. hey, you're working for Axis, aren't you?
Did anyone notice on page 33 that it uses Phong lighting? I thought that was pretty high-level graphics stuff!!
Also, they use both deferred and forward rendering to create the scene!!
Go to Page #25.
Yep, one 32-bit Z buffer and four RGBA8 color buffers, times 2, as they have 2x multisampling.
That's 40 bytes per pixel -> ~36 MB at 720p resolution.
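For anyone who wants to check that figure, here's the back-of-the-envelope version (the buffer layout is from the slides; treating a "meg" as a decimal megabyte):

```cpp
#include <cstdio>

int main() {
    // Per the slides: one 32-bit Z buffer plus four RGBA8 color buffers,
    // everything stored at 2x multisampling.
    const int depthBytes = 4;       // 32-bit Z
    const int colorBytes = 4 * 4;   // four RGBA8 render targets
    const int msaa       = 2;       // 2x MSAA
    const int perPixel   = (depthBytes + colorBytes) * msaa;  // 40 bytes

    const long long pixels = 1280LL * 720;                    // 720p
    const double mb = perPixel * pixels / 1e6;
    printf("%d bytes/pixel -> %.1f MB at 720p\n", perPixel, mb); // ~36.9 MB
}
```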
Doom 3 used Phong lighting too. It's the most basic shading model in computer graphics; there are more advanced ones, like Blinn, anisotropic, Cook-Torrance, and so on. Phong is usually associated with a plastic look because of its simple specular term.
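For the curious, a minimal sketch of the Phong model in plain C++ (the throwaway vector type is mine; all vectors assumed normalized). The pow() on the reflection term is what produces that tight, plastic-looking highlight:

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

static float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static Vec3  sub(Vec3 a, Vec3 b) { return {a.x-b.x, a.y-b.y, a.z-b.z}; }
static Vec3  scale(Vec3 v, float s) { return {v.x*s, v.y*s, v.z*s}; }

// Classic Phong: Lambert diffuse from N.L, specular from the angle
// between the view vector and L reflected about N.
float phong(Vec3 n, Vec3 l, Vec3 v, float shininess)
{
    float diffuse  = std::max(dot(n, l), 0.0f);
    Vec3  r        = sub(scale(n, 2.0f * dot(n, l)), l);  // reflect L about N
    float specular = std::pow(std::max(dot(r, v), 0.0f), shininess);
    return diffuse + specular;
}
```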
Phong (Bui Tuong Phong) was a computer scientist from Vietnam who invented this shading model, by the way.
I thought it was bump mapping that gave things a plastic look?
Obviously got my wires crossed then (regarding bump mapping). I mistakenly believed that it was the bump mapping in the UT engine that gave GOW and UT that plastic look. I stand corrected, thanks.
> I totally agree with you on this. I usually add, on top of good scene sorting, some kind of heuristic to choose a few very large occluders to draw in the automatic z-pass on a frame-by-frame basis: the cost of this is negligible and you can easily get pretty close to one fragment shaded per pixel.

Given this point of view, why are you such a huge proponent of DR? The number one thing everyone mentions with DR (even smart guys like DemoCoder) is the reduced number of pixels for lighting. You and I both believe that this is negligible (i.e. k from my last post is under, say, 1.2), so you must think that there are other huge benefits, right?
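A minimal sketch of the kind of occluder heuristic being described (the names and the area/depth score are my own guesses, not anything from the post): pick a handful of the biggest, closest meshes and render them depth-only before the main pass.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical scene data: just what we need to rank occluders.
struct Mesh {
    float screenArea;   // estimated projected area this frame
    float viewDepth;    // distance from the camera
};

// Pick the N meshes most likely to occlude lots of pixels and draw them
// into a depth-only pre-pass; everything else relies on good
// front-to-back sorting in the main pass.
std::vector<const Mesh*> pickZPrepassOccluders(const std::vector<Mesh>& scene,
                                               size_t maxOccluders = 8)
{
    std::vector<const Mesh*> sorted;
    for (const Mesh& m : scene) sorted.push_back(&m);

    // Favour big, close meshes: crude score = area / depth.
    std::sort(sorted.begin(), sorted.end(), [](const Mesh* a, const Mesh* b) {
        return a->screenArea / a->viewDepth > b->screenArea / b->viewDepth;
    });

    if (sorted.size() > maxOccluders) sorted.resize(maxOccluders);
    return sorted;   // render these depth-only, then draw the full scene
}
```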
I'd say this is rarely a problem in a decent forward renderer.
> That is actually pretty hard to achieve considering you need to read the g-buffer (or at least most of it) first before you can do anything in lighting. The KZ guys seem to spend quite some time on reconstruction of normal and position, and this + good cache coherency when reading from the g-buffer should help to hide most of the latency.

Which means in reality my formulas are underestimating the work that DR has. Still, I think you can do a lot of the lighting math with just the normal map, and while doing that you can start to load all the other inputs.
> I see the bandwidth going lower only when you go with single pass forward rendering (again, problems with keeping shadow maps for all lights in memory).

That's a good point that I never thought about before. However, how big are the local shadow maps going to be? 3MB each for 512x512 cube maps? The G-buffer cost is going to negate any savings you get from that.
> That's a good point that I never thought about before. However, how big are the local shadow maps going to be? 3MB each for 512x512 cube maps? The G-buffer cost is going to negate any savings you get from that.

Well..if you have a lot of lights and you want all of them to cast shadows that memory cost can go up quite dramatically.
> When I speak about DR I take this for given..and I'm sure KZ2 does that..

Yup, and looking at the presentation they do just that. I'm more interested in your opinion on how dynamic branching can avoid the need for that in situations beyond KZ2 on PS3. I think on R5xx/R6xx/G8x you should be able to use DB with FR to avoid lighting unnecessary pixels and get similar efficiency to DR.
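A rough sketch of that DB-in-a-forward-renderer idea, written as plain C++ standing in for a pixel shader (the radius test and the linear falloff are my own illustrative choices):

```cpp
#include <cmath>

struct Vec3  { float x, y, z; };
struct Light { Vec3 pos; float radius; float intensity; };

static float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static Vec3  sub(Vec3 a, Vec3 b) { return {a.x-b.x, a.y-b.y, a.z-b.z}; }

// Forward lighting loop with a dynamic branch: pixels outside a light's
// radius skip the expensive shading for that light. On GPUs with good
// branch granularity, whole blocks of out-of-range pixels pay almost
// nothing for skipped lights.
float shadePixel(Vec3 worldPos, Vec3 normal, const Light* lights, int n)
{
    float result = 0.0f;
    for (int i = 0; i < n; ++i) {
        Vec3  toLight = sub(lights[i].pos, worldPos);
        float dist2   = dot(toLight, toLight);
        float r       = lights[i].radius;
        if (dist2 > r * r)
            continue;                    // the dynamic branch: skip this light

        float dist  = std::sqrt(dist2);
        Vec3  l     = { toLight.x / dist, toLight.y / dist, toLight.z / dist };
        float ndotl = std::fmax(dot(normal, l), 0.0f);
        float atten = 1.0f - dist / r;   // simple linear falloff
        result += lights[i].intensity * ndotl * atten;
    }
    return result;
}
```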
> Another thing that DR gets right is pixel shader utilization, as working on (decently sized) rectangular areas + stencil masking lets you work mostly on fully covered 2x2 tiles.

Stencil masking means more than one pixel's worth of work on edges (IMO we really should aim for 4xAA). That's rather similar to the inefficiencies caused by quad-based rasterization on polygon edges. Also, remember that DR will have the same inefficiencies as FR when filling the G-buffer.
> btw..hasn't anyone noticed that according to the presentation they are actually doing supersampling in the lighting pass? It's not going to give any particular advantage, as subsamples will mostly have the same values per pixel, but if they implement a more clever scheme for the lighting pass they will likely see big speed improvements.

I think you may be overestimating the amount of work in the lighting pass. The shadow map samples are already split between the MSAA subsamples in KZ2 (which IMO is a great solution that Epic should look into for UE3). Math would be fairly cheap considering how much data needs to be loaded per pixel.
> Well..if you have a lot of lights and you want all of them to cast shadows that memory cost can go up quite dramatically.

True, but you'd need a lot of lights. KZ2, according to the presentation, uses 2MB per non-sun shadow map, so you'd have to have 11 shadowed lights before you get a memory gain.
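My reconstruction of where that figure of 11 comes from (the single-RGBA8-target forward layout is my assumption, not something from the presentation):

```cpp
#include <cstdio>

int main() {
    const int width = 1280, height = 720, msaa = 2;
    const double samples = double(width) * height * msaa;

    // DR G-buffer: 32-bit Z + four RGBA8 targets = 20 bytes/sample.
    // Assumed FR layout: 32-bit Z + one RGBA8 target = 8 bytes/sample.
    const double drMB = 20 * samples / (1024 * 1024);   // ~35.2 MB
    const double frMB =  8 * samples / (1024 * 1024);   // ~14.1 MB

    // DR can render one 2MB shadow map at a time and reuse the memory;
    // single-pass FR has to keep every light's map resident at once.
    const double shadowMapMB = 2.0;
    printf("break-even at ~%.0f shadowed lights\n", (drMB - frMB) / shadowMapMB);
}
```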
> I think the key point Guerrilla makes is that for complex rendering, loaded to the gills with materials, lights and post-processing, DR is the way to go. Even with RSX's lowered render-state overheads (compared with DX) and the assistance of Cell, there would appear to be a complexity ceiling for FR, for the kind of visuals they want to deliver.
> Jawed

I don't buy the render-state stuff. If you are using DR, then you're using the same lighting model on all objects, meaning you don't change render state any less than with FR, unless you must use a different shader for each number of lights. There's really only one material for all opaque objects, but it has a couple of parameters. If you want really good material variety (e.g. fur, anisotropic lighting, antialiased parallax mapping, etc.) then DR either falls flat on its face or degenerates to FR with extra overhead.
> Yup, and looking at the presentation they do just that. I'm more interested in your opinion on how dynamic branching can avoid the need for that in situations beyond KZ2 on PS3. I think on R5xx/R6xx/G8x you should be able to use DB with FR to avoid lighting unnecessary pixels and get similar efficiency to DR.

I don't know, it's not easy to give an answer from a purely theoretical point of view; to be honest, if I were working right now on a next-gen engine I'd try both approaches.
> Stencil masking means more than one pixel's worth of work on edges (IMO we really should aim for 4xAA). That's rather similar to the inefficiencies caused by quad-based rasterization on polygon edges.

I thought about this, but stencil masking is going to introduce far simpler edges/silhouettes than real-world mesh rendering; efficiency should increase accordingly, especially for stuff that uses a lot of triangles.
> Also, remember that DR will have the same inefficiencies as FR when filling the G-buffer.

yeah..and no, inefficiencies are also a function of your shader length, and in the geometry pass your average shader length is going to be shorter with DR than with FR, though at the same time you have to pay some overhead in a DR to fetch and decode your geometry buffer in the lighting pass (and this cost is per light!!!)
> I think you may be overestimating the amount of work in the lighting pass.

Well..I'm not saying that supersampling is going to automatically make the lighting pass run at half speed, but I would expect a 30%-40% impact. This could mean a 10%-15% performance loss over an entire frame; it would be nice to get at least part of that back.
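For what it's worth, those two figures line up if the lighting pass is roughly a third of the frame (my inference, not a number from the thread): 0.35 x 30% = ~10.5% and 0.35 x 40% = 14% of total frame time, i.e. the quoted 10%-15%.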
> The shadow map samples are already split between the MSAA subsamples in KZ2 (which IMO is a great solution that Epic should look into for UE3). Math would be fairly cheap considering how much data needs to be loaded per pixel.

I agree on this, they had a brilliant idea! But remember that for each z subsample you have to compute its projection into N shadow maps and select, with a dot product, the right shadow map for your pixel, and this is a constant cost that can't be split over the 2 samples.
> Remember that the stencil masking step isn't free.

Way less than 1% of a 30 fps frame.. (Yes, I pulled this number out of my ass, but trust me on this :))
> A depth buffer comparison between samples would be insufficient, so you probably have to fetch the normal map too.

Not only would it be insufficient, it would be completely wrong, as depth is supersampled!
> For the 2xAA in KZ2, I don't think you'll get much more than a 30% reduction in rendering time for just the lighting portion, and it could be much less.

I think 10%-15%, not more than that.
> In the context of the whole rendering process -- G-buffer, shadow maps, lighting, post-processing -- it's probably not that significant.

I agree. I'm also thinking about a DR working with more than 2 samples per pixel.
> Looking at everything, it seems like this factor, along with RSX's poor DB for skipping lights with FR, was the primary reason that GG went for DR in KZ2.

Umh..I don't agree with that; I don't think this is the main reason. (Hierarchical) stencil rejection is going to outperform DB for the foreseeable future, when available..
> I have some ideas for SH neighbourhood transfer that need partial DR as well, but I still think that for most non-exotic rendering techniques FR is the way to go.

I wouldn't be surprised if games more and more go towards a mixed approach.
> DB granularity is pretty good on some GPUs, but we should also remember that every time we introduce a dynamic branch we also insert a barrier in the code that limits register re-utilization, so a high number of branches can affect performance even in a perfectly coherent branching scenario.

Hmm, can you explain this a bit more? I would think that the use I'm talking about -- simply skipping lighting code for pixels that are too far away -- wouldn't be an issue here.
> I thought about this, but stencil masking is going to introduce far simpler edges/silhouettes than real-world mesh rendering; efficiency should increase accordingly, especially for stuff that uses a lot of triangles.

Remember that the early stencil reject that's going to use this mask works on a 2x2 basis too. The mask may be simpler if you used some heuristic for edge detection, but if you used a compressed sample flag provided by the GPU, then the pixels marked for supersampling during the deferred lighting pass would match those inefficient FR pixels you're talking about.
Do you remember when I wrote (probably over a year ago) that going deferred with shadowing (this is what we did on HS) was a huge win, for the same reason we're debating now, and you were not that sure? Now it seems that every game around (Crysis, for example..) is using the same technique, as many devs independently reached the same conclusion.
> Way less than 1% of a 30 fps frame.. (Yes, I pulled this number out of my ass, but trust me on this :))

Possibly, but I'm just comparing it to the cost of loading the supersampled G-buffer in the first place. If you need 8 bytes of access to make the stencil, then using it to selectively load 16 bytes instead of 32 may not save much time. Maybe there's so much math that this line of reasoning of mine is irrelevant, though.
> I agree. I'm also thinking about a DR working with more than 2 samples per pixel.

Yeah, I was just talking about the benefits for KZ2, as per your post. With higher levels of AA I agree that you don't want to do DR without a mask. All the other stuff I'm debating with you about is DR+MSAA+mask vs. FR+MSAA.
> Umh..I don't agree with that; I don't think this is the main reason. (Hierarchical) stencil rejection is going to outperform DB for the foreseeable future, when available..

Super-high rejection speeds are overrated. If lit pixels take 20 shader cycles per light, it doesn't really matter if your skipped code takes 1 cycle via DB or the equivalent of 0.01 cycles via stencil reject. I don't believe that there is any inherent need to submit geometry in huge chunks, so fine-grained pieces will have their number of lights significantly limited by the CPU, and I also don't think that after that stage we'll see pixels averaging 10 skipped lights for each processed one.
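To spell that arithmetic out (using the post's cycle counts, with a deliberately DB-hostile mix I made up): a pixel lit by 1 light at 20 cycles that skips 10 others costs 20 + 10 x 1 = 30 cycles with DB versus 20 + 10 x 0.01 = ~20 cycles with stencil reject -- only a 1.5x gap even in that skewed case, and it shrinks as the skipped-to-lit ratio drops.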
> Hmm, can you explain this a bit more? I would think that the use I'm talking about -- simply skipping lighting code for pixels that are too far away -- wouldn't be an issue here.

If you skip the whole computation then it's not an issue, but since you were talking about a FR I thought you wanted to handle more than one light in one pass; in that case you could skip a light, but you could still need to handle the remaining lights.
> The mask may be simpler if you used some heuristic for edge detection, but if you used a compressed sample flag provided by the GPU, then the pixels marked for supersampling during the deferred lighting pass would match those inefficient FR pixels you're talking about.

Right, that's why I only predict a not too shabby improvement of 10%-15% in performance.
> Yup, I remember the discussion we had. If VSM picks up, though, then per pixel cost will go down dramatically to make this advantage go away.

VSM is already being used in a lot of titles and more and more will use it; even Crysis uses VSM. Hopefully some other technique will also be used in the future..
> By the way, how did you handle multisampling with deferred shadows in HS? The same way as in KZ2?

No, it's completely different, but unfortunately I can't explain how it works (a bit too close to the metal).
> Super-high rejection speeds are overrated. If lit pixels take 20 shader cycles per light, it doesn't really matter if your skipped code takes 1 cycle via DB or the equivalent of 0.01 cycles via stencil reject.

Umh, how can DB be so cheap? Let's say you have a light in a room and you can see some lit pixels (a very thin, long strip of pixels) only through an almost closed door. With DB at a decent granularity (say 64 pixels) you'd probably run your shader over tens of thousands of pixels, while with a stencil mask you might end up shading an order of magnitude fewer pixels.
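Putting rough numbers on that door scenario (mine, purely illustrative): a 2-pixel-wide strip running 2,000 pixels diagonally across the screen touches on the order of 500 of those 64-pixel branch blocks, so DB ends up shading around 500 x 64 = 32,000 pixels where a per-pixel stencil mask shades roughly the 4,000 real ones -- about the order-of-magnitude gap described.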