Killzone 2 technology discussion thread (renamed)

Status
Not open for further replies.
Deferred rendering is completely orthogonal to normal mapping, so I don't see why it should limit the amount of normal maps used.
Deferred renderers tend to use more memory, so they might end up having less memory available to textures.

So it's not completely orthogonal.
You have to use more render targets -> limits texture usage.
You have to use more pixel shaders -> limits other pixel shader usage.
You have SPUs to deal with geometry/normals -> you can do something close to NM on SPU.
 
So it's not completely orthogonal.
You have to use more render targets -> limits texture usage.
You have to use more pixel shaders -> limits other pixel shader usage.
You have SPUs to deal with geometry/normals -> you can do something close to NM on SPU.

No, he's correct, it's completely orthogonal. More, it's pretty elegant, because for a deferred renderer it doesn't really matter if you are writing to the G-buffer an interpolated vertex normal or a normal coming from a normal map; the lighting passes don't care about this difference.
You are probably simply using a different definition of "orthogonal". The trade-offs you are mentioning are more or less common to any kind of renderer and mainly depend on the scenes you are rendering.
 
No, he's correct, it's completely orthogonal. More, it's pretty elegant, because for a deferred renderer it doesn't really matter if you are writing to the G-buffer an interpolated vertex normal or a normal coming from a normal map; the lighting passes don't care about this difference.

But with deferred you'll need one more pass for the final blend, and the shader in that final blend has to be very complex to produce all the effects from the same data. Not forgetting that we're talking about the RSX, which is not so fast with complex shaders.

You are probably simply using a different definition of "orthogonal". The trade-offs you are mentioning are more or less common to any kind of renderer and mainly depend on the scenes you are rendering.

Maybe.
 
I think the point nAo and others are making is that artists have to get used to diminishing returns. When you've got n^2 or n^3 scaling factors and you double the poly budget, you increase the workload by 4x-8x. When you're going from 400 vertices to 800 vertices, that may be a vast improvement worthy of inclusion. When you're talking about going from a normal-mapped 10k-vert model to a 20k-vert model, the visible change to the end user during gameplay may be nearly invisible, yet tremendous amounts of machine resources are sacrificed for little to no improvement.

Programmers are operating within constrained budgets as well. That's why they are trying to do "smarter pixels" and "smarter vertices", doing only the work that is absolutely necessary to make a significant visible difference to the end user. This generally takes a lot of work, some of it manual hand-tuning.

All nAo et al. are saying is that artists must sometimes operate under the same restrictions: a "smarter art pipeline". That is, rather than going hog wild with near-infinite resource budgets, they have to tweak and tweak, producing artwork where every texel and every vertex is doing important duty and contributing to a visible difference in the end result.

Otherwise, what you've got is psycho-perceptual redundancy: extra information in your artwork that is of no benefit to the end user, like inaudible or psychoacoustically masked frequencies in music, or the extra chroma information in a video signal that human ABX trials have shown most people can't detect.

What we're talking about is the world's most sophisticated compression algorithm: the artist's brain, and how it can reduce asset size and complexity with little loss of fidelity.

In the case of textures, we've got good software tools now that can automate a lot of that. In the case of geometry design, we're not quite there yet.
 
:?: I can see normal maps everywhere in KZ2 (and I can see dead ppl too..;) )

Really? I've only seen them on the characters.

"Deferred rendering is completely orthogonal to normal mapping, so I don't see why it should limit the amount of normal maps used. Deferred renderers tend to use more memory, so they might end up having less memory available to textures."

That's exactly what I mean. If you have less memory available for normal maps, let alone regular textures, then they are related in a sense. The only way I see this being worthwhile is to have a pseudo-deferred renderer. I don't see the reasoning in this design choice. In the trailer the textures are very low-res, and sure, it's early, but can they improve with limited memory? Also, with the geometry already stored, I would assume no physics?
 
Well, it seems to be that the obvious reason to go deferred is if you are pixel shader and geometry bound as opposed to bandwidth bound. After you do your G-fill, all of your shading becomes O(num_visible_pixels) instead of O(num_total_pixels). This scales really well with # lights and other effects, and you can do lots of neat post-process effects too. Of course, there are big downsides, like alpha blending.

For something like a G8x, I think deferred rendering is very tempting. It has a staggeringly high z/stencil rate and an ample texel rate.
 
Well, it seems to be that the obvious reason to go deferred is if you are pixel shader and geometry bound as opposed to bandwidth bound. After you do your G-fill, all of your shading becomes O(num_visible_pixels) instead of O(num_total_pixels). This scales really well with # lights and other effects, and you can do lots of neat post-process effects too. Of course, there are big downsides, like alpha blending.

Actually, with a good z-pass, in a forward renderer your shading is still almost strictly O(num_visible_pixels), taking all the problems with heavily alpha-tested geometry aside.
_If_ you choose a deferred renderer for performance reasons, you are basically trading bandwidth for geometry.

For something like a G8x, I think deferred rendering is very tempting. It has a staggeringly high z/stencil rate and an ample texel rate.

It is more than tempting! It is almost pure joy :)
 
One guy said that a single frame can have well over a million polygons!

The devs said more than a million polygons, but that is common in other games too. Some even hit 3-4 million polygons in-game (although they don't use deferred renderers). The only other DR engine that pushes a lot of geometry would be the Stalker engine, which averages ~1 million polys and very often hits ~2 million per frame.

Still, they may use special rendering techniques in KZ2 to utilize those polygons in a special way; who knows except the devs! :smile:
 
Actually, with a good z-pass, in a forward renderer your shading is still almost strictly O(num_visible_pixels), taking all the problems with heavily alpha-tested geometry aside.
_If_ you choose a deferred renderer for performance reasons, you are basically trading bandwidth for geometry.
I'd say that with decent object sorting even the z-only pass is unnecessary. The engine just needs to chop the scene up a little finer, but that pays dividends in frustum culling anyway.
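To make the sorting argument concrete, here is a toy sketch of early-Z with a front-to-back draw order. Everything here (the one-pixel "screen", the depth values) is illustrative and not from any real engine; it just shows why near-to-far submission shades roughly one fragment per pixel without a separate z-only pass.

```python
# Toy model of early-Z rejection at a single pixel. Each "object" covers
# this pixel; a fragment is shaded only if its depth is closer than the
# depth already stored, so front-to-back order shades ~1 fragment.

def shaded_fragments(draw_order):
    stored_depth = float("inf")
    shaded = 0
    for depth in draw_order:        # depth of each object at this pixel
        if depth < stored_depth:    # early-Z test passes -> fragment is shaded
            stored_depth = depth
            shaded += 1
    return shaded

depths = [5.0, 2.0, 9.0, 1.0, 7.0]            # five overlapping objects
back_to_front = sorted(depths, reverse=True)  # worst case: every fragment shaded
front_to_back = sorted(depths)                # best case: one fragment shaded

print(shaded_fragments(back_to_front))  # 5
print(shaded_fragments(front_to_back))  # 1
```

With a perfect sort the z-only pass buys nothing; in practice sorting is per-object, not per-pixel, which is why the coarse "chop the scene up a little finer" advice above matters.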

There are other disadvantages with deferred too that rarely get brought up. In addition to the bandwidth load of writing and reading the G-buffers, you also decouple the texture and math operations so that they can't run in parallel. While filling the G-buffers you have almost no math, so you're texture or ROP limited. While doing the lighting, you have only the G-buffers to read, so you're math limited.

On PS3 this disadvantage may not be significant because the G7x/RSX architecture isn't made to do both simultaneously at full speed anyway. Other architectures, though, can get through the same operations quicker on a forward renderer. I'm going to try to quantify this with some variables:

k - net overdraw (number of pixels evading early Z culling divided by screen pixels)
A - # cycles per pix to read textures
B - # cycles per pix to perform lighting math
C - # cycles per pix to write G-buffer

For G8x or R5xx/R6xx/Xenos doing forward rendering, total cycles per pixel would be:
k * max(A, B)

For a deferred renderer, cost would be:
k * max(A, C) + B

I'm making a few assumptions here:
- Good thread handling to parallelize texture and math when possible
- The DR is not slowed down by reading the G-buffer due to sufficient math
- The DR is not slowed down writing the G-buffer due to texture BW when C > A
- The DR is not doing forward shadow mapping, though basically you just need to include this cost in B to keep the formulas the same.

Overall, it's not a clear win unless k is pretty big and B is a lot bigger than A and C. Take into account the work needed to get MSAA going, and I'm not a big fan of DR for ordinary workloads. Of course, these values can vary a lot from pixel to pixel, but it sort of shows why DR appears to have trouble matching the framerate of similar forward-rendered scenes when comparing PC games.

There is one situation where I do see the advantage of DR, but I never see it mentioned. If you have lots of local lights, then you can use the stencil buffer to mark pixels in the light's volume of influence (just like Doom3 does for a shadow volume) and only light those pixels. Good dynamic branching (again, a feature of R5xx/R6xx/G8x but not G7x/RSX) can largely negate this advantage, though, so even here I'm not convinced that DR is a big advantage.
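The stencil-marking idea can be shown with a toy screen: instead of running the light shader over every pixel for every light, only the pixels inside each light's footprint get shaded. The screen size, light positions, and circular 2D footprint below are all made-up simplifications of the real 3D light-volume stencil pass:

```python
# Toy model of stencil-marked light volumes (all numbers illustrative).
# The "mark" pass counts pixels inside each light's footprint; only
# those pixels would run the lighting shader, so cost scales with
# covered pixels rather than lights * screen size.

WIDTH, HEIGHT = 64, 36  # tiny "screen" for the sketch

def pixels_in_volume(cx, cy, radius):
    """Stencil-style mark pass: count pixels inside a light's 2D footprint."""
    count = 0
    for y in range(HEIGHT):
        for x in range(WIDTH):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                count += 1
    return count

lights = [(10, 10, 5), (40, 20, 8), (55, 30, 4)]  # (cx, cy, radius)

naive = len(lights) * WIDTH * HEIGHT              # shade full screen per light
stenciled = sum(pixels_in_volume(*l) for l in lights)
print(naive, stenciled)  # stenciled is far smaller than naive
```

As the post notes, good dynamic branching can recover much of the same saving in a forward renderer by skipping out-of-range lights per pixel.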
 
There is one situation where I do see the advantage of DR, but I never see it mentioned. If you have lots of local lights, then you can use the stencil buffer to mark pixels in the light's volume of influence
When I speak about DR I take this as a given, and I'm sure KZ2 does that.
Anyway, you can use the stencil buffer for much more than that: think about speeding up MSAA (the lighting and resolve phases) or shading different parts of the screen at different rates.

Marco
 
I'd say that with decent object sorting even the z-only pass is unnecessary. The engine just needs to chop the scene up a little finer, but that pays dividends in frustum culling anyway.

I totally agree with you on this. On top of good scene sorting, I usually add some kind of heuristic to choose a few very large occluders to draw in an automatic z-pass on a frame-by-frame basis: the cost of this is negligible and you can easily get pretty close to one fragment shaded per pixel.

I'd say this is rarely a problem in a decent forward renderer.
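The occluder heuristic described above might be sketched like this. The area-over-distance-squared score, the cutoff of four occluders, and the object names are all assumptions for illustration, not anything from the actual engine:

```python
# Hedged sketch of a per-frame occluder pick: rank objects by a rough
# projected-screen-coverage score and take the top few for a cheap
# z-only pre-pass. Score function and cutoff are made-up choices.

def pick_occluders(objects, max_occluders=4):
    """objects: list of (name, bounding_area, distance_to_camera)."""
    def score(obj):
        name, area, dist = obj
        return area / (dist * dist)   # rough projected screen coverage
    ranked = sorted(objects, key=score, reverse=True)
    return [name for name, _, _ in ranked[:max_occluders]]

scene = [
    ("terrain",  5000.0, 20.0),
    ("building",  900.0, 15.0),
    ("crate",       4.0,  3.0),
    ("wall",      400.0,  8.0),
    ("barrel",      2.0,  2.5),
]
print(pick_occluders(scene))  # ['terrain', 'wall', 'building', 'crate']
```

Drawing only these into the z-buffer first keeps the pre-pass nearly free while still rejecting most occluded fragments in the main pass.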
 
More features ahead?

Nice!

I liked the last part of the presentation.

Still a lot of features planned
‣ Ambient occlusion / contact shadows
‣ Shadows on transparent geometry
‣ More efficient anti-aliasing
‣ Dynamic radiosity


So is there a video stream or any audio recorded from any of the presentations? Maybe it's a seminar that you have to pay for?
 
- The DR is not slowed down by reading the G-buffer due to sufficient math
That is actually pretty hard to achieve, considering you need to read the G-buffer (or at least most of it) first before you can do anything in lighting. The KZ guys seem to spend quite some time reconstructing the normal and position, and this, plus good cache coherency when reading from the G-buffer, should help to hide most of the latency.

Also I don't fully understand the whole "bandwidth tradeoff" argument in forward vs. deferred: in deferred you have to read the G-buffer for lit pixels. In forward you have to process the scene geometry, read textures (normals, diffuse, specular, all you have) and do all the filtering for each pass, so when I count it I see more bandwidth work for each lit pixel. The rest is the same: your ROPs are utilized the same way, and shadow mapping is the same (and even there you have problems with shadows on transparent geometry unless you keep all your shadow maps in memory; I'd like to see that on a console). I only see the bandwidth going lower when you go with single-pass forward rendering (again, with the problem of keeping shadow maps for all lights in memory).
 
Another thing that DR gets right is pixel shader utilization, as working on (decently sized) rectangular areas plus stencil masking lets you work mostly on fully covered 2x2 tiles.
 
Really? I've only seen them on the characters.

"Deferred rendering is completely orthogonal to normal mapping, so I don't see why it should limit the amount of normal maps used. Deferred renderers tend to use more memory, so they might end up having less memory available to textures."

That's exactly what I mean. If you have less memory available for normal maps, let alone regular textures, then they are related in a sense. The only way I see this being worthwhile is to have a pseudo-deferred renderer. I don't see the reasoning in this design choice. In the trailer the textures are very low-res, and sure, it's early, but can they improve with limited memory? Also, with the geometry already stored, I would assume no physics?

Bingo!

I believe that there is physics, given that things fall, move, interact, etc.

Also, I fail to see why textures matter so much, given that the animation, lighting, physics, audio, and post-processing effects all present an experience much more impressive than just a high-res game.
 
Also I don't fully understand the whole "bandwidth tradeoff" argument in forward vs. deferred

I'd thought that KZ's deferred renderer moves too much data around to run at 60 fps on a PS3.

In forward you have to process the scene geometry, read textures (normals, diffuse, specular, all you have) and do all the filtering for each pass, so when I count it I see more bandwidth work for each lit pixel.

Texture data can be cached very efficiently, so the actual traffic for that is a lot less.
This is of course true for deferred rendering as well, in the lighting stage where they read the G-buffer as textures.

Based on the presentation they have 5 buffers, all at full 720p with 2x multisampling. That seems to be a lot of data, more than I thought, so I still think that doing all this at 60 fps is not possible on the PS3 (and obviously the same goes for the 360 too).
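A quick back-of-the-envelope on those numbers, assuming each of the five buffers stores 32 bits per sample (the per-buffer formats are an assumption here; the talk doesn't pin down every layout):

```python
# Rough G-buffer footprint for 5 buffers at 1280x720 with 2x MSAA,
# assuming 4 bytes per sample per buffer (format is an assumption).

width, height = 1280, 720
samples = 2          # 2x multisampling
buffers = 5
bytes_per_sample = 4

total = width * height * samples * buffers * bytes_per_sample
print(total)                   # 36864000 bytes
print(total / (1024 * 1024))   # 35.15625 -> ~35 MiB touched per full pass
```

So every full write-then-read of the G-buffer moves on the order of 35 MiB, which is why the per-frame bandwidth adds up so fast at high framerates.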
 
Texture data can be cached very efficiently, so the actual traffic for that is a lot less.
This is of course true for deferred rendering as well, in the lighting stage where they read the G-buffer as textures.

This is what I meant - good cache hit + good pixel pipe load as you are likely to hit full 2x2quads (as nAo pointed out).

Based on the presentation they have 5 buffers, all at full 720p with 2x multisampling. That seems to be a lot of data, more than I thought, so I still think that doing all this at 60 fps is not possible on the PS3 (and obviously the same goes for the 360 too).

Hm, you are right: 2xMSAA indeed makes a difference here. But I still wonder how it compares (bandwidth-wise) to bilinear or anisotropic filtering on all texture reads for each pixel and pass in a forward renderer. I have to calculate that.

As far as I remember from the talk, one of those is write-only (light accumulation), so it's 4 buffers read (I wonder if the z-buffer value would be cached between this and the depth test later in the ROP, and if that saves bandwidth) and 1 buffer written.

I wonder if the bandwidth is why it's not 60 fps. Although the only advantage of 60 fps I see is that the lag between input and frame render is almost unnoticeable even when your renderer is one frame behind your game logic (as a lot of renderers are); at lower framerates that lag makes the game feel slow.
 