NVIDIA's UltraShadow in Doom3

Haven't read through the pdf, but from screenshots it seems that Source relies more heavily on material shaders and less on procedural dynamic lighting, meaning more static lightmaps. Correct me if I'm wrong, though.
 
With people discussing the glowing projectiles that can be turned on, would UltraShadow give a greater improvement with 30 plasma balls flying around? I am honestly curious. Or does UltraShadow only work with static lights? If that is the case it is rather pointless, as moving lights are one of the coolest things in my opinion.
 
Haven't read through the pdf, but from screenshots it seems that Source relies more heavily on material shaders and less on procedural dynamic lighting, meaning more static lightmaps. Correct me if I'm wrong, though.

True, but that is probably also due to the nature of Half-Life 2. It has outdoor scenes, which require very different shading from the small, dark rooms of Doom3.
It handles both static and dynamic light and shadows.
For Doom3, basically everything is handled the same.
 
Sxotty said:
With people discussing the glowing projectiles that can be turned on, would UltraShadow give a greater improvement with 30 plasma balls flying around? I am honestly curious. Or does UltraShadow only work with static lights? If that is the case it is rather pointless, as moving lights are one of the coolest things in my opinion.

UltraShadow basically limits the z-range while drawing, much like a scissor limits the x/y range. This allows you to limit the drawing of the shadows to the area where the shadows actually affect the scene, and save fillrate.
It can easily work with dynamic lights, just calc the z-range for each light at each frame.
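For what it's worth, here's a minimal sketch (not Doom3's actual code) of what setting per-light depth bounds with the EXT_depth_bounds_test extension could look like each frame; computeLightDepthBounds() and drawShadowVolumesForLight() are hypothetical helpers:

Code:
#include <GL/glew.h>

typedef struct Light Light;                                           /* hypothetical light type */
void computeLightDepthBounds(const Light *l, float *zmin, float *zmax); /* hypothetical helper */
void drawShadowVolumesForLight(const Light *l);                         /* hypothetical helper */

void drawShadowedLight(const Light *light)
{
    if (GLEW_EXT_depth_bounds_test) {
        float zmin, zmax;
        computeLightDepthBounds(light, &zmin, &zmax);   /* recalculated every frame, so dynamic lights are fine */
        glEnable(GL_DEPTH_BOUNDS_TEST_EXT);
        glDepthBoundsEXT(zmin, zmax);                   /* fragments outside [zmin, zmax] in the z-buffer are skipped */
    }
    drawShadowVolumesForLight(light);                   /* stencil passes for this light */
    if (GLEW_EXT_depth_bounds_test)
        glDisable(GL_DEPTH_BOUNDS_TEST_EXT);
}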

But Doom3 has 2 problems, as stated before in this thread, which make Doom3 not fillrate-limited, but CPU-limited:

1) Shadowvolumes are generated on the CPU, not on the GPU.
2) There are sometimes too many render-calls, which cause too much driver overhead.

I assume that 30 plasma balls would effectively mean 30 extra lightsources, which would mean calculating the shadowvolumes for each object in the scene 30 extra times, plus 30 extra sets of render calls for the lighting passes.
This will most probably make Doom3 even more CPU-limited, and make UltraShadow even less effective than it already is.
So Doom3 just turns out to be a bad case for UltraShadow.
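To make the scaling concrete, here is a rough schematic (my own, not id's renderer) of why both the CPU work and the number of draw calls grow with the number of lights; all helpers are hypothetical:

Code:
void buildShadowVolumeOnCPU(int light, int caster);   /* hypothetical */
void drawShadowVolume(int light, int caster);         /* hypothetical */
void drawLightingPass(int light);                     /* hypothetical */
int  numCastersHitByLight(int light);                 /* hypothetical */

void renderLights(int numLights)
{
    for (int l = 0; l < numLights; ++l) {                    /* 30 plasma balls -> 30 extra iterations */
        for (int c = 0; c < numCastersHitByLight(l); ++c) {
            buildShadowVolumeOnCPU(l, c);                    /* silhouette extraction + extrusion, every frame, on the CPU */
            drawShadowVolume(l, c);                          /* extra render call -> extra driver overhead */
        }
        drawLightingPass(l);                                 /* plus one more lighting pass per light */
    }
}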
 
It's been a while since I read about UltraShadow, but will depth bounds only come into play if the shadows extend far into the distance? Since Doom3 takes place indoors the shadow volumes will be rejected by z test hardware before they hit the depth bounds.
 
It's been a while since I read about UltraShadow, but will depth bounds only come into play if the shadows extend far into the distance? Since Doom3 takes place indoors the shadow volumes will be rejected by z test hardware before they hit the depth bounds.

Doom3 uses "Carmack's Reverse", which means you stencil on zfail, so the part BEHIND the surfaces is relevant. Besides, ztest works on a per-pixel basis; the depth-bounds should just cut off rendering completely, like clipping, if I understood correctly (wouldn't make sense if it still worked per-pixel).

But yes, the idea is to cut down shadowvolumes that stretch far into the distance (2d scissor can already handle stretching to the sides). Since Doom3 stretches the shadow silhouette to infinity (w=0 projection), there could be cases where a lot of fillrate is wasted in the distance.
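For reference, a bare-bones sketch (not id's code) of what a z-fail stencil pass looks like in OpenGL; drawShadowVolume() is a hypothetical helper, and the depth buffer is assumed to be filled by an earlier ambient/depth pass:

Code:
#include <GL/glew.h>

void drawShadowVolume(void);   /* hypothetical: issues the extruded volume geometry */

void stencilShadowPass(void)
{
    glEnable(GL_STENCIL_TEST);
    glDepthMask(GL_FALSE);                                 /* depth buffer is read-only here */
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glStencilFunc(GL_ALWAYS, 0, ~0u);

    glCullFace(GL_FRONT);                                  /* draw back faces... */
    glStencilOp(GL_KEEP, GL_INCR_WRAP_EXT, GL_KEEP);       /* ...increment where the depth test FAILS */
    drawShadowVolume();

    glCullFace(GL_BACK);                                   /* draw front faces... */
    glStencilOp(GL_KEEP, GL_DECR_WRAP_EXT, GL_KEEP);       /* ...decrement where the depth test fails */
    drawShadowVolume();

    /* pixels left with stencil != 0 are in shadow for this light */
}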
 
Scali said:
Besides, ztest works on a per-pixel basis; the depth-bounds should just cut off rendering completely, like clipping, if I understood correctly (wouldn't make sense if it still worked per-pixel).
This is not what the EXT_depth_bounds_test does and it is per-pixel: fragments are discarded if the current depth in the z-buffer is outside of the specified depth bounds.
 
3dcgi said:
It's been a while since I read about UltraShadow, but will depth bounds only come into play if the shadows extend far into the distance? Since Doom3 takes place indoors the shadow volumes will be rejected by z test hardware before they hit the depth bounds.
I was thinking the same thing.
 
SteveHill said:
Scali said:
Besides, ztest works on a per-pixel basis; the depth-bounds should just cut off rendering completely, like clipping, if I understood correctly (wouldn't make sense if it still worked per-pixel).
This is not what the EXT_depth_bounds_test does and it is per-pixel: fragments are discarded if the current depth in the z-buffer is outside of the specified depth bounds.
Logically, yes, but that's not what happens most of the time. It is only done per pixel as a fallback if the "higher level culling" can't determine whether the area in question is completely inside or outside the depth bounds.
And this per-pixel fallback is only provided for "correctness", the savings of it are negligible.


Depth bounds not only help when shadows extend far into the scene.
2D scissoring can only handle the attenuation bounding box well (if the light source is far away).

You can't use scissoring if the shadow from an occluder on the left side hits a wall on the right side, but depth bounds can save the fillrate for the area in between.
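As a rough illustration of where those bounds could come from (my own sketch, not from Doom3, and assuming a standard perspective projection rather than Doom3's infinite one): take the nearest and farthest eye-space z of the light's attenuation sphere and map them to window-space depth:

Code:
#include <math.h>

static float eyeZToWindowDepth(float ze, float n, float f)
{
    /* standard OpenGL perspective projection */
    float ndc = (-(f + n) / (f - n) * ze - 2.0f * f * n / (f - n)) / -ze;
    return 0.5f * ndc + 0.5f;                      /* assumes glDepthRange(0, 1) */
}

void lightDepthBounds(float lightEyeZ, float radius, float n, float f,
                      float *zmin, float *zmax)
{
    /* eye-space z is negative in front of the camera */
    float nearZ = fminf(-n, lightEyeZ + radius);   /* don't go in front of the near plane */
    float farZ  = fmaxf(-f, lightEyeZ - radius);   /* don't go beyond the far plane */
    *zmin = eyeZToWindowDepth(nearZ, n, f);
    *zmax = eyeZToWindowDepth(farZ, n, f);
}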
 
Xmas said:
Logically, yes, but that's not what happens most of the time. It is only done per pixel as a fallback ...
Right, I was just describing the test itself, nothing more. I totally agree that there are likely limited benefits given a host of potential rejects earlier up the logical 'pipe' (including the CPU) and typical scenes.
 
Scali said:
You can find the exact shader code for Doom3 in the 'pak000.pk4' file (open with WinRar) in the glprogs subdirectory.
They're zips :) -- you're helping to perpetuate a .rar myth.

Scali said:
It can be implemented by a single mad-instruction.
I think you'll find it's a couple of instructions, not that it really matters.

Scali said:
In some codepaths the function is evaluated via a texture instead (NV3x-specific optimization, as Humus demonstrated, bad bad bad on ATi).
The general view now seems to be that the LUT is bad for both NV and ATI hardware (perhaps to varying degrees and I've not tested it myself), but only when you force AF on (for all textures), thereby wandering outside of the game's testing arena.

Scali said:
Light vectors are normalized per-pixel with a cubemap (another NV3x-specific optimization, arithmetic would have yielded better quality, and balance between texture and arithmetic operations better).
There's a bit of a balance actually since the half-angle is normalised through math.

As you said in a later posting, the game typically isn't fillrate limited anyway with highend cards, so these little shader tradeoffs often don't mean much. Where they may count however is, as you suggest, with the NV3x line -- this seems like a reasonable developer compromise vs maintaining a separate "nv30" path.

Scali said:
I don't know anything about FarCry myself... and all I know about Source is what is in the PDF I linked to before. The PDF mentions that they have close to 2000 shaders though, so it seems like they have a very varied shading system.
A lot of those 2000 shaders I suspect are generated variations (m batched lights x n light types).
 
They're zips -- you're helping to perpetuate a .rar myth.

I didn't say they weren't zips or that they were rars, I just said you can open them with winrar.

I think you'll find it's a couple of instructions, not that it really matters.

It's one instruction with ps1.x or register combiners, something like this:

Code:
def c0, -0.75, -0.75, -0.75, -0.75   // bias of -0.75 in every channel

// r0 = saturate((r0 * r0 + c0) * 4), i.e. clamp(4 * r0^2 - 3, 0, 1)
mad_x4_sat r0, r0, r0, c0

Not sure what ARB2 does; it could be 2 instructions if it doesn't allow the scaling suffix, like ps2.0. But that isn't relevant, since ARB2 uses a texture anyway, and only the other paths use the single instruction.
If you want to comment, please do it in detail.

The general view now seems to be that the LUT is bad for both NV and ATI hardware (perhaps to varying degrees and I've not tested it myself), but only when you force AF on (for all textures), thereby wandering outside of the game's testing arena.

It is also bad for ATi when not using AF at all, as stated before.
Also, it could be bad for NV because NV does shader replacements in the drivers, as Carmack stated before.

A lot of those 2000 shaders I suspect are generated variations (m batched lights x n light types).

I wonder if that is the case, if they are using stencil shadows... Then you can only handle one light at a time, so batching lights will not work. Even with shadowmaps it gets hard, because you quickly run into instruction limits. Perhaps you can batch 2 or 3 lights at most.

They do have a range of light types, unlike the point-light-only system of Doom3, and they also have various surfaces, and of course multiple shader paths for each (ps1.x and 2.0). And they have HDR support too.

So while not all 2000 shaders are truly unique, there certainly is a much wider variety of shaders than in Doom3.
 
Scali said:
It's one instruction with ps1.x or register combiners, something like this:

Code:
def c0, -0.75, -0.75, -0.75, -0.75

mad_x4_sat r0, r0, r0, c0
Where do you get this from? It flies in the face of what Carmack said about it:
The specular function in Doom isn't a power function, it is a series of clamped biases and squares that looks something like a power function, which is all that could be done on earlier hardware without fragment programs.

...

The lookup table was faster than doing the exact sequence of math ops that the table encodes, but I can certainly believe that a single power function is faster than the table lookup.
 
Scali said:
I didn't say they weren't zips or that they were rars, I just said you can open them with winrar.
Why not just say "unzip it" then? Some people might go to the trouble of downloading WinRAR when they already have XP/WinZip/pkunzip/infoZip etc ;).

Scali said:
It's one instruction with ps1.x/GF3/4 shader extensions or with register combiners, something like this:
The nv20 path uses a couple of register combiners (well, more like 1.5), but then it's not doing exactly your approximation (which may need more than one... I'm no RC expert, but it's not a 1:1 mapping to ps1.1).

Scali said:
if it doesn't allow the scaling suffix, like ps2.0. But that isn't relevant, since ARB2 uses a texture anyway, and only the other paths use the single instruction.
Haven't we been discussing replacing the texture with instructions? I think it's entirely relevant. And no, you can't _X4, to my knowledge, with ARB_fragment_program. The current arb2 math version of the LUT also weighs in at 2 instructions -- a simple POW isn't a good enough match.

Scali said:
It is also bad for ATi when not using AF at all, as stated before.
How bad is "bad", relative to forced AF? A couple of FPS difference when replacing the LUT with math? I'd be surprised if fragment processing is at all a bottleneck without AF, but I guess it could be if you regard 1600x1200 with max. AA a necessity.

Edit: See Humus's post about Application Pref. AF: http://www.beyond3d.com/forum/viewtopic.php?p=342434#342434 . You really need to be stressing AA/res and AF for instance before it's an issue.

Scali said:
Also, it could be bad for NV because NV does shader replacements in the drivers, as Carmack stated before.
I haven't noticed a drop in framerate when tweaking the interaction shader (which would break replacement). Of course, this was probably in a non-fragment-limited situation, but, gosh, perhaps they're not replacing shaders with the NV40 either (*).

Scali said:
I wonder if that is the case, if they are using stencil shadows... Then you can only handle one light at a time, so batching lights will not work. Even with shadowmaps it gets hard, because you quickly run into instruction limits. Perhaps you can batch 2 or 3 lights at most.
Batching does depend a lot on the light type and also the shader profile (eg one may only be able to do a single projected source at once with ps1.1), but it sounded as if they were being as aggressive as they could be going by their 6800 Leagues presentation.

Scali said:
So while not all 2000 shaders are truly unique, there certainly is a much wider variety of shaders than in Doom3.
It's at least an order of magnitude less than a "2000 shaders" figure suggests, that's all I was saying.

(*) I think I must have missed it if JC said that shader replacement was happening right now with NV3x & NV4x (of course it wouldn't be the first time).
 
digitalwanderer said:
DUMB QUESTION: Where or what is this look-up table that keeps being referred to? :|
In pixel shaders, it's a texture used to look up the result of an operation in a single instruction, rather than having to execute multiple arithmetic instructions to achieve the same result. It can be less precise, but if the mathematical function is particularly complex, it can also be much cheaper.

Look-up tables are also often used in other areas not related to graphics. Sometimes they are used to evaluate trigonometric functions such as sin/cos/tan, etc. In these cases, the look-up table is simply an array of floating-point values. Depending on the implementation, this can be even less precise than textures in pixel shaders, which sample interpolated values.
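As a rough CPU-side illustration (mine, nothing to do with Doom3's actual table), this is essentially what a look-up table does, here for a specular power of 16, with linear interpolation standing in for texture filtering:

Code:
#include <math.h>

#define LUT_SIZE 256
static float specLut[LUT_SIZE];

/* precompute pow(x, 16) at 256 evenly spaced points */
void buildSpecLut(void)
{
    for (int i = 0; i < LUT_SIZE; ++i)
        specLut[i] = powf((float)i / (LUT_SIZE - 1), 16.0f);
}

/* read the table back with linear interpolation; x is assumed to be in [0,1] */
float specLookup(float x)
{
    float pos = x * (LUT_SIZE - 1);
    int   i   = (int)pos;
    if (i >= LUT_SIZE - 1) return specLut[LUT_SIZE - 1];
    float frac = pos - i;
    return specLut[i] * (1.0f - frac) + specLut[i + 1] * frac;
}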
 
digitalwanderer said:
DUMB QUESTION: Where or what is this look-up table that keeps being referred to? :|
It's a lookup texture used by interaction.vfp to emulate the quasi specular-power function used by older hardware paths.
 
But where in the .pak files is it? At least, that's how I read digi's post. I'm going to start looking myself...
 
Quote:
The specular function in Doom isn't a power function, it is a series of clamped biases and squares that looks something like a power function, which is all that could be done on earlier hardware without fragment programs.

...

The lookup table was faster than doing the exact sequence of math ops that the table encodes, but I can certainly believe that a single power function is faster than the table lookup.

As you notice, the mad instruction I've posted IS a series of clamped biases and squares, namely saturate((r0^2 - 0.75) * 4).
And as I said, perhaps it requires 2 instructions in the ARB2 shading language, like it would in ps2.0. Which is perhaps the reason why he chose to use a texture instead.

I also didn't say it was the exact formula, I just gave an example of how he could have approximated the pow() with a series of squares, clamps and biases, in just a single instruction on low-end hardware. I don't know that exact formula. If anyone does, let me know. This is just a very common formula for approximating specular^16, and I wouldn't be surprised if Doom3 used it.
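If anyone wants to eyeball how close that kind of approximation gets, here's a trivial test program (my own, purely illustrative) comparing saturate((x^2 - 0.75) * 4) against a real pow(x, 16):

Code:
#include <stdio.h>
#include <math.h>

static float saturatef(float v) { return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v); }

int main(void)
{
    for (int i = 0; i <= 10; ++i) {
        float x = i / 10.0f;
        float approx = saturatef((x * x - 0.75f) * 4.0f);   /* the single-instruction approximation */
        printf("x=%.1f  pow16=%.4f  approx=%.4f\n", x, powf(x, 16.0f), approx);
    }
    return 0;
}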
 
Chalnoth said:
But where in the .pak files is it? At least, that's how I read digi's post. I'm going to start looking myself...

Judging by the difference in programming knowledge between the two of you, I would hazard a guess that Digi was majoring on what it is rather than where it is...
 