NVIDIA's UltraShadow in Doom3

Chalnoth said:
But where in the .pak files is it? At least, that's how I read digi's post. I'm going to start looking myself...
Thanks Steve and Ostsol, that does help me understand it a bit better... but Chalnoth nailed my question, I'm really curious just to see this table.
 
Chalnoth said:
But where in the .pak files is it? At least, that's how I read digi's post. I'm going to start looking myself...
It might be generated by the engine. Sometimes the same is done for normalization cubemaps. . .
 
Scali said:
Quote:
The specular function in Doom isn't a power function, it is a series of clamped biases and squares that looks something like a power function, which is all that could be done on earlier hardware without fragment programs.

...

The lookup table was faster than doing the exact sequence of math ops that the table encodes, but I can certainly believe that a single power function is faster than the table lookup.

As you can see, the mad instruction I posted IS a series of clamped biases and squares, namely saturate(r0^2 - 0.75) * 4.
And as I said, perhaps it requires 2 instructions in the ARB2 shading language, like it would in ps2.0. Which is perhaps the reason why he chose to use a texture instead.

I also didn't say it was the exact formula, I just gave an example of how he could have approximated the pow() with a series of squares, clamps and biases, in just a single instruction on low-end hardware. I don't know that exact formula. If anyone does, let me know. This is just a very common formula for approximating specular^16, and I wouldn't be surprised if Doom3 used it.
But your code squares first, then scales and biases (opposite of what the quote said). The result is very linear. My code scales and biases first, and then squares, this gives a much nicer curve, and I think somebody (Demirug?) said that's what the lookup does. But then you need two instructions.
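For anyone who wants to compare the curves, here's a quick throwaway C sketch (not Doom3 code; the 0.75 bias and the scale-by-4 are just the constants from my example above):

    /* Throwaway comparison of pow(N.H, 16) against the two orderings discussed
       above. Constants are the example ones (0.75 bias, scale by 4), nothing
       taken from Doom3 itself. */
    #include <stdio.h>
    #include <math.h>

    static double saturate(double v) { return v < 0.0 ? 0.0 : (v > 1.0 ? 1.0 : v); }

    /* square first, then bias/saturate and scale: one mad-style operation */
    static double square_then_bias(double x) { return saturate(x * x - 0.75) * 4.0; }

    /* scale and bias first, then square: two operations */
    static double bias_then_square(double x) { double t = saturate(4.0 * x - 3.0); return t * t; }

    int main(void)
    {
        for (double x = 0.0; x <= 1.0001; x += 0.05)
            printf("N.H=%.2f  pow16=%.4f  square-first=%.4f  bias-first=%.4f\n",
                   x, pow(x, 16.0), square_then_bias(x), bias_then_square(x));
        return 0;
    }

Plot the three columns and you can see the difference: the square-first version comes out nearly linear above its cutoff, while the bias-first version gives the more curved response.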
 
Why not just say "unzip it" then? Some people might go to the trouble of downloading WinRAR when they already have XP/WinZip/pkunzip/infoZip etc.

Perhaps because I use WinRAR, and so did Humus, so I just didn't think of it, and mentioned it out of habit? Don't try to find hidden agendas behind everything.

The nv20 path uses a couple of register combiners (well, more like 1.5), but then it's not doing exactly your approximation (which may need more than one... I'm no RC expert, but it's not a 1:1 mapping to ps1.1).

As I say, I don't know the exact formula(s) that Doom3 uses, but this one should be implementable in RC in 1 instruction... It does have the scale-by-4 operation, iirc. But I suppose the exact NV20 code is compiled into the binary, so I don't know how to easily find out what the NV20 code is doing exactly.

Haven't we been discussing replacing the texture with instructions? I think it's entirely relevant. And no, you can't _X4, to my knowledge, with ARB_fragment_program. The current arb2 math version of the LUT also weighs in at 2 instructions -- a simple POW isn't a good enough match.

That's a bit backwards actually, since the LUT encodes a function that is supposed to approximate the pow() in the first place (Carmack's own words). So I would say that the texture/arithmetic hack on low-end hardware isn't a good enough match for pow().

How bad is "bad", relative to forced AF? A couple of FPS difference when replacing the LUT with math? I'd be surprised if fragment processing is at all a bottleneck without AF, but I guess it could be if you regard 1600x1200 with max AA as a necessity.

We already knew that fragment processing generally isn't the bottleneck. The point is that the shader in itself is not optimal for R3x0, while that should have been its main target.

perhaps they're not replacing shaders with the NV40 either

They aren't. We were talking about NV3x, on which Carmack himself said that performance dropped considerably when making even the slightest changes to the shaders. So although he didn't literally say they were doing shader replacements, he did make it very clear that this is what they must be doing.

It's at least an order of magnitude less than a "2000 shaders" figure suggests, that's all I was saying.

Anyone who knows a bit about shader programming would have figured that out, I suppose. They should also have figured out that you still need considerably more light and material types than Doom3's single types to ever reach 2000 shaders. Which is all that I was saying.
 
But your code squares first, then scales and biases (opposite of what the quote said). The result is very linear. My code scales and biases first, and then squares, this gives a much nicer curve, and I think somebody (Demirug?) said that's what the lookup does. But then you need two instructions.

Well, firstly, we cannot assume that Carmack's quote was in order of processing. Secondly I never said that it was the exact Doom3 function. As of yet, I do not know the exact Doom3 function.
Thirdly, don't you need to have at least ps1.4 when scaling first? iirc NV1x/2x register combiners are limited to a -1..1 range, so scaling first would be a problem.
 
Scali said:
But your code squares first, then scales and biases (opposite of what the quote said). The result is very linear. My code scales and biases first, and then squares, this gives a much nicer curve, and I think somebody (Demirug?) said that's what the lookup does. But then you need two instructions.

Well, firstly, we cannot assume that Carmack's quote was in order of processing. Secondly I never said that it was the exact Doom3 function. As of yet, I do not know the exact Doom3 function.
Thirdly, don't you need to have at least ps1.4 when scaling first? iirc NV1x/2x register combiners are limited to a -1..1 range, so scaling first would be a problem.
True, scaling would have to come last. So I guess it would have to be bias and saturate first, then square and scale (if you want the nice curved version)... I guess I should try to find that lookup texture myself to verify what it's really doing... anybody know where it is?
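Here's a rough C illustration of why the order matters if the intermediates really are clamped to -1..1 as you recall (very much simplified, and obviously not actual register combiner code):

    /* Mimic a signed -1..1 clamp on intermediate results (as recalled for
       NV1x/2x register combiners; simplified) to show why the scale has to
       come last on that hardware. */
    #include <stdio.h>

    static double clamp_signed(double v)   { return v < -1.0 ? -1.0 : (v > 1.0 ? 1.0 : v); }
    static double clamp_unsigned(double v) { return v <  0.0 ?  0.0 : (v > 1.0 ? 1.0 : v); }

    int main(void)
    {
        for (double x = 0.0; x <= 1.0001; x += 0.25) {
            /* scale/bias first: 4x - 3 falls outside -1..1 for small x,
               so after the clamp the square comes back as full brightness */
            double a = clamp_signed(4.0 * x - 3.0);
            double scale_first = a * a;
            /* bias/saturate first, then square, then scale last: stays in range */
            double b = clamp_unsigned(x - 0.75);
            double scale_last = clamp_unsigned(b * b * 16.0);
            printf("N.H=%.2f  scale-first=%.3f  scale-last=%.3f\n", x, scale_first, scale_last);
        }
        return 0;
    }

At N.H = 0 the scale-first variant ends up at full brightness instead of zero, which is exactly the kind of artifact that putting the scale last avoids.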
 
Scali said:
Why not just say "unzip it" then? Some people might go to the trouble of downloading WinRAR when they already have XP/WinZip/pkunzip/infoZip etc.

Perhaps because I use WinRAR, and so did Humus, so I just didn't think of it, and mentioned it out of habit? Don't try to find hidden agendas behind everything.

Also, most things on the net are .rar'd now, and WinRAR is compatible with WinZip but not vice versa.
 
Scali said:
Perhaps because I use WinRAR, and so did Humus, so I just didn't think of it, and mentioned it out of habit? Don't try to find hidden agendas behind everything.
Trying to find hidden agendas you say? I guess that makes two of us then! ;) I just thought it was funny that in most D3 threads I've seen, it's WinRAR that's mentioned (seemingly perpetuated), which, you should agree, certainly implies RAR compression.

Scali said:
But I suppose the exact NV20 code is compiled into the binary, so I don't know how to easily find out what the NV20 code is doing exactly.
Dump the register combiner calls using your favourite GL logger.

Scali said:
That's a bit backwards actually, since the LUT encodes a function that is supposed to approximate the pow() in the first place (Carmack's own words). So I would say that the texture/arithmetic hack on low-end hardware isn't a good enough match for pow().
No, it has similarities to pow(). Did you not take on board the point that (N.H)^P is an approximation in itself? :) Why then attempt to emulate it perfectly when you can tune the look (and possibly simplify it too... eg no constants)? That I believe was the point John was making.

They felt it was important for artistic reasons to retain the properties of their specular function in the arb2 path, which is perfectly sensible.

Scali said:
The point is that the shader in itself is not optimal for R3x0, while that should have been its main target.
But the arb2 path targets R3xx, R4xx, NV3x and NV4x. I don't think there's a single (non-trivial) shader that is going to be completely 'optimal' for all of these chips. It's a matter of compromising: for the nv30 path to be ditched, any nv30 accommodations to the arb2 path shouldn't hurt performance all that much for other chips, and they don't seem to in typical situations. There isn't a complete bias towards textures anyway as I've shown with the half-angle.

Scali said:
They aren't. We were talking about NV3x, on which Carmack himself said that performance dropped considerably when making even the slightest changes to the shaders. So although he didn't literally say they were doing shader replacements, he did make it very clear that this is what they must be doing.
Thanks for clearing that up. My 5800 is in a box somewhere as it happens.

Scali said:
Anyone who knows a bit about shader programming would have figured that out, I suppose.
I think that's a select group here.

Edit: Fruity Bits.
 
Trying to find hidden agendas you say? I guess that makes two of us then!

I am not trying to find hidden agendas. I simply assume that if you work on a game for 5 years, and come up with ONE surface shader, you have a DAMN GOOD reason for writing it the way you did.
Now when people replace a single instruction, and get noticeable performance increases on the hardware that this shader was supposed to be targeting, I want to know WHY?

I just thought it was funny that in most D3 threads I've seen, it's WinRAR that's mentioned (seemingly perpetuated), which, you should agree, certainly implies RAR compression.

No? RAR reads a variety of formats, I believe it can even open ISOs and .CABs for example. Surely everyone knows those aren't compressed with RAR?
The only thing it implies is that it is in a format that RAR can understand.

Dump the register combiner calls using your favourite GL logger.

I'm not into OpenGL, I don't know how to do that. Besides, I don't have any NV-hardware at the moment. If anyone else does, feel free and post the results here.

No, it has similarities to pow(). They felt it was necessary for artistic reasons to retain the properties of their hacked specular function in the arb2 path.

It's still backwards, no matter how you put it. Why do you think it has similarities to pow()? Perhaps because that is what it approximates?!
And I think the artistic reasons are a bit of a silly excuse. They could also have tried to make the artwork look as good as possible on the high-end hardware.
Besides, when I run the R200 path on my R3x0 card, it doesn't look the same at all, the specular highlights look very different. If they were going for the same appearance, that should not have happened. The difference might actually be larger than the difference that pow() would give vs the LUT. In which case the pow() would be perfectly acceptable.

Did you not take on board the point that (N.H)^P is an approximation in itself?

So the excuse to an approximation of an approximation is the fact that you are approximating an approximation?
If you go down that road, you might as well state that 3d graphics are only trying to approximate reality anyway, so it doesn't matter how 3d graphics look.
Bottom line is still that pow() is a decent approximation, giving nicely defined highlights. The highlights in Doom3 aren't that well-defined, and I can't imagine an artist actually preferring such washed-out, undetailed highlights over the pow() ones. I wonder how many Hollywood movies use the same function as Doom3 does, rather than a pow() or a falloff function based on artistic demands rather than the demands of ancient hardware (note that the texture is a precalculated version of the shading that NV1x/2x is capable of; the texture cannot contain an arbitrary function, since NV1x/2x cannot do the dependent reads required to use a texture as a falloff function for specular).
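To make that concrete, this is roughly what I mean by a precalculated texture: bake the falloff into a small 1D table and index it with N.H. Purely illustrative C, not id's actual table, format or function:

    /* Bake a falloff curve into a 256-entry table and sample it by N.H.
       The curve here is just the example one from this thread; a real engine
       would upload 'lut' as a texture and do the lookup per fragment. */
    #include <stdio.h>

    #define LUT_SIZE 256

    static float saturatef(float v) { return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v); }

    /* the falloff being precalculated; any curve could go in here */
    static float falloff(float ndoth) { float t = saturatef(4.0f * ndoth - 3.0f); return t * t; }

    int main(void)
    {
        unsigned char lut[LUT_SIZE];
        for (int i = 0; i < LUT_SIZE; ++i)
            lut[i] = (unsigned char)(falloff((float)i / (LUT_SIZE - 1)) * 255.0f + 0.5f);

        /* sampling is then just a texture fetch with N.H as the coordinate */
        float ndoth = 0.9f;
        int idx = (int)(saturatef(ndoth) * (LUT_SIZE - 1) + 0.5f);
        printf("N.H = %.2f -> specular = %u/255\n", ndoth, (unsigned)lut[idx]);
        return 0;
    }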

But the arb2 path targets R3xx, R4xx, NV3x and NV4x. I don't think there's a single (non-trivial) shader that is going to be completely 'optimal' for all of these chips. It's a matter of compromising: for the nv30 path to be ditched, any nv30 accommodations to the arb2 path shouldn't hurt performance all that much for other chips, and they don't seem to in typical situations.

The silly thing is that he didn't ditch the NV3x path until after he found out that NV3x could render the ARB2 path at the same speed with the new 'magic' drivers. As we know from his .plan at the time, the NV3x series was about twice as slow on the ARB2 path as the R3x0, so that alone would be plenty suspicious; in other words, he knew about the driver replacements before he made the decision to abandon the NV3x path.
 
Scali said:
Now when people replace a single instruction, and get noticeable performance increases on the hardware that this shader was supposed to be targeting, I want to know WHY?
The shader targets a number of chips; it's a compromise which I trust balances out performance across all of them in typical situations (not the atypical forced AF case, for instance, which clearly breaks the API).

Scali said:
The only thing it implies is that it is in a format that RAR can understand.
There are a few of these multi-format archive utilities, so what makes WinRAR unique? To be more precise, it suggests that you need WinRAR, whereas the reader may already have an unzipper. The zip format is so ubiquitous that there's even OS support in XP!

Scali said:
It's still backwards, no matter how you put it. Why do you think it has similarities to pow()? Perhaps because that is what it approximates?!
The ultimate goal wasn't to approximate pow as closely as possible, as is clear in the interview.

Scali said:
And I think the artistic reasons are a bit of a silly excuse. They could also have tried to make the artwork look as good as possible on the high-end hardware.
That's tricky when the high-end moves due to release slippage. Besides, the majority don't have said hardware.

Scali said:
Besides, when I run the R200 path on my R3x0 card, it doesn't look the same at all, the specular highlights look very different.
It's quite possible that ATI's R200 GL emulation has a bug or two -- this is apparent with the NV40 drivers and the NV10 & 20 paths.

Edit: Scali, sorry for earlier -- it seems I was editing my post as you were responding. You'll see that I've since added to the power function discussion.
 
Scali said:
Now when people replace a single instruction, and get noticeable performance increases on the hardware that this shader was supposed to be targeting, I want to know WHY?

On the X800 cards, apparently application-controlled AF results in speed boosts similar to what was seen from the Humus tweak. This is the way that [H] tested in their preview.
 
The shader targets a number of chips; it's a compromise which I trust balances out performance across all of them in typical situations (not the atypical forced AF case, for instance, which clearly breaks the API).

The only one who ever mentioned forced AF is you.
As I said before, there are also noticeable gains without AF.

There are a few of these multi-format archive utilities, so what makes WinRAR unique? To be more precise, it suggests that you need WinRAR, whereas the reader may already have an unzipper. The zip format is so ubiquitous that there's even OS support in XP!

Whatever, I don't care, I use WinRAR, and I would suggest it to anyone, since it is very convenient, unlike the OS support in XP.
But I never meant to say that you cannot use any other archiver. I don't care. I don't know why you care.
Also, I am not responsible for what everyone else suggests for opening .pak files.

The ultimate goal wasn't to approximate pow as closely as possible, as is clear in the interview.

What exactly do you think the ultimate goal was then?
The way I read the interview, he said he implemented a pow-like function (== approximation) using scales, biases, clamps, which can be implemented on non-shader hardware.

That's tricky when the high-end moves due to release slippage. Besides, the majority don't have said hardware.

R200 already allows arbitrary functions through textures, that card has been available for ages.

Besides, the majority don't have said hardware.

That doesn't stop most games from putting in high-end paths with extra visual quality (Halo, TRAOD, FarCry, HalfLife2, to name but a few).

It's quite possible that ATI's R200 GL emulation has a bug or two -- this is apparent with the NV40 drivers and the NV10 & 20 paths.

I don't think this is the case. My brother has an actual 8500 card, and at any rate, my card rendering the R200 path looked closer to his 8500 than to my ARB2 path.
 
WinRAR should be used by most computer enthusiasts in the know, as it has much better compression than WinZip.
I think it's safe to assume that most people here should at least know what WinRAR is, even if they don't have it installed.
Anyway...
 
Scali said:
The only one who ever mentioned forced AF is you.
As I said before, there are also noticeable gains without AF.
I brought forced AF up as that's where the gains seem to be: http://www.beyond3d.com/forum/viewtopic.php?t=15027

Scali said:
I don't care. I don't know why you care.
It just amused me, sorry if it seemed like a personal attack. One does have to be careful with framing advice though; some people have been editing their pk4 shaders directly! Ouch.

Scali said:
What exactly do you think the ultimate goal was then?
Simply put: a nice looking specular response.

"For instance, especially for broad highlights, it is nice to have a finite cutoff angle, rather than the power limit approach."

Scali said:
R200 already allows arbitrary functions through textures, that card has been available for ages.
Well, one can only speculate about the performance difference there between math and lookup. As you said, it could be one math instruction here.

Scali said:
That doesn't stop most games from putting in high-end paths with extra visual quality (Halo, TRAOD, FarCry, HalfLife2, to name but a few).

The arb2 path has nifty distortion effects :), but the general game look is preserved.

Scali said:
I don't think this is the case. My brother has an actual 8500 card, and at any rate, my card rendering the R200 path looked closer to his 8500 than to my ARB2 path.
Fair enough.
 
I brought forced AF up as that's where the gains seem to be: http://www.beyond3d.com/forum/viewtopic.php?t=15027

Correction: that is where MOST of the gains seem to be. Performance isn't exactly untouched without AF either. I find it only logical that forcing AF on will slow down any kind of texture lookup, on any GPU. That was not the point I was trying to make.

Simply put: a nice looking specular response.

"For instance, especially for broad highlights, it is nice to have a finite cutoff angle, rather than the power limit approach."

Not at all. The ultimate goal was to have a function that would map to NV1x/2x hardware nicely. The artists didn't exactly have a lot of freedom there, so no doubt if the artists were given a free hand in the function, it would not look like the one currently used in Doom3. It may not look like pow() either, but that's another discussion. This function was clearly limited by NV1x/2x and is therefore not purely for artistic reasons. Carmack doesn't claim that either; he clearly mentions that it has to map onto low-end hardware. Only you seem to claim that it is used because it looks better.

Well, one can only speculate about the performance difference there between math and lookup. As you said, it could be one math instruction here.

Personally I don't care if I lose a bit of performance, but gain quality, so if the texture was slower than maths, but looked better, I would use it anyway... however in this case the texture merely replaces maths based on low-end hardware, and therefore clearly doesn't look better.

The arb2 path has nifty distortion effects

Yes, for some reason it is not in the R200 path... No idea why, the hardware should be capable of it. I suppose it is not in the NV2x path either? Probably left it out for performance-reasons.
But those are special effects, not exactly improved visual quality.
With improved quality I mean higher precision rendering, giving a smoother, less aliased end result... And perhaps also effects that improve the appearance of the game... For example HalfLife 2 which enables HDR on ps2.0 hardware with floating point texture support.
 
radeonic2 said:
WinRAR should be used by most computer enthusiasts in the know, as it has much better compression than WinZip.

Then why aren't they using WinACE? And really, besides 'warez', I rarely see items rarred. I would say roughly less than 10% of things I download are not zipped.

More things I download are tgz compressed than rarred...
 
Scali said:
Correction: that is where MOST of the gains seem to be. Performance isn't exactly untouched without AF either.
It's nothing to make a big deal about in my opinion, particularly given other potential bottlenecks. Once again, the arb2 interaction.vfp shader makes a number of compromises based on weighing up the performance of several chips, and I'm sure that if the LUT in question had caused a big dip on ATI hardware in general, then it wouldn't be there (or, less likely, we'd see more paths/shaders).

Scali said:
Not at all. The ultimate goal was to have a function that would map to NV1x/2x hardware nicely.
Platform constraints are a given, but perhaps "ultimate" was a poor choice of word on my part. As we have shown, there is clear wiggle room even within the tight constraints of nv10/nv20/r200 though -- a closer pow approximation could probably have been used, but it wasn't and I'm sure there was experimentation in this area.
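For instance (just a sketch, nothing from the game, and ignoring the precision limits of that era's fixed-point pipelines), x^16 can be matched exactly with four successive squarings, each a single multiply -- a closer fit, but at several times the cost of the one-instruction trick:

    /* x^16 via repeated squaring: exact, but four multiplies instead of the
       single clamp/bias/scale approximation discussed earlier. */
    #include <stdio.h>

    static double pow16(double x)
    {
        x *= x;  /* x^2  */
        x *= x;  /* x^4  */
        x *= x;  /* x^8  */
        x *= x;  /* x^16 */
        return x;
    }

    int main(void)
    {
        for (double x = 0.5; x <= 1.0001; x += 0.1)
            printf("x=%.1f  x^16=%.6f\n", x, pow16(x));
        return 0;
    }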

Scali said:
The artists didn't exactly have a lot of freedom there, so no doubt if the artists were given a free hand in the function, it would not look like the one currently used in Doom3
Can't a 3D Programmer (or Technical Director) be allowed to make some artistic judgement calls once in a while? :)

Scali said:
Yes, for some reason it is not in the R200 path... No idea why, the hardware should be capable of it. I suppose it is not in the NV2x path either? Probably left it out for performance-reasons.
Most likely.

Scali said:
With improved quality I mean higher precision rendering, giving a smoother, less aliased end result... And perhaps also effects that improve the appearance of the game....
Well it's clear that JC has been experimenting with HDR (r_hdr*) and anti-aliased normal maps / highlights (test.vfp), so maybe we'll see at least some of that in a point release.

Scali said:
For example HalfLife 2 which enables HDR on ps2.0 hardware with floating point texture support.
Or fake HDR (in the sense of no tone mapping) with 16bit integer support ;) -- but let's save that for another discussion.
 
It's nothing to make a big deal about in my opinion, particularly given other potential bottlenecks. Once again, the arb2 interaction.vfp shader makes a number of compromises based on weighing up the performance of several chips, and I'm sure that if the LUT in question had caused a big dip on ATI hardware in general, then it wouldn't be there (or, less likely, we'd see more paths/shaders).

True, skinning and shadowing entirely on the CPU, even with R3x0 or NV3x+ vertex shader power at your disposal, is a greater vice.

Platform constraints are a given, but perhaps "ultimate" was a poor choice of word on my part. As we have shown, there is clear wiggle room even within the tight constraints of nv10/nv20/r200 though -- a closer pow approximation could probably have been used, but it wasn't and I'm sure there was experimentation in this area.

As I said before, R200 can implement the falloff function through a texture (that is what I use anyway), so it doesn't really have any constraints; it's purely NV1x/2x that has the tight constraints. I find it a rather strange choice to base everything on the lowest common denominator.

Can't a 3D Programmer (or Technical Director) be allowed to make some artistic judgement calls once in a while?

Certainly not!

Well it's clear that JC has been experimenting with HDR (r_hdr*) and anti-aliased normal maps / highlights (test.vfp), so maybe we'll see at least some of that in a point release.

He's had 5 years to experiment; I find the things he comes up with at release time rather poor, to be honest.
 