Carmack's comments on NV30 vs R300, DOOM developments

Shouldn't that be "what you need is a 'Zmin' (Obviously, ATI is very aware of this...", Mintmaster?

Maybe that's why Colourless felt he had to make the above statement.
 
Colourless:
Yes, that's right. But ATI probably didn't consider this case important enough to double the HierZ hardware for it.

demalion:
I don't think Colourless was referring to anything special in Mintmaster's post. That comment has kind of hung over this discussion the whole time. :) But you're right, that's a typo and should be "Zmin".

Mintmaster:
Good description, but I have one objection.
You don't need to change the D3DRS_ZFUNC (I assume that's equivalent to glDepthFunc). It's the same ZFUNC as otherwise; you just update the stencil buffer when that same ZFUNC fails.

So it doesn't break
ATI said:
First, and the most important, do not change sense of the depth comparison function in the course of a frame rendering.
Which might, but doesn't have to, disable HierZ for the rest of the frame, depending on how clever the drivers are.

But it does break
ATI said:
In addition, few other things interfere with hierarchical culling; these are - outputting depth values from pixel shaders and using stencil fail and stencil depth fail operations.

I have thought the same way as you did there at times, but I hope I haven't put it in print. :)
 
demalion said:
Shouldn't that be "what you need is a 'Zmin' (Obviously, ATI is very aware of this...", Mintmaster?

Maybe that's why Colourless felt he had to make the above statement.

Yes, you're right. Oops.
 
Crusher said:
No, neither normal volumetric shadowing nor Carmack's Reverse method touches hidden pixels. Both methods make comparisons on the Z-buffer, which contains no hidden pixels at all. The way Carmack's Reverse differs is that he renders the back face of the shadow volume, incrementing the stencil buffer when there's a z-fail, and then renders the front side of the volume and decrements the stencil buffer if there's a z-pass.
Don't know if this has been addressed yet, but the stencil shadow algorithm that Carmack uses does nothing on z-pass:

Draw back sides, doing nothing with depth pass and incrementing with depth fail.
Draw front sides, doing nothing with depth pass and decrementing with depth fail.
Which makes sense if you think about it: if the shadow hull surrounds an object, the back side will fail the depth check while the front side will pass it, so with this algorithm the stencil buffer still ends up carrying a non-zero value for shadowed pixels.
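For concreteness, here's a minimal sketch of those two passes as D3D9 render states (my own illustration, not id's actual code; DrawShadowVolume is a hypothetical helper, and D3D's default clockwise-front winding is assumed):

    #include <d3d9.h>

    void DrawShadowVolume(IDirect3DDevice9 *dev); // assumed to exist elsewhere

    void RenderShadowVolumeZFail(IDirect3DDevice9 *dev)
    {
        // Shared state: depth and color writes off, stencil test always passes.
        dev->SetRenderState(D3DRS_STENCILENABLE, TRUE);
        dev->SetRenderState(D3DRS_ZWRITEENABLE, FALSE);
        dev->SetRenderState(D3DRS_COLORWRITEENABLE, 0);
        dev->SetRenderState(D3DRS_STENCILFUNC, D3DCMP_ALWAYS);
        dev->SetRenderState(D3DRS_STENCILFAIL, D3DSTENCILOP_KEEP);
        dev->SetRenderState(D3DRS_STENCILPASS, D3DSTENCILOP_KEEP); // nothing on depth pass

        // Pass 1: back faces only, increment stencil on depth fail.
        dev->SetRenderState(D3DRS_CULLMODE, D3DCULL_CW);
        dev->SetRenderState(D3DRS_STENCILZFAIL, D3DSTENCILOP_INCR);
        DrawShadowVolume(dev);

        // Pass 2: front faces only, decrement stencil on depth fail.
        dev->SetRenderState(D3DRS_CULLMODE, D3DCULL_CCW);
        dev->SetRenderState(D3DRS_STENCILZFAIL, D3DSTENCILOP_DECR);
        DrawShadowVolume(dev);

        // Pixels left with a non-zero stencil value are in shadow.
    }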
 
Basic,

While I do know how stencil buffers work, I haven't programmed with them yet. I assumed that if the Z-test kills a pixel, then the stencil comparison isn't performed. While I was right, now that I've looked up the various stencil-related render states, I see your point.

Still, doesn't this mean that there are several ways of doing this?
For both methods below:
-Set D3DRS_STENCILFUNC to D3DCMP_ALWAYS
-Disable D3DRS_ZWRITEENABLE and D3DRS_COLORWRITEENABLE

Method 1 (what I was saying)
-Set the D3DRS_ZFUNC to D3DCMP_GREATER
-Set all stencil operation states to D3DSTENCILOP_KEEP except for D3DRS_STENCILPASS, which is set to D3DSTENCILOP_INCR or D3DSTENCILOP_DECR.

Method 2 (what you were saying)
-Keep D3DRS_ZFUNC at D3DCMP_LESSEQUAL
-Set all stencil operation states to D3DSTENCILOP_KEEP except for D3DRS_STENCILZFAIL, which is set to D3DSTENCILOP_INCR or D3DSTENCILOP_DECR.
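To make the difference concrete, here's roughly what the two setups look like as actual calls (a sketch only, with a hypothetical IDirect3DDevice9 pointer):

    #include <d3d9.h>

    void SetupCommonState(IDirect3DDevice9 *dev)
    {
        dev->SetRenderState(D3DRS_STENCILENABLE, TRUE);
        dev->SetRenderState(D3DRS_STENCILFUNC, D3DCMP_ALWAYS);
        dev->SetRenderState(D3DRS_ZWRITEENABLE, FALSE);
        dev->SetRenderState(D3DRS_COLORWRITEENABLE, 0);
        dev->SetRenderState(D3DRS_STENCILFAIL, D3DSTENCILOP_KEEP);
    }

    void SetupMethod1(IDirect3DDevice9 *dev) // inverted ZFUNC, count on depth PASS
    {
        dev->SetRenderState(D3DRS_ZFUNC, D3DCMP_GREATER);
        dev->SetRenderState(D3DRS_STENCILZFAIL, D3DSTENCILOP_KEEP);
        dev->SetRenderState(D3DRS_STENCILPASS, D3DSTENCILOP_INCR); // or D3DSTENCILOP_DECR
    }

    void SetupMethod2(IDirect3DDevice9 *dev) // normal ZFUNC, count on depth FAIL
    {
        dev->SetRenderState(D3DRS_ZFUNC, D3DCMP_LESSEQUAL);
        dev->SetRenderState(D3DRS_STENCILPASS, D3DSTENCILOP_KEEP);
        dev->SetRenderState(D3DRS_STENCILZFAIL, D3DSTENCILOP_INCR); // or D3DSTENCILOP_DECR
    }

GREATER passes exactly where LESSEQUAL fails, so both setups count the same pixels.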

Both methods seem equivalent to me. I wonder if this could potentially lead to some ambiguity in DX? I think I'll take a closer look at it.

Sorry, but I don't know OpenGL; maybe there is only one way to do it there. One problem with describing this clearly is that in DX a passing Z-test means the pixel satisfies the D3DRS_ZFUNC condition, which can be anything. The way JC talks about it, passing the Z-test means the current pixel is in front of or equal to what's there, i.e. always the D3DCMP_LESSEQUAL condition.

EDIT: Clarified a few things.
 
OK, maybe I'm being unclear again.
Yes, both methods should work, but...
Method 1 breaks my quote 1 above from ATI's optimization docs.
Method 2 breaks my quote 2 above from ATI's optimization docs.

Quote 1 seems the worse one to break, so use method 2.

OpenGL should be the same.

One difference in OpenGL though, and it would seem strange if it's different in DX: the stencil test is done before the Z-test. Are you sure it's the other way around in DX? Either way, it shouldn't affect whether the two algorithms work.

It doesn't have to be done that way physically, but the results must be just as if it were. In most cases you don't need to care about the order, so you could put HierZ before the stencil test for performance. I'd guess that's how it's done in R300.

But if STENCILFUNC != ALWAYS, and the stencil fail op differs from the Z-fail op, then there's a difference.
If a pixel fails both the stencil test and the Z-test, the testing order matters, and OpenGL must apply the stencil fail op. This means that even if HierZ says a block of pixels is hidden, the stencil test must still be done for every pixel in it. This is one reason why stencil fail ops interfere with HierZ.
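To make the ordering concrete, here's a minimal sketch of the logical per-fragment sequence (my own model of the specified behavior, with illustrative names, assuming a LESSEQUAL depth function):

    enum StencilOp { KEEP, INCR, DECR };

    void applyOp(StencilOp op, int &stencil)
    {
        if (op == INCR) ++stencil;        // KEEP leaves the value untouched
        else if (op == DECR) --stencil;
    }

    void processFragment(bool stencilTestPasses,
                         StencilOp failOp, StencilOp zFailOp, StencilOp zPassOp,
                         float fragZ, float storedZ, int &stencil)
    {
        if (!stencilTestPasses) {
            applyOp(failOp, stencil);     // 1. stencil test first: its fail op fires
            return;                       //    even if the depth test would also fail
        }
        if (fragZ > storedZ) {            // 2. depth test (LESSEQUAL fails here)
            applyOp(zFailOp, stencil);
            return;
        }
        applyOp(zPassOp, stencil);        // 3. both tests passed
    }

So whenever STENCILFUNC != ALWAYS and the fail op isn't KEEP, a coarse HierZ rejection can't simply discard a hidden block: step 1 still has to run for every pixel in it.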

I don't know why the stencil test comes before the Z-test. Has anybody got a good explanation?
 
According to Shacknews it hasn't been updated. (I think you're falling into the same trap I did. Apparently he's dumping all vendor-specific VERTEX paths, not all vendor-specific paths.) It seems the pixel paths (register combiners, etc.) will stay.


EDIT: clarify.
 
Joe DeFuria said:

http://www.bluesnews.com/plans/1/

Yeah, I really thought everybody had seen this... Here's some of the text:

"Doom has dropped support for vendor-specific vertex programs
(NV_vertex_program and EXT_vertex_shader), in favor of using
ARB_vertex_program for all rendering paths. This has been a pleasant thing to
do, and both ATI and Nvidia supported the move. The standardization process
for ARB_vertex_program was pretty drawn out and arduous, but in the end, it is
a just-plain-better API than either of the vendor specific ones that it
replaced. I fretted for a while over whether I should leave in support for
the older APIs for broader driver compatibility, but the final decision was
that we are going to require a modern driver for the game to run in the
advanced modes. Older drivers can still fall back to either the ARB or NV10
paths."


Russ, thanks... I thought it might have been a little too narrow, but with JC sometimes it's hard to tell...
 
Yeah, I really thought everybody had seen this... Here's some of the text...

Yeah, we've seen it... we just haven't seen any .plan where Carmack claimed to be dropping all vendor-specific paths! ;) (Which he didn't say.)

When you said "follow-up" .plan, I thought maybe he made an "update" to that .plan where he stated such, and I just hadn't seen that.

EDIT: In a nutshell, Carmack expects that he will be able to drop all vendor-specific vertex paths (he has already dropped vertex programs, and will likely drop vertex arrays). However, there is every indication that vendor-specific fragment/pixel paths will be in there. Specifically, for the Nvidia NV1x, NV2x, and NV3x chips, and the ATI Radeon R200 series chips.
 
I am going to go out on a limb here... but after today's Nvidia interview, I think that it is VERY likely that Nvidia will NEVER be able to run the ARB2 path on the NV30.

It seems, I would say, MORE than likely that the NV30 executes FP32 ops in 2 clock cycles, meaning it will pretty much forever be relegated to a 50% speed hit.
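(For the arithmetic: one FP32 vector op every 2 clocks at the NV30's 500 MHz is 250 million ops per second per unit, versus 500 million at 1 op per clock, i.e. exactly half the throughput.)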

I could have this all wrong of course... any comments?
 
Joe DeFuria said:
EDIT: In a nutshell, Carmack expects that he will be able to drop all vendor-specific vertex paths (he has already dropped vertex programs, and will likely drop vertex arrays). However, there is every indication that vendor-specific fragment/pixel paths will be in there. Specifically, for the Nvidia NV1x, NV2x, and NV3x chips, and the ATI Radeon R200 series chips.

Right...as Russ said, I interpreted it too broadly, maybe--Carmack still seems awfully broad to me here. But anyway...this was an update to his original .plan in which he discussed the various vendor paths that this thread is concerned with. He added these remarks at a later time.

Actually, I would like to see him drop all vendor-specific paths and go with ARB2 exclusively. I can't see how he feels that writing vendor-specific paths into his applications is a way to boost acceptance of OpenGL, since it could well lead to the appearance of fragmentation in the API, at least I think so.
 
What exactly are we "allowed" to discuss concerning the GeForceFX?

Only things that paint it in a glowing light, of course.



:rolleyes:

The point being, it's been relayed by Carmack that they'll fix the ARB2 "problem" shortly, yet we're seeing dire statements such as those above that are based entirely on current performance that is supposed to be fixed.

Take a deep breath. Relax. See how it looks in a month, then go off on your doom and gloom parade if it hasn't changed.
 
Despite Rev's Advice... ;)

The point being, it's been relayed by Carmack that they'll fix the ARB2 "problem" shortly...

Nobody, including Carmack, knows if and when it will actually be fixed... and to what degree. That "nVidia told him" is not particularly convincing. I can't personally recall any driver update, over any length of time, increasing performance in a game situation (not just a synthetic one) by 100%.

Take a deep breath. Relax. See how it looks in a month, then go off on your doom and gloom parade if it hasn't changed.

I assume those comments are directed at Hellbinder, unless you want to point me in the direction of my own gloom and doom parade. Though IIRC, certain people were criticized for gloom and doom parades about the "potential noise" of the FX fan back in November... they were told to wait for actual reviews by websites. And then when they came in... certain people were still bothered by all the negative reaction.

So, maybe you can make a deal with Hellbinder... no more gloom and doom over FP32 performance for "a month or so." But then if it doesn't pan out, he gets to vent to his heart's content (within the limits set by the board moderators, of course), without hearing another "just wait..." comment from you.

Sound fair?
 
Absolutely. Why would I have any problem with him stating facts? If it does pan out that it executes an FP32 vector operation in 2 clocks instead of one, then those are the facts.

Of course, it might get irritating to see him repeat it over and over. But I won't say "just wait...". I'll just say "ok, we heard you the 10th time, no need to continue".
 
"The point being, its been relayed by Carmack that they'll fix the ARB2 "problem" shortly, yet we're seeing the dire statements such as those above that are based entirely on current performance that is supposed to be fixed."

How do you 'fix' a performance deficit of 50%? It just does not seem fixable at all to me, short of redesigning the hardware. Wringing out a little extra performance, on the order of 10% or so over the course of a card's lifetime, MAYBE, but there just doesn't seem to be much you can do about pixel shader execution speed by fiddling with drivers, as it is pretty much a hardware-only issue.

By the way, I didn't see your point at all. Would you please point to the relevant quote where Carmack says the ARB "problem" will be fixed, and fixed shortly at that? (I'm not sure what function the quotes serve, as everybody would pretty much agree that it *is* a problem if your clock speed is 50% HIGHER yet performance is 50% LOWER.)


*G*
 
Carmack said:
Nvidia assures me that there is a lot of room for improving the fragment program performance with improved driver compiler technology.

Whether that means it will improve to the point where FP32 vector ops on the ARB2 path get 1-cycle execution is unknown. For all we know, the compiler may just need to re-order something to prevent a pipeline stall, and that could be enough to double performance. VLIW processors can be tricky beasts, and writing compilers for them is not trivial. (DSP C compilers generally suck, for example.)
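As a purely hypothetical illustration of what such re-ordering buys (nothing here is NV30's real instruction set or scheduler): suppose each multiply's result takes 2 cycles before it can be consumed. Back-to-back dependent ops then stall, while interleaving two independent dependency chains hides the latency:

    // Naive order: the second op of each chain stalls on its input.
    float shade_naive(float a, float b, float c, float d, float e, float f)
    {
        float r0 = a * b;
        r0 = r0 * c;       // stall: must wait a cycle for r0
        float r1 = d * e;
        r1 = r1 * f;       // stall: must wait a cycle for r1
        return r0 + r1;
    }

    // Scheduled order: chains interleaved, each input is ready when needed.
    float shade_scheduled(float a, float b, float c, float d, float e, float f)
    {
        float r0 = a * b;  // start chain 0
        float r1 = d * e;  // start chain 1 while r0's latency elapses
        r0 = r0 * c;       // r0 is ready by now
        r1 = r1 * f;       // r1 is ready by now
        return r0 + r1;    // same math, roughly half the stalls
    }

Same result, but the second version keeps the pipeline fed, and that's exactly the kind of transformation a driver compiler can make without touching the hardware.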
 