New dynamic branching demo

radar1200gs said:
IMO this only demonstrates how poor ATi drivers still are.
The reason nVidia cards don't see such an improvement is because they can already handle the more complex case more efficiently, and clumsy optimizations such as this only slow them down unnecessarily.

LOL, I guess that was an alternative explanation ... but it's waaaay off. No, all it demonstrates is how poor nVidia's hardware is at stencil rejection.
 
ChrisRay said:
I am not bashing Humus, I am mostly curious as to his comments on whether or not the demo is bugged on NV40 hardware, and why NV40 users are experiencing a performance deficit from this method. :)
The performance drop in NV40/NV3x is most likely due to the lack of early stencil kill ability. And I'm not criticising you, Chris, I know you're much better than that. ;)
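
For anyone who hasn't poked at this kind of thing, the general pattern behind a stencil early-out goes roughly like this (just a sketch of the idea, not the demo's actual code; the two draw helpers are made-up placeholders):

Code:
// Sketch of a stencil-based early-out (not the demo's actual code; the two
// draw helpers are made-up placeholders). Assumes a current GL context.

// Pass 1: mark the pixels that need the expensive path (e.g. pixels inside
// the light's range) with stencil = 1, without touching color or depth.
glClearStencil(0);
glClear(GL_STENCIL_BUFFER_BIT);
glEnable(GL_STENCIL_TEST);
glStencilFunc(GL_ALWAYS, 1, 0xFF);
glStencilOp(GL_KEEP, GL_KEEP, GL_REPLACE);
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
glDepthMask(GL_FALSE);
drawCheapConditionPass();

// Pass 2: run the heavy lighting shader only where stencil == 1.
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glDepthMask(GL_TRUE);
glStencilFunc(GL_EQUAL, 1, 0xFF);
glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
drawExpensiveLightingPass();

glDisable(GL_STENCIL_TEST);

The whole win depends on the hardware rejecting failing fragments at the stencil stage, before the shader runs. If the stencil test only gets applied after shading, pass 2 shades every pixel anyway and the extra pass is pure overhead, which is exactly the kind of slowdown NV3x/NV40 users are reporting.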
 
Humus, did ATi give ya a pat-on-the-head or a bonus or anything weird like that for this? (Sorry, the curiosity bug bit me and I had to ask.)
 
Outside of the fact that we have the usual people hyping the ATI side of things and the usual people brandishing Nvidia's guns, like Chris Ray I am leery of the extreme difference in the impact of the optimizations on different hardware.

I don't believe that Humus intentionally made the optimization ineffective for Nvidia hardware, but I thought the optimization was supposed to be generic? Humus, being the coder, you are in the best position to at least make an informed guess at a reason for the discrepancy. Any ideas?

Also, Nvidia is a corporation that has an opportunity to take advantage of features that are unique to their product and they are doing so. The argument that it is Nvidia's 'fault' that Crytek didn't implement these new features in SM2.0 is infantile. It is actually ATI's 'fault' for not implementing SM3.0 support in the first place. Any sensible person would acknowledge that.

Yes it is possible and feasible, but what incentive does Crytek really have to forego SM3.0 and implement fall backs for SM2.0? ATI sure as hell doesn't seem to be giving them any. So Crytek's decision is $$$+3.0 or nothing+2.0 assuming equal effort for either. Yeah, tough decision.

Fact is, Far Cry is a completed retail game that runs perfectly on ATI hardware. Crytek would never even have added these new features without pressure/incentive from Nvidia in the first place, and nobody would have seen them on either IHV's hardware. Unless some of you conspiracy theorists think that they intentionally dumbed down their SM2.0 features in anticipation of the SM3.0-only addons ;)

The same people that were saying SM3.0 is worthless are the same ones now saying that Crytek/Nvidia are evil for using it. Where does it stop? This is getting ridiculous.
 
Eronarn said:
And to the above poster, they're not evil because they're using it. I consider it evil that they're screwing over 99.999% of customers to hype a new product that's actually an old product with a different name.

Please explain who the 99.9% are and exactly how they are being screwed. I'm quite sure Far Cry will continue to run fine on all hardware that it runs on now. What does a non-NV4X/Far Cry owner lose by Crytek's add-ons?

Do you mean that developers have some obligation to support a certain installed base of hardware for any new features they decide to add to an already complete game? By not doing so aren't they merely hurting their own sales?

Quick poll. Will you be happier if

a) the SM3.0 add-ons prove to be worthless and actually slow performance on NV4X.
b) the SM3.0 add-ons are great but someone finds a way to replace the SM3.0 shaders with equivalent SM2.0 shaders and everyone can join in the fun.
 
hm... well most of that 99.9% really consists of the GF4MX people, and I don't think it matters to those people as much either. ;)

It screws the 5xxx or R3xx/4xx people in that they will be forced to buy SM3.0 capable cards just to see shader effects that could have been done using SM2.0 (should they even care, mind you). This is assuming there is no "someone" to automagically convert SM3.0 shaders to SM2.0 shaders.

Option b) is probably what everyone wants, ideally. It's the "replacing with SM2.0 code" that is up in the air right now, unless you're going to do it. :D
 
Humus said:
Wow, quite some response to this thread. :oops:
Yeah, at first I thought it was on topic, then I went :rolleyes:

I'm interested too. The demo is fairly simple. It should be easy for someone to just comment out the early-out code, and do some changes to the lighting shader to take advantage of ps3.0.
That's the only thing that would be interesting (apart from the pretty good boost you have ;))
 
Well all I've got to say is, way to go Humus. :)

One thing (about the demo now ;) ) is that some of the back walls in the demo look REALLY flat. Otherwise stunning.
 
Now that I've seen the huge gains nV reaped with SM3.0 in FC v1.2 in AT's benches, I'm going to have to reread Humus' original posts on this technique. :)
 
The Baron said:
Far Cry 1.2 contains no PS3.0-specific effects, only performance benefits resulting from the use of PS3.0. And even with that, you need a DX9.0c beta installed.

Are you sure about that? If the Far Cry SM3.0 patch increases performance by 30% -- how much of that increase is from actual SM3.0 code, and not just from a general rewriting/optimizing of the shaders like NV did in Halo? NV could probably get a sizable performance increase just by rewriting the current 2.0 shaders. Marketing department -- hmmmm, that's a nice increase -- let's market it all as an SM3.0 performance benefit.
 
trinibwoy said:
Also, Nvidia is a corporation that has an opportunity to take advantage of features that are unique to their product and they are doing so. The argument that it is Nvidia's 'fault' that Crytek didn't implement these new features in SM2.0 is infantile. It is actually ATI's 'fault' for not implementing SM3.0 support in the first place. Any sensible person would acknowledge that.

Yes it is possible and feasible, but what incentive does Crytek really have to forego SM3.0 and implement fall backs for SM2.0? ATI sure as hell doesn't seem to be giving them any. So Crytek's decision is $$$+3.0 or nothing+2.0 assuming equal effort for either. Yeah, tough decision.

Sorry, but I really don't agree with this justification. Say ATI pays Valve to make all nvidia cards run in fixed-function mode. I'm sure nVidia fans would be extremely happy about that.

And didn't nVidia fans complain so much that 3Dmark03 had ATI optimisations (PS 1.4) that Futuremark had to patch it to include a PS 1.1 fallback? Same thing here. As long as it's reasonably possible to write fallbacks, developers should do it. If they don't, I'll vote with my wallet. I'm waiting for DOOM 3 before I upgrade. As it stands right now, whether I end up buying a 6800 or an X800, I will not be buying Far Cry.

Today it might be ATI getting the shaft, tomorrow it might be nVidia. And considering xbox2, I don't want a flood of PC titles that screw nVidia fans from 2005 until xbox3.

Humus: keep up the good work. Oh btw, I think FC uses stencil shadows in indoor parts.
 
Humus said:
That's just one way to show off this technique. It's not limited to this particular application of range-limited lights. It can do any combination of ifs and still do only the same amount of shader work as if ps3.0 dynamic branching had been used.

You mean the same amount of pixel-shader work, right? Obviously there's more vertex work and additional stencil buffer work.

My biggest question about your technique: do you need one extra pass (sending the scene to the card) per 'if' statement that you wish to emulate, or can you do multiple independent 'ifs' in one pass?

I'm really surprised that there's no performance gain from your idea on NVIDIA cards. It really should be saving some work - a stencil test should come before fragment shading unless (maybe) you modify depth values in your fragment shader? Anyway, I'd like to look into it this weekend if I have time.
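
If I do get to it, the first thing I'd try is something crude like this to see whether the stencil test actually rejects fragments before shading (just a sketch, not real benchmark code; drawFullscreenQuadWithExpensiveShader() is a placeholder for any draw call that uses a heavy fragment shader):

Code:
// Crude check for early stencil rejection. Assumes a current GL context.
#include <GL/gl.h>
#include <ctime>

void drawFullscreenQuadWithExpensiveShader();   // placeholder, not a real API

double timeExpensivePass(bool cullEverythingWithStencil)
{
    glClearStencil(0);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT | GL_STENCIL_BUFFER_BIT);

    if (cullEverythingWithStencil) {
        // Stencil buffer is all zeroes, so every fragment fails the test.
        glEnable(GL_STENCIL_TEST);
        glStencilFunc(GL_EQUAL, 1, 0xFF);
        glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
    } else {
        glDisable(GL_STENCIL_TEST);
    }

    glFinish();                                   // drain pending work first
    clock_t start = clock();
    for (int i = 0; i < 100; i++)
        drawFullscreenQuadWithExpensiveShader();
    glFinish();                                   // wait for the GPU to finish
    glDisable(GL_STENCIL_TEST);
    return double(clock() - start) / CLOCKS_PER_SEC;
}

If timeExpensivePass(true) comes out about as slow as timeExpensivePass(false), the fragments are being shaded and only discarded afterwards, i.e. no early stencil rejection on that path.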
 
If I were paranoid, I would say there's a correlation between the release of the demo and the SM3.0 test @ Anand :LOL:

More seriously, I hope you can do something, zeno ;). I would really like to see the difference between this tech and PS3.0.
 
991060 said:
The performance drop in NV40/NV3x is most likely due to the lack of early stencil kill ability. And I'm not criticising you, Chris, I know you're much better than that. ;)
Weird..I know for sure NV2A performs an early stencil test..so I would expect NV30/40 to have it.
Too bad I own a 9800pro..so I can't test it ;)

ciao,
Marco
 
nAo said:
991060 said:
The performance drop in NV40/NV3x is most likely due to the lack of early stencil kill ability. And I'm not criticising you, Chris, I know you're much better than that. ;)
Weird..I know for sure NV2A performs an early stencil test..so I would expect NV30/40 to have it.
Too bad I own a 9800pro..so I can't test it ;)

ciao,
Marco

No, it doesn't.
http://www.3dcenter.de/artikel/2003/03-05_english.php

8) In the near future, we will see Doom III and a number of other games based on the new id engine. We already know that Doom III will highly stress stencil buffers. Does the GeForceFX offer any features increasing performance such as "Fast Stencil Clear", "Early Stencil Test" or a "Hierarchical Stencil Buffer"?

NVIDIA: GeForceFX is very fast at stencil operations, but we are not marketing any stencil-specific implementations.
 
This is a demo of what? If anything, it's a stencil hack: a multipass technique to emulate branching on any shader model available on earth (not just PS3.0).

The thing is: it is a stencil hack -> useless. Everyone knows about this, just like everyone knows, for example, that a RenderMan shader can be broken down to run on any hardware with PS1.1 capabilities (precision aside, of course).

Since the stencil gets used, it is impossible to have stencil shadows in the scene... And what developer would go to the trouble of writing a multipass technique using stencil when he could just write an "if"? It's like saying that a scene with a single PS1.0 light supports HDRI: of course it does, it uses one light, so it doesn't overbright...

And haven't you heard? NVIDIA now supports 3Dc. Really, really. It simply doesn't compress the texture. And that means even better quality than ATI, although I think NVIDIA's implementation may be a bit slower... :rolleyes:

The demo is great, classic Humus style. What I enjoyed is the change in tone, like "Nvidia can consider themselves pwned." when he knows perfectly well that this isn't a replacement for dynamic branching.
Another thing I would like to know, Humus, is why you didn't implement any kind of occlusion culling in the part that uses branching. That shifts the results a lot, because the stencil is the "occlusion"...

EDIT: yah, that is right: I registered just to post this... 8)
 
snakejoe said:
No, it doesn't.
http://www.3dcenter.de/artikel/2003/03-05_english.php

8) In the near future, we will see Doom III and a number of other games based on the new id engine. We already know that Doom III will highly stress stencil buffers. Does the GeForceFX offer any features increasing performance such as "Fast Stencil Clear", "Early Stencil Test" or a "Hierarchical Stencil Buffer"?

NVIDIA: GeForceFX is very fast at stencil operations, but we are not marketing any stencil-specific implementations.
Where do you read that? He only talks about "not marketing" any stencil-specific implementations - which, btw, has changed in the meantime.
 
No, it doesn't.
http://www.3dcenter.de/artikel/2003/03-05_english.php

NVIDIA: GeForceFX is very fast at stencil operations, but we are not marketing any stencil-specific implementations.

That only means that they are not marketing it!

As far as I know, both ATI and Nvidia perform Z and stencil buffer tests before the pixel shader, and I have observed this behaviour many times writing my own shaders on both cards (NV3x and R3xx). In some situations early tests are impossible, e.g. when writing the Z value in the pixel shader.


And about the technique... as I wrote in the "far cry ps3 and stuff" topic:
- It requires as many passes as there are paths in the shader (see the sketch below for the two-path case)
- It doesn't work with transparent triangles and some blending modes
- It wastes cycles in many cases because some computations must be duplicated in every path.
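
To make the first point concrete, even a single if/else already ends up as something like this (a rough sketch with placeholder draw helpers, not code from the demo or from anybody's engine):

Code:
// Sketch of emulating "if (cond) thenShader(); else elseShader();" with the
// stencil buffer. Assumes a current GL context.

// Mask pass: stencil = 1 where the condition holds, 0 elsewhere.
glClearStencil(0);
glClear(GL_STENCIL_BUFFER_BIT);
glEnable(GL_STENCIL_TEST);
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
glStencilFunc(GL_ALWAYS, 1, 0xFF);
glStencilOp(GL_KEEP, GL_KEEP, GL_REPLACE);
drawSceneEvaluatingCondition();      // cheap pass that only decides the branch

glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);

// Path 1: the 'then' shader, only where stencil == 1.
glStencilFunc(GL_EQUAL, 1, 0xFF);
drawSceneWithThenShader();

// Path 2: the 'else' shader, only where stencil == 0.
glStencilFunc(GL_EQUAL, 0, 0xFF);
drawSceneWithElseShader();

glDisable(GL_STENCIL_TEST);

// One full shading pass per path (plus the mask pass), and anything the two
// paths have in common gets computed again in each of them.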
 