New dynamic branching demo

radar1200gs said:
On ATi hardware, when you use dynamic branching without the optimisations/bypasses, you get a low framerate. When you bypass the branching with what Humus wrote, it speeds up. Conclusion: something is wrong with ATi's dynamic branching at the driver or hardware level.

On nVidia, when you use dynamic branching without the optimizations, it is faster than when the optimization is enabled. In other words, the hardware is already efficient at dynamic branching.

Dynamic branching is part of SM3.0 and is therefore not supported by current ATI chips, so I can't quite see how "something is wrong with ATi's dynamic branching at the driver or hardware level". :?

The whole point of this thread is to discuss Humus' demo which allows cards that don't support dynamic branching (such as R3XX, R4XX, NV3X etc.) to access some of its benefits.
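For what it's worth, the trick is usually described like this: a cheap first pass uses the alpha test to kill pixels a light can't reach and writes a stencil mask for the rest, and the expensive lighting pass is then drawn with a stencil test so that early stencil rejection culls the unlit pixels before they are ever shaded. A minimal C++/D3D9 sketch of that idea (my own reconstruction, not Humus' actual code; the two draw helpers are hypothetical):

```cpp
#include <d3d9.h>

// Hypothetical helpers standing in for the demo's real draw calls.
void DrawCheapRangePass(IDirect3DDevice9* dev);
void DrawExpensiveLightingPass(IDirect3DDevice9* dev);

void StencilMaskedLighting(IDirect3DDevice9* dev)
{
    // Pass 1: a cheap shader outputs alpha = 0 for out-of-range pixels;
    // the alpha test kills them, so only in-range pixels get stencil = 1.
    dev->SetRenderState(D3DRS_STENCILENABLE, TRUE);
    dev->SetRenderState(D3DRS_STENCILFUNC, D3DCMP_ALWAYS);
    dev->SetRenderState(D3DRS_STENCILPASS, D3DSTENCILOP_REPLACE);
    dev->SetRenderState(D3DRS_STENCILREF, 1);
    dev->SetRenderState(D3DRS_ALPHATESTENABLE, TRUE);
    dev->SetRenderState(D3DRS_ALPHAFUNC, D3DCMP_GREATER);
    dev->SetRenderState(D3DRS_ALPHAREF, 0);
    dev->SetRenderState(D3DRS_COLORWRITEENABLE, 0); // stencil only
    DrawCheapRangePass(dev);

    // Pass 2: the expensive lighting shader runs only where stencil == 1;
    // early stencil rejection culls everything else before shading.
    dev->SetRenderState(D3DRS_STENCILFUNC, D3DCMP_EQUAL);
    dev->SetRenderState(D3DRS_STENCILPASS, D3DSTENCILOP_KEEP);
    dev->SetRenderState(D3DRS_ALPHATESTENABLE, FALSE);
    dev->SetRenderState(D3DRS_COLORWRITEENABLE, 0x0000000F);
    DrawExpensiveLightingPass(dev);
}
```

The whole win depends on the hardware actually performing the stencil reject before the pixel shader runs, which is exactly what the rest of this thread ends up arguing about.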
 
Evildeus said:
If I were paranoid, I would say there's a correlation between the release of the demo and the SM3.0 test @ Anand :LOL:

More seriously, I hope you can do something, zeno ;). I would really like to see the difference between this tech and PS3.0.

Well, this "dynamic branching" demo is about nothing else than stealing the show to nvidia like Humus already stated because it isn't dynamic branching at all. It's just a method which does a different job but nevertheless gives the same result as dynamic branching for some rare lighting situations without the flexbility of what PS3.0 has to offer. But contrary to SM3.0, "Humus new technique" will not be used in any upcoming game.

Unfortunately for Humus, nvidia is currently stealing the show from his fake dynamic branching demo with the upcoming FarCry 1.2 patch (previewed at anandtech and techreport), which includes major performance increases with the help of the SM3.0 path in a real-world game, not in a limited tech demo with some random lights swirling around in a small room.

and yes, you've guessed it, I only registered to post this... but you can expect more :)

Humus, I must really say that I am disappointed in you and the way you're spreading misinformation around different forums. You really should know better.
 
He is using a programming trick to achieve, in this specific shader, results similar to dynamic branching.

But this does not show that dynamic branching "is just a marketing thing".

It is impossible to do effective dynamic branching in most shader programs with SM2.0!
 
samker said:
and yes, you've guessed it, I only registered to post this... but you can expect more :)

Please, feel free to discuss the technical merits / drawbacks of various solutions.
 
"nVidia can consider themselves owned
tongue.gif
"


sounds more like a joke to me, especially with the smilie there. :rolleyes:
 
Mordenkainen said:
Sorry, but I really don't agree with this justification. Say ATI pays Valve to make all nvidia cards run in fixed-function mode. I'm sure nVidia fans would be extremely happy with that.

I'm not really trying to justify anything. And your analogy is poor at best :) Far Cry does support the highest shader model available on ATI cards - 2.0. Running a 2.0 card at 1.1 is in no way analogous to this situation.

And didn't nVidia fans complain that 3DMark03 had ATI optimisations (PS 1.4) so much that Futuremark had to patch it to include a PS 1.1 fallback?

Similar situation, wrong application. I don't think you can equate a benchmark to these extra Far Cry features. Please realize that these features are in a patch and are not what you 'paid for', so to speak. If you thought Far Cry was awesome before and were willing to buy it, why would you change your mind now? The game hasn't changed.
 
Alstrong said:
"nVidia can consider themselves owned
tongue.gif
"


sounds more like a joke to me, especially with the smilie there. :rolleyes:
You may be right, but it seems to me that the point of the demo is to try to demonstrate the non-advantage of PS_3.0 branching (and hence NV40), and it seems that many people (even new ones) are displeased with this approach and, even more, disagree with the advantages of this particular technique.

Unfortunately, discussion of this technique is a bit difficult as our posting times are quite incompatible.
 
Sorry, but I really don't agree with this justification. Say ATI pays Valve to make all nvidia cards run in fixed-function mode. I'm sure nVidia fans would be extremely happy with that.
You could always force the SM 2.0 path if that happens. NV40 can do everything any version of SM 2.0 can do without code changes.
 
It's weird, my custom test shows that NV40 does have early stencil rejection, even if alpha test is enabled. :?:
 
991060 said:
It's weird, my custom test shows that NV40 does have early stencil rejection, even if alpha test is enabled. :?:
What is weird about that? Stencil test and alpha test can only kill pixels, and both happen before anything is written to memory, so the order of those tests is irrelevant (though they affect the operation of hierZ). Now, if you were talking about stencil operation...
 
Well, I think what he meant is: it's weird because the NV40 does have early stencil rejection like the R420, but in Humus' demo the R420 takes advantage of it whereas the NV40 sees its performance decrease.
 
Evildeus said:
It's weird because the NV40 does have early stencil rejection like the R420

They may both have early z rejection, but the question is where in the pipeline they are both rejecting.
 
DaveBaumann said:
Evildeus said:
It's weird because the NV40 does have early stencil rejection like the R420

They may both have early z rejection, but the question is where in the pipeline they are both rejecting.
Well, maybe. So it's not even interesting to compare the two, now that we seem to know that Nv and Ati are not doing it the same way, as the benchmarks tend to show.

Now I'm still interested to see PS3.0 doing the same thing, and to see the answers to the drawbacks pointed out over here.
 
DaveBaumann said:
Evildeus said:
It's weird because the NV40 does have early stencil rejection like the R420

They may both have early z rejection, but the question is where in the pipeline they are both rejecting.
This is how I tested:
First I disabled stencil and ran the fillrate tester using 2 instructions; the result was just as expected, about 3200 MP/s.
Then I set the render state so that all pixels were rejected; the result was much higher, around 22000 MP/s. (Z writing was disabled, and there was no color output because of the rejection.)
Then I enabled alpha test; the result was about the same as in the 2nd situation.

The above data clearly shows that stencil rejection happens before the pixel shading unit, otherwise I wouldn't get such a high fillrate.
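
If anyone wants to reproduce it, the state setup would look something like this (a rough C++/D3D9 sketch with assumed names, not the exact test code):

```cpp
#include <d3d9.h>

// Rough sketch of the rejection test described above. All pixels fail the
// stencil test and Z writes are off, so a fillrate far above the
// shader-limited baseline means pixels were culled before the pixel
// shading unit ever ran.
void SetupRejectAllState(IDirect3DDevice9* dev, bool withAlphaTest)
{
    dev->SetRenderState(D3DRS_STENCILENABLE, TRUE);
    dev->SetRenderState(D3DRS_STENCILFUNC, D3DCMP_NEVER); // reject every pixel
    dev->SetRenderState(D3DRS_ZWRITEENABLE, FALSE);       // no Z writes

    // Step 3 of the test: enabling alpha test should not change the result
    // if stencil rejection still happens early.
    dev->SetRenderState(D3DRS_ALPHATESTENABLE, withAlphaTest ? TRUE : FALSE);
    dev->SetRenderState(D3DRS_ALPHAFUNC, D3DCMP_GREATER);
    dev->SetRenderState(D3DRS_ALPHAREF, 0);
    // Then draw the fillrate tester's quad with the 2-instruction shader.
}
```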

edit: spelling.
 
And when developers talked about the benefits of SM3.0 they always mentioned branching, and its main purpose always seemed to be helping with exactly this per-pixel lighting problem. Now it's possible without SM3.0, albeit using a trick, so where is the problem?
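
For comparison, the SM3.0 route expresses that skip directly in the shader. A hypothetical ps_3_0 fragment (written as an HLSL source string the way you would feed it to D3DXCompileShader; purely illustrative, not any game's actual shader):

```cpp
// Hypothetical ps_3_0 shader showing the dynamic branch that the stencil
// trick emulates: out-of-range pixels skip the expensive lighting math.
const char* kBranchingPS =
    "float lightRange;                                            \n"
    "float4 main(float3 lightVec : TEXCOORD0) : COLOR             \n"
    "{                                                            \n"
    "    // Dynamic branch: bail out when the pixel is beyond the \n"
    "    // light's range, before the expensive lighting math.    \n"
    "    if (dot(lightVec, lightVec) > lightRange * lightRange)   \n"
    "        return 0;                                            \n"
    "    /* ...expensive per-pixel lighting here... */            \n"
    "    return float4(1, 1, 1, 1);                               \n"
    "}                                                            \n";
```

Whether that branch actually saves time depends on how the hardware schedules pixels, which is exactly what the benchmarks in this thread are probing.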
 
Evildeus said:
You may be right, but it seems to me that the point of the demo is to try to demonstrate the non-advantage of PS_3.0 branching (and hence NV40), and it seems that many people (even new ones) are displeased with this approach and, even more, disagree with the advantages of this particular technique.

Unfortunately, discussion of this technique is a bit difficult as our posting times are quite incompatible.


Fair enough :)
 