New dynamic branching demo

JD · Jul 3, 2004

I think Humus should have removed that nv line from his page and presented the demo as is, i.e. as a technical demo instead of a marketing demo. I think this is what caused the hostility in the first place. Other than that, the demo is cool and gives those who don't have an SM3-capable card a way to duplicate some of SM3 through different means. I think ATI will have SM3-capable hardware in the next 4 to 6 months, as they stated. By then there may be more SM3 games available than there are now.

Mariner · Jul 3, 2004

Ruined said:
Logic would dictate because in reality, it's either not practical to implement, would have problems on larger scale games, or that Shader Model 3.0 is simply a better answer. If one of these options were not the case, it would have already been used by now, because it is nothing new.

If this technique is a viable option to improve performance/compatibility on cards which don't support SM3.0 (in some cases, at least) then it will be up to ATI developer relations to assist developers in implementing such a technique.

No games featuring SM3.0 are currently available (unless you consider the as yet unavailable FarCry 1.2 patch and beta DirectX 9.0c as a currently available game). Therefore, we simply don't know if any of the SM3.0 games will support the technique as espoused by Humus. Let's wait and see if it is used before trying to claim it is worthless, shall we?

Mr. Travis · Jul 3, 2004

"For a more positive update, after a discussion with CryTek about the new rendering path, we have learned that the lighting model implimented in the SM3.0 Path is exactly the same as was used in the SM2.0 Path. The only exception is that they used the conditional rendering (branching in the pixel shader) to emulate multipass lighting in a single pixel shader. The performance gains we see actually indicate that PS3.0 branching does not have as significant a performance hit as previously thought (and proves to be more efficient than using multiple pixel shaders in a scene)."

Anyway, it sounds like Far Cry is using 3.0 for a similar purpose as Humus, so I wonder if his technique could be applied. I'll look through the shaders, but I'm not much of a coder.

KimB · Jul 3, 2004

It sounds to me more like FarCry had multipass lighting (which is what Humus is doing), and moved to a single-pass technique, which resulted in a performance increase.

kihon · Jul 3, 2004

Re: programming

BRiT said:
...

OH NO YOU DON'T! I'm not falling for that one again

Sorry - couldn't resist.

Humus · Jul 3, 2004

pocketmoon66 said:
Would the stencil trick be faster if you made it depth/stencil only and then added an extra pass for the ambient?

You mean splitting the first depth+ambient pass into two different passes? Then no, it'll be slightly slower. The ambient shader is pretty cheap, so it'll be a loss doing another pass just for that.

Humus · Jul 3, 2004

nAo said:
It should be easy to understand, but what you fail to understand is that Humus has not used any PS2 feature. Obviously this is not a flaw, but inexperienced people have to understand we're not hearing anything new here. Every mid-level game developer knows that stuff. Rebranding it that way is what irritates me. Look how that's fooled a lot of people here.
Humility please..

Fair enough, but let me take this in chronological order.

Once upon a time ...

maybe 2 or 3 weeks ago, I had a discussion with my co-worker Guennadi at lunch. Since nVidia were making a big deal out of dynamic branching, he suggested that I make an SDK sample in response; maybe I could do early-out by using some clever tricks with stencil? Well, I thought it sounded like it might be possible, and though I didn't expect much of a performance gain, or any at all, I put it on my TODO list anyway. On Tuesday this week I began working on it, solved the practical problem of how to control stencil from a shader by using alpha test, got it up and running and began to see that there was a huge performance boost potential. So I got pretty excited about all this, especially when I had all the lighting and everything working and saw something like a 2-3x performance boost, and I still hadn't tweaked the thing.
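For readers who want the idea in concrete form, here is a rough CPU-side model of the stencil early-out described above, written in C++ to match the D3D snippets in this thread. It is a sketch under stated assumptions, not the demo's code: the `Pixel` struct, the `expensiveLighting` stand-in and the "inside the light radius" condition are all hypothetical. Pass 1 evaluates the cheap condition and, through the alpha test, writes stencil only where the condition holds; pass 2 runs the expensive shader only where stencil equals 1, which is the work that early stencil rejection saves on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU model of the stencil early-out trick; not the demo's code.
struct Pixel {
    float distSq;  // squared distance from the surface point to the light
};

struct PassStats {
    std::size_t conditionEvals = 0;  // cheap "if" shader runs (pass 1)
    std::size_t lightingEvals = 0;   // expensive lighting shader runs
};

// Placeholder for an expensive lighting shader: linear falloff inside the radius.
float expensiveLighting(const Pixel& p, float radiusSq) {
    return 1.0f - p.distSq / radiusSq;
}

// Pass 1: the cheap shader outputs the condition as alpha; the alpha test
// discards fragments where it is 0, so only survivors set stencil to 1.
// Pass 2: stencil test EQUAL 1 rejects the rest before the expensive shader.
std::vector<float> lightWithStencil(const std::vector<Pixel>& pixels,
                                    float radiusSq, PassStats& stats) {
    assert(radiusSq > 0.0f);
    std::vector<std::uint8_t> stencil(pixels.size(), 0);
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        ++stats.conditionEvals;
        const float alpha = pixels[i].distSq < radiusSq ? 1.0f : 0.0f;
        if (alpha > 0.5f) stencil[i] = 1;  // alpha test passed: stencil REPLACE
    }
    std::vector<float> out(pixels.size(), 0.0f);
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        if (stencil[i] != 1) continue;  // rejected before shading
        ++stats.lightingEvals;
        out[i] = expensiveLighting(pixels[i], radiusSq);
    }
    return out;
}

// Reference without early-out: the expensive shader runs for every pixel and
// the condition only masks the result afterwards.
std::vector<float> lightBruteForce(const std::vector<Pixel>& pixels,
                                   float radiusSq, PassStats& stats) {
    assert(radiusSq > 0.0f);
    std::vector<float> out(pixels.size(), 0.0f);
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        ++stats.lightingEvals;
        const float lit = expensiveLighting(pixels[i], radiusSq);
        out[i] = pixels[i].distSq < radiusSq ? lit : 0.0f;
    }
    return out;
}
```

With squared distances {0, 1, 4, 9, 16} and a squared radius of 5, the stencil path runs the expensive shader 3 times instead of 5 and produces the same output as the brute-force version. The real gain on a GPU depends on how cheap the condition pass is and on the hardware actually rejecting fragments before shading.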

Then came the Far Cry thread, and the topic of dynamic branching, so of course I brought this technique up. At that point I thought it was new, especially since I came up with it without reading any papers on the topic; I just developed it through my own work, so of course I presented it sort of as "my technique". Some people pointed out that it wasn't new. A guy mentioned two papers, which I then googled. I went "duh, of course", because I was aware of both papers but had never read them or thought about how they implemented it. A quick check showed that they did indeed use the same technique. Now, big deal. So it was old, but the whole point was that it works, does the job and is fast. But some people got stuck on the "my technique" part. From the time those papers were posted I switched to calling it "this technique", so as not to take any credit for inventing it, but it seems people never noticed and continued to make a big deal about it not being new, or about me being dishonest and what not. I mean, the "this is not new" line keeps popping up everywhere, even on other forums and in comments on my site. There was never any intent to rebrand existing techniques. I'm just excited that it works so well, and I'm more interested in the technical pros and cons than in who invented it and when.

pocketmoon66 · Jul 3, 2004

pocketmoon66 said:
Found what was hurting NV cards :

changing
dev->SetRenderState(D3DRS_STENCILPASS, D3DSTENCILOP_ZERO);
to
dev->SetRenderState(D3DRS_STENCILPASS, D3DSTENCILOP_KEEP);

(Early stencil kill doesn't work if you're still writing to stencil??)

Oops! missed a bit out - wasn't clearing the stencil buffer between passes.
Demo now runs :

1280x960
FALSE: 51
TRUE: 185 ish <- better
DB PS2 (cmp) 54
DB PS3 (if then else) 65 ish

I'll work on combining the individual light passes into a single SM2/3 pass and see how that works out...
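The fix above has two halves: the lighting pass stops writing stencil (KEEP instead of ZERO), and stencil is then cleared explicitly before each light is marked. The hypothetical 1-D model below only illustrates the bookkeeping side of that, not the hardware side. Because fragments rejected by the alpha test in the marking pass never touch stencil, marks left by earlier lights survive unless they are cleared, and later lighting passes then shade pixels they don't need to. `Light`, `shade` and `renderLights` are illustrative names, and the clamped falloff is an assumption that keeps the final image identical either way.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 1-D model: each pixel is just an x position on a line.
struct Light {
    float x;         // light position
    float radiusSq;  // squared radius of influence
};

// Stand-in for the per-light lighting shader; clamped so it is 0 outside the radius.
float shade(float px, const Light& l) {
    const float d = px - l.x;
    return std::max(0.0f, 1.0f - d * d / l.radiusSq);
}

// Renders all lights additively into color and returns how many times shade() ran.
std::size_t renderLights(const std::vector<float>& pixelX,
                         const std::vector<Light>& lights,
                         bool clearStencilPerLight,
                         std::vector<float>& color) {
    std::vector<std::uint8_t> stencil(pixelX.size(), 0);
    color.assign(pixelX.size(), 0.0f);
    std::size_t shaded = 0;
    for (const Light& l : lights) {
        assert(l.radiusSq > 0.0f);
        if (clearStencilPerLight)  // models a stencil-only clear before marking
            std::fill(stencil.begin(), stencil.end(), std::uint8_t{0});
        // Marking pass: fragments outside the light fail the alpha test and
        // leave stencil untouched; survivors set it to 1.
        for (std::size_t i = 0; i < pixelX.size(); ++i) {
            const float d = pixelX[i] - l.x;
            if (d * d < l.radiusSq) stencil[i] = 1;
        }
        // Lighting pass: stencil EQUAL 1, pass op KEEP (no stencil writes).
        for (std::size_t i = 0; i < pixelX.size(); ++i) {
            if (stencil[i] != 1) continue;
            ++shaded;
            color[i] += shade(pixelX[i], l);
        }
    }
    return shaded;
}
```

With ten pixels at x = 0..9 and two lights of squared radius 2.25 at x = 1 and x = 8, the model reports 6 shader runs with the per-light clear and 9 without it, while the accumulated colors are identical.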

KimB · Jul 3, 2004

Now you just need to see what the performance characteristics would be like in a real game. PS 3.0 shaders in Far Cry increase performance by reducing the number of passes. Your shader increases the number of passes, but reduces total pixel shader processing (over "normal" PS 2.0).

So, if more geometry was used (a quick hack would be to tessellate the geometry that's there), or longer shaders (more code per branch), the balance might shift back towards PS 3.0.
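The geometry side of this trade-off can be made concrete with a back-of-the-envelope model for a single light. Everything here is an assumption for illustration: the cost units are arbitrary, geometry submission is charged once per pass, and a dynamic branch is charged a flat per-pixel overhead. Under those assumptions the condition and lighting terms cancel, so the two-pass stencil approach wins exactly when one extra geometry pass costs less than the branch overhead summed over the covered pixels. The model does not capture the second point about longer shaders; `Scene` and both cost functions are hypothetical.

```cpp
#include <cassert>

// Hypothetical per-frame cost model in arbitrary units (not measured data).
struct Scene {
    double geometryCostPerPass;  // cost of submitting the scene geometry once
    double pixels;               // pixels covered per pass
    double litFraction;          // fraction of pixels where the branch is taken
    double conditionCost;        // per-pixel cost of evaluating the condition
    double lightingCost;         // per-pixel cost of the expensive branch
    double branchOverhead;       // extra per-pixel cost of a real dynamic branch
};

// Stencil approach for one light: a marking pass plus a lighting pass.
double stencilTwoPassCost(const Scene& s) {
    assert(s.litFraction >= 0.0 && s.litFraction <= 1.0);
    return 2.0 * s.geometryCostPerPass
         + s.pixels * s.conditionCost
         + s.pixels * s.litFraction * s.lightingCost;
}

// A single pass with a dynamic branch in the pixel shader.
double branchingSinglePassCost(const Scene& s) {
    assert(s.litFraction >= 0.0 && s.litFraction <= 1.0);
    return s.geometryCostPerPass
         + s.pixels * (s.conditionCost + s.branchOverhead)
         + s.pixels * s.litFraction * s.lightingCost;
}
```

For example, with 1000 pixels, a lit fraction of 0.2, condition cost 1, lighting cost 10 and branch overhead 0.5, a geometry cost of 100 per pass favors the stencil approach (3200 vs 3600 units), while 1000 per pass favors the single branching pass (5000 vs 4500).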

Humus · Jul 3, 2004

paju said:
What is the minimum level of PS required, then, for the dynamic branching demo? If it's lower than PS2, then it should certainly get the attention of any developer who was not aware of this.

It's possible to do with any shader-capable hardware, or even on fixed-function hardware, as long as it does top-of-the-pipe stencil rejection. On fixed-function hardware you may be limited in what if-expressions you can implement, and of course in what lighting or effects you can do, but for something like this demo you could actually render the if-statement with any dot3-bumpmapping-capable hardware.
But in general, for it to be practical you'll want to have shaders.
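As a rough illustration of an if-statement that needs nothing more than a dot product, saturation and the alpha test, here is a CPU emulation in C++. It assumes the condition is "is this point inside the light's radius", with v = (lightPos - pos) / radius available per pixel; how a particular fixed-function card would deliver v is left open, and the 8-bit rounding only approximates real combiner precision. `Vec3`, `dot3Saturate` and `passesAlphaTest` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct Vec3 { float x, y, z; };

// Dot product clamped to [0, 1], as fixed-function color results are clamped.
float dot3Saturate(const Vec3& a, const Vec3& b) {
    const float d = a.x * b.x + a.y * b.y + a.z * b.z;
    return std::clamp(d, 0.0f, 1.0f);
}

// alpha = 1 - saturate(v.v) is greater than 0 only inside the light radius.
// Alpha is rounded to 8 bits and compared with an integer reference, like an
// alpha test of the form "GREATER than ref".
bool passesAlphaTest(const Vec3& v, std::uint8_t alphaRef) {
    const float alpha = 1.0f - dot3Saturate(v, v);
    assert(alpha >= 0.0f && alpha <= 1.0f);
    const auto alpha8 = static_cast<std::uint8_t>(alpha * 255.0f + 0.5f);
    return alpha8 > alphaRef;
}
```

A point halfway to the radius (v = (0.5, 0, 0)) passes with a reference of 0, while a point outside it (v = (1, 0.5, 0)) saturates to alpha 0 and is rejected, so it never gets its stencil mark.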

pocketmoon66 · Jul 3, 2004

Humus said:
But some people got stuck on the "my technique" part.

Well I *KNOW* I invented shadow buffers, image impostors, volumetric shadows and surround sound, all in my younger days

It's only when you find out that someone else actually thought about your latest 'idea' in 1981 that you get depressed

I take it as a sign of intelligence that I reinvented so much in ignorance!

Humus · Jul 3, 2004

nAo said:
Probably there are plenty of games that used, use and will use tricks along the same lines, and there have been since GPUs with alpha (or destination alpha..) and stencil test appeared on the market, many many years ago. Oh man.. this stuff is so basic that surely someone has already patented it 8)

That it's basic functionality doesn't mean its usage is obvious. How long has hardware done early stencil rejection? R100 was first with early Z rejection, but I don't know if it also did early stencil rejection. Using clipping planes or scissor rectangles is basic functionality too, but that doesn't mean hardware does it fast, or that developers are using it. In fact, clipping planes used to be a pain on GF3/4, since they were implemented with TEXKILL in the fragment shader and didn't improve things much; in fact they often forced it into software, since each TEXKILL took up one TMU. I don't know if this is still true on GFFX and up, or if they are using real geometry clipping planes now.

euan · Jul 3, 2004

pocketmoon66 said:
Well I *KNOW* I invented shadow buffers, image impostors, volumetric shadows and surround sound, all in my younger days

It's only when you find out that someone else actually thought about your latest 'idea' in 1981 that you get depressed

I take it as a sign of intelligence that I reinvented so much in ignorance!

The thing about being a software developer is that you get a huge buzz when you write something that wasn't immediately apparent, that actually works, let alone works really well. Even if it's as simple as getting a stupid splitter window to work in MFC, it's still the same feeling of accomplishment. That's why people do it, and will continue to do it. If you don't think you've invented something, you'll never get anything out of your job, or hobby.

Humus · Jul 3, 2004

Ruined said:
991060 said:

the optimization can also be applied to NV3x/NV40 now, just change "dev->SetRenderState(D3DRS_STENCILPASS, D3DSTENCILOP_ZERO);" to "dev->SetRenderState(D3DRS_STENCILPASS, D3DSTENCILOP_KEEP);" and clear the stencil buffer after drawLighting call.

got 80FPS with optimization off and 230-250FPS with optimization on

See the above post, someone fixed your code for you

Humus maybe you can answer one basic question about your demo that kind of contradicts the reasoning behind your statements.

The technique you have used in your demo has been around for a while. People who program shaders for games like Halo, Far Cry, etc, are not morons. In fact, they have made complex, incredible looking games far beyond the scope of a simple graphical demo. That being said, odds are they know of this technique or even if they didn't could have figured it out. Since this would be the case in reality, if the technique has not already been used in games even though it has been around for a while, why is it not being used, and why are developers choosing Shader Model 3.0 instead, with over 10 titles slated to support SM3.0 already?

Logic would dictate because in reality, it's either not practical to implement, would have problems on larger scale games, or that Shader Model 3.0 is simply a better answer. If one of these options were not the case, it would have already been used by now, because it is nothing new.

It's still the card or the drivers. It just doesn't do early-stencil rejection if you write the stencil buffer. Either that's a hardware limitation, or something with the drivers. But you can work around it, and I sure will update the demo to support it, so you can ditch all your evil conspiracy theories.
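This explanation can be put as a tiny cost model. The assumption, taken from the behaviour reported in this thread rather than from any vendor documentation, is that the card (or driver) only applies the stencil test before shading when the pass leaves stencil untouched; with D3DSTENCILOP_ZERO the test effectively happens after the shader, so every fragment pays for shading even though most are then discarded. `StencilPassOp`, `PassResult` and `lightingPass` are hypothetical names that mirror the D3D states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class StencilPassOp { Keep, Zero };  // mirrors D3DSTENCILOP_KEEP / _ZERO

struct PassResult {
    std::size_t shaderRuns = 0;     // pixel shader invocations actually paid for
    std::size_t pixelsWritten = 0;  // fragments that survived the stencil test
};

// Assumed behaviour: stencil is tested before shading only if the pass does not
// write stencil; otherwise the test happens after the shader has already run.
PassResult lightingPass(std::vector<std::uint8_t>& stencil, StencilPassOp op) {
    const bool earlyStencil = (op == StencilPassOp::Keep);
    PassResult r;
    for (std::uint8_t& s : stencil) {
        assert(s <= 1);                         // this model only uses marks 0/1
        const bool passes = (s == 1);           // stencil func EQUAL, ref 1
        if (earlyStencil && !passes) continue;  // rejected before shading
        ++r.shaderRuns;
        if (!passes) continue;                  // late test discards the work
        ++r.pixelsWritten;
        if (op == StencilPassOp::Zero) s = 0;   // pass op ZERO clears the mark
    }
    return r;
}
```

With 2 of 8 pixels marked, KEEP gives 2 shader runs and ZERO gives 8, with the same 2 pixels written either way; that gap is the shape of the difference reported above, not a measurement of it.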

Now, your so-called logic is pretty flawed. There are plenty of techniques out there, old, very old and fairly new, that are very useful yet never used in practice, because using basic functionality doesn't mean they're immediately obvious.

Real world example: The "Carmack's reversed" shadow volume technique.
Carmack spent a good deal of time developing it. He was all excited about it when he got it working and found that stencil shadows were indeed practical for real games with this technique. I recall he had a lengthy text file, with his thoughts and ideas and some insight into how he came up with the idea, posted on nVidia's website.
The thing got out on the net, lots of people on opengl.org got excited, and suddenly everyone wanted to implement it. For some time stuff like "how do I get Carmack's reversed working" were the most common topics found on developer forums. Before that everyone used the regular shadow volume techniques.

Then it turned out that this isn't really a new technique. There's a paper from a year or so before that implemented it the same way. So Carmack wasn't first.

Now the thing is, once you know about "Carmack's reversed", or "depth-fail stencil shadows" as it's often called to reflect the fact that Carmack didn't invent it, it's very simple and quite intuitive. But if you have never heard of the technique, it's certainly not the first thing you'll come up with. It's very basic, and the amount of code is comparable to this dynamic branching technique, but no one implemented it until Carmack made it popular.
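The difference between the two stencil shadow counts is easy to see on a single pixel's view ray. This C++ sketch assumes closed (capped) shadow volumes and a LESS depth test, and ignores far-plane clipping; `VolumeFace`, `zPassCount` and `zFailCount` are illustrative names. Depth-pass counts volume faces in front of the visible surface (front faces +1, back faces -1); depth-fail counts the faces behind it with the signs swapped. A non-zero count means the pixel is in shadow. The two agree while the camera is outside the volume, but once the near plane clips the front face (camera inside the volume), only depth-fail still gives the right answer.

```cpp
#include <cassert>
#include <vector>

struct VolumeFace {
    float depth;       // where the shadow volume face crosses this pixel's ray
    bool frontFacing;  // true if the ray enters the volume here
};

// Depth-pass: count faces in front of the visible surface (depth test passes).
int zPassCount(const std::vector<VolumeFace>& faces, float surfaceDepth,
               float nearPlane) {
    assert(nearPlane >= 0.0f);
    int stencil = 0;
    for (const VolumeFace& f : faces) {
        if (f.depth < nearPlane) continue;  // clipped by the near plane
        if (f.depth < surfaceDepth) stencil += f.frontFacing ? 1 : -1;
    }
    return stencil;
}

// Depth-fail ("Carmack's reverse"): count faces behind the visible surface
// (depth test fails), incrementing on back faces and decrementing on front faces.
int zFailCount(const std::vector<VolumeFace>& faces, float surfaceDepth,
               float nearPlane) {
    assert(nearPlane >= 0.0f);
    int stencil = 0;
    for (const VolumeFace& f : faces) {
        if (f.depth < nearPlane) continue;  // clipped by the near plane
        if (f.depth >= surfaceDepth) stencil += f.frontFacing ? -1 : 1;
    }
    return stencil;
}
```

With a volume entered at depth 5 and left at depth 10, a surface at depth 7 gives a count of 1 with either method when the near plane is at 1; moving the near plane to 6 drops the depth-pass count to 0 while depth-fail still reports 1.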

Humus · Jul 3, 2004

euan said:
The thing about being a software developer is that you get a huge buzz when you write something that wasn't immediately apparent, that actually works, let alone works really well. Even if it's as simple as getting a stupid splitter window to work in MFC, it's still the same feeling of accomplishment. That's why people do it, and will continue to do it. If you don't think you've invented something, you'll never get anything out of your job, or hobby.

That's why it's so good to write demos.

I can spend a day and implement something cool, and get all the buzz and lots of positive feedback and raise a lot of discussions.
A university professor may spend several months of research tweaking and tuning his work, writing down proofs and derivations, ensuring it works in every possible corner case, and then publish a paper that nobody reads.

With this technique, I think that's simply the problem. The papers mentioned in the Far Cry thread are academic in nature, rather than a practical "how to tune your game or demo". They are concerned with how to implement some very generic stuff, not with how to boost gaming performance. I don't think the average game developer has read them.

Warrick · Jul 3, 2004

I'm pretty much amazed by some of the reactions in this thread. Well, dismayed mostly, as poor Humus has been getting some bizarre, uncalled-for verbal beatings, so much so that I felt the need to write this post. He has _just_ made another pretty cool demo that I am sure lots of developers and would-be developers alike will find most educational.

As a developer, I've known a number of other developers who possess nowhere near the amount of knowledge that Humus does in these areas, so I think certain people may be overestimating the knowledge of the professional game development community at large. nVidia's, ATI's and PowerVR's developer relations teams, plus prominent developers, are doing a lot to improve this situation, and so too are enthusiasts (such as Humus originally was) and discussions on websites such as this, which are a great thing.

Of course there are lots of developers who have extremely bright employees who are certainly up to or above speed in these areas, but they also benefit from these kinds of demos, as they usually don't have the time, or can't assign enough priority, to explore every avenue of research; such is the real world.

Also it doesn't matter so much to me who invented or reinvented a technique but more if it can be shown to have practical value to the work I am doing.

I personally have found this demo most interesting, as it's an option I've considered exploring in the past to help speed up various deferred rendering junk I play with, but I simply haven't got around to seeing whether it would make enough of a difference to be worthwhile. So from my point of view it's great that Humus has provided a reasonably practical example (that performs better than I expected, I might add) that proves to me it is worth spending the time to investigate further, and indeed my initial results show it has paid off.

As for this weird 3.0 vs. 2.0 shader argument, it's not making any sense to me. From my point of view, I am initially only concerned with shader 1.1 and shader 2.0, so that the games I work on address the majority of the shader-capable market. Then, if I have the time (or have been instructed to, for reasons of company sponsorship, marketing deals etc.), I'll work on shader 1.4, 2.x and 3.0 optimised paths or whatever, but otherwise it doesn't make commercial sense. When we have PC shader 3.0 hardware that has N-patch or similar support, I'll be more inclined to add that path as a standard (so 1.1, 2.0, 3.0).
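The path policy described above, shipping 1.1 and 2.0 and adding other paths only when time allows, boils down at runtime to picking the highest shipped path the hardware supports. A minimal sketch, assuming versions are encoded as major*10 + minor (11, 14, 20, 30) and that 0 means a non-shader fallback; the encoding and `pickShaderPath` are hypothetical, and real Direct3D 9 code would derive the hardware version from `D3DCAPS9::PixelShaderVersion`.

```cpp
#include <cassert>
#include <vector>

// Picks the highest shader path the game ships that the hardware supports.
// Versions use a hypothetical major*10 + minor encoding; 0 = no shader path.
int pickShaderPath(int hardwareVersion, const std::vector<int>& shippedPaths) {
    assert(hardwareVersion >= 0);
    int best = 0;
    for (int path : shippedPaths)
        if (path <= hardwareVersion && path > best) best = path;
    return best;
}
```

With only the 1.1 and 2.0 paths shipped, a 1.4-class card falls back to 1.1 and a 3.0-class card runs the 2.0 path until a 3.0 path is added.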

No more uncalled-for Humus bashing please (unless he does something wrong, of course) - the other thread on this subject in the coding forum already has a lot more interesting potential than the direction this thread seems to have taken.

Proforma · Jul 4, 2004

It is, what it is.

PatrickL said:
Guys, what is going on lately ?

Humus posted a demo with a new technique and then an Nvidiot made 12 posts in 2 pages, with absolutely no technical info, just arguing against it because it could make his IHV less desirable. And now we have 15 pages of mostly garbage about Nvidia/ATI fights again?
Is this forum a sub-forum of the Inquirer?

And when I read the post just above mine, I wonder if people read carefully, because it's not Humus who made a bad presentation; he just reacted to all the garbage thrown at him.

I am not an Nvidiot. I am a person who has tried both cards
from both ATI and Nvidia.

Apparently, there are people on here who are biased towards
ATI and think that SM 3.0 is useless and basically a feature for
Nvidia, and that's not true.

So Humus makes a hack to get around one of the features of
SM 3.0. Well, that's great and all, but it's just a hack that's not
a really good substitute for the real thing.

When I program for something, I don't want to use hacks because
there could always be problems down the road.

This hack, for good or bad, seems to me a trick to make up for
the fact that ATI made a mistake, and that mistake was not
keeping up with technology. Do I want a hack to give me one feature
that I could get without a hack, while still getting more features (i.e. object instancing)? No, I want the real thing. The real extension to 2.0+,
which is SM3.0, which is a real DirectX 9 feature, and it's officially not
a hack or something for ONLY Nvidia.

I will wait until the end of this year. If ATI does not bring out
SM 3.0, then I will be forced to go back to Nvidia. It's pretty
much that simple.

Now, according to reports on this very forum, ATI will not bring
out a chip with SM3.0 support this year and will not do so
until 2005, by which point I will have already switched back to
Nvidia.

The sad fact is that a lot of you call me an Nvidiot, which is really
a stupid thing to do since I am far more objective than that.

However, when I pay US$500, I want the latest technology
so that I can use it with my programming. ATI didn't deliver that this generation
and really let me down. I want to upgrade from my 9800 Pro, so I am
looking for the next best bet, and that currently would be the GeForce 6800 Ultra.

It is what it is, guys. I mean, come on: ATI added 16 pipelines at the last minute when they found out Nvidia had put 16 pipelines in their product, and on top of that Nvidia also had 128-bit color and SM 3.0 and ATI doesn't, and ATI was supposed to be the technology leader here instead of the follower. What is going on?

So, yes. In my mind, ATI f*cked up this generation.

Dave Baumann · Jul 4, 2004

You don't "add" 16 pipelines at the drop of a hat. It takes 6 months from design layout to actual chip production these days - the decision to make R420 capable of 16 pipelines was taken in late 2003, not when people were learning about NV40.

digitalwanderer · Jul 4, 2004

Re: It is, what it is.

Proforma said:
I am not an Nvidiot. I am a person who has tried both cards
from both ATI and Nvidia.

The sad fact is that a lot of you call me an Nvidiot, which is really
a stupid thing to do since I am far more objective than that.

No, you are an nVidiot in the truest sense of the word and to say otherwise is a denial of reality....it's kind of why I stopped responding to your nonsense.

g__day · Jul 4, 2004

It saddens me to see such an extremely interesting algorithm, for mapping a lot of SM 3.0 techniques to even better SM 2.0 equivalent code, receive such venom!

As I understand it...

Humus has showed there is an OPEN source way for both NVidia and ATi card owners to execute a fairly large set of SM 3.0 techniques in SM 2.0 and get two major benefits:

1) A massively bigger user base including many NVidia and ATi owners; and

2) A significant speed increase - about 3x - 4x increase in FPS it has been shown ON BOTH NVidia and ATi cards!!!

* * *

The second point was a big surprise as one reader found:

the optimization can also be applied to NV3x/NV40 now, just change "dev->SetRenderState(D3DRS_STENCILPASS, D3DSTENCILOP_ZERO);" to "dev->SetRenderState(D3DRS_STENCILPASS, D3DSTENCILOP_KEEP);" and clear the stencil buffer after drawLighting call.

got 80FPS with optimization off and 230-250FPS with optimization on

* * * * * * * * * * *

So the technique is excellent, broad and open source. It doesn't replace all need for SM 3.0 loops, but it surpasses a lot of their uses, and it's available for all SM 2.0 capable cards today! So call it a very excellent and legitimate bridging technique / algorithm.

The heat is on....

NVidia has pushed TWIMTBP game developers not to include any bridging techniques at all in its SM 3.0 showcase patches, let alone bridging techniques that run at 3 times the speed on SM 2.0 code paths. This is dynamite for them. It opens SM 3.0 effects to the mass market, and it shows SM 2.0 running extremely fast on NVidia or ATi cards compared to its SM 3.0 equivalent.

Now folk are asking: is it fair to pressure game developers to bypass SM 2.0 when 1) there are very few owners of SM 3.0 3D cards (the SM 2.0 user base must be vastly larger than the number of users with cards that have full SM 3.0 support) and 2) bridged SM 2.0 actually runs significantly faster than SM 3.0 code?

* * *

Well done Humus for this extremely useful technique!
