New dynamic branching demo

Zeno said:
I second what Ruined said. I'd also like to add that this isn't dynamic branching and it isn't a generic replacement for such. It is a special case which requires switching a bunch of render states and sending all the geometry through the pipe again. It's a hack.

I'm not saying it's not a nice technique, but why do you try to mislead people and say that it is more than it is, and that NVIDIA is "pwned"? If this is true then I expect ATI will be pushing this technique in the future and scrapping development of any SM 3.0 hardware now that you've single-handedly made it obsolete, right?

I respect your opinion Zeno, but this is actually not just a hack. It is that generic. It is a well-known fact that with only minor extensions to the GL you can implement a full shading language. That of course doesn't mean you shouldn't improve hardware. I'm not saying ps3.0 is useless. nVidia have dismissed plenty of features in the past, only to find themselves in the awkward situation of lacking a valid reason for supporting those same features in a later generation, after supposedly finding them useless. It's one thing to say that something is useless (which I'm not going to do), and something entirely different to say "here's an alternative".

This technique can replace most common dynamic code paths, or at least those that we will see in the near future. It won't do a while-statement in the general case, but it will do any kind of if-statement, including nesting, and only run the correct path. For performance optimization, that is the most common use of dynamic branching, so a technique that can replace it is IMHO significant.
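To make that concrete, here's a rough sketch of how an if-statement maps onto the stencil buffer. This is only an illustration in plain OpenGL, not the demo's actual source, and the two draw functions are made-up placeholders:

```cpp
// Sketch of the stencil-based "branching" trick, replacing:
//     if (cond) expensive();

// Pass 1: evaluate the condition with a very cheap shader that
// discards every fragment where cond is false, and mark the
// surviving pixels with stencil value 1.
glClearStencil(0);
glClear(GL_STENCIL_BUFFER_BIT);

glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE); // no color writes
glDepthMask(GL_FALSE);                               // no depth writes

glEnable(GL_STENCIL_TEST);
glStencilFunc(GL_ALWAYS, 1, ~0u);
glStencilOp(GL_KEEP, GL_KEEP, GL_REPLACE);           // mark where cond holds

drawSceneWithCheapConditionShader();  // placeholder: discards where !cond

// Pass 2: run the expensive shader only where stencil == 1. On hardware
// with top-of-pipe stencil rejection, failing fragments never even enter
// the pixel shader, which is where the speedup comes from.
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glDepthMask(GL_TRUE);
glStencilFunc(GL_EQUAL, 1, ~0u);
glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);

drawSceneWithExpensiveShader();       // placeholder: the "taken" branch

glDisable(GL_STENCIL_TEST);
```

An else-branch is just one more pass with glStencilFunc(GL_EQUAL, 0, ~0u), and nesting can be handled with additional stencil values or bits.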
 
radar1200gs said:
Mr. Travis said:
holy crap... :oops:

9500 Pro - 1024x768

false - 20-30 fps avg
true - 70-100 fps avg


million dollar question... could this be applied to current games' lighting, like Far Cry or the forthcoming Doom 3, for a speedup?

IMO this only demonstrates how poor ATi drivers still are.
The reason nVidia cards don't see such an improvement is that they can already handle the more complex case more efficiently, and clumsy optimizations such as this only slow them down unnecessarily.

ugh....I don't think so.. :?
 
Ruined said:
And now that Humus works for ATI, no matter what he or anyone else says, that makes him biased towards them.

Of course I am. Who do you work for yourself? Since you got so angry that you wrote half of your total of 62 posts in this thread in a couple of hours, it makes you wonder. In any case, you seem to be firmly seated on the green side.
 
Zeno said:
It is a special case which requires switching a bunch of render states and sending all the geometry through the pipe again.
Remember that there is no point in optimising if the optimisation doesn't help you.

For example, if you're running a 200-instruction pixel shader, there's no point jumping through hoops to avoid using floating point buffers, because the pixel shader takes so long to execute that the extra bandwidth taken is irrelevant.

Similarly, if you are limited by your pixel shader, then it's a lot less likely to matter if you send all the geometry through many times (assuming the vertex and pixel stages don't share resources).

In contrast, if your average polygon size is 2-3 pixels, you're limited by vertex processing and triangle setup, so it doesn't really matter if you run a hundred-instruction pixel shader on it. It won't cost anything.

It's all about pipeline balances.
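To put rough numbers on that balance argument, here's a toy model; every figure below is invented purely for illustration:

```cpp
// Toy bottleneck model with made-up numbers, illustrating why an extra
// geometry pass can be nearly free when the pixel shader dominates.
const double pixels   = 1.0e6;   // shaded pixels per frame (invented)
const double vertices = 5.0e5;   // vertices per frame (invented)
const double psRate   = 2.4e9;   // pixel shader instructions/s (invented)
const double vsRate   = 1.2e9;   // vertex shader instructions/s (invented)

double psTime = pixels   * 200.0 / psRate;  // 200-instr shader: ~83 ms
double vsTime = vertices *  30.0 / vsRate;  // 30-instr shader: ~12.5 ms

// With overlapping stages the frame costs roughly max(psTime, vsTime).
// Sending the geometry through twice doubles vsTime to ~25 ms, still far
// below the ~83 ms of pixel shading, so the second pass hides almost
// completely behind it.
```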
 
Ruined said:
I could ask the same thing - do you have proof that a game as complex as Far Cry would work? Simple offset mapping demos have been made for SM2.0 but nothing on the scale of Far Cry.

It would, as long as stencil doesn't need to be preserved (does Far Cry even use stencil?). And even in that case it could be done by creating another depth-stencil surface.
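For what it's worth, here's a hedged sketch of that idea in D3D9 (Far Cry being a Direct3D title). Error checking is omitted, and the depth contents would still have to be re-established in the scratch surface:

```cpp
// Sketch: keep the game's own stencil contents intact by doing the
// stencil-branching trick in a scratch depth-stencil surface (D3D9).
IDirect3DSurface9* gameDS    = NULL;
IDirect3DSurface9* scratchDS = NULL;

device->GetDepthStencilSurface(&gameDS);          // remember the original
device->CreateDepthStencilSurface(width, height, D3DFMT_D24S8,
    D3DMULTISAMPLE_NONE, 0, TRUE, &scratchDS, NULL);

device->SetDepthStencilSurface(scratchDS);
// ... lay down depth again (or copy it), then do the cheap marking pass
// ... and the stencil-tested expensive pass as usual
device->SetDepthStencilSurface(gameDS);           // game stencil untouched

scratchDS->Release();
gameDS->Release();   // GetDepthStencilSurface added a reference
```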
 
Zeno said:
The technique is fine, Dig, and I would congratulate him for it if it were presented as "New technique to reduce per-fragment calculation while rendering multiple lights under SM 2.0".

That's just one way to show off this technique. It's not limited to this particular application of range-limited lights. It can do any combination of ifs and still do only the same amount of shader work as if ps3.0 dynamic branching had been used.
 
radar1200gs said:
IMO this only demonstrates how poor ATi drivers still are. The reason nVidia cards don't see such an improvement is that they can already handle the more complex case more efficiently, and clumsy optimizations such as this only slow them down unnecessarily.


You do of course realize that this makes absolutely no sense, right? :|
 
Ruined said:
You consider releasing a simple graphical demo and having no game in existence showing any support whatsoever of this technique a "turnaround"? :LOL:

It's hard to turn something around that hasn't started yet, such as the flow of ps3.0 games.
 
Ruined said:
The branching effect also exhibits the 10% slowdown on 6800 series cards. Nvidia cards get a performance *hit* when Humus' "branching" is enabled, not a gain. Has nothing to do with the card, but the coding.

Well you can just put that BS right back where the sun don't shine. :rolleyes:

Do you have any proof of a flaw in the coding? It's open source. I challenge you to find any flaw or anything dishonest in there. I got nothing to hide, but you got something to prove.

In fact, I believe that this IS the card. It makes a lot of sense. That's the amount of slowdown I would expect for a card that doesn't perform top-of-the-pipe stencil rejection. I didn't know if they did or not; now I know they don't. Either that, or there's a problem with their drivers that leaves it disabled.
 
digitalwanderer said:
radar1200gs said:
IMO this only demonstrates how poor ATi drivers still are. The reason nVidia cards don't see such an improvement is that they can already handle the more complex case more efficiently, and clumsy optimizations such as this only slow them down unnecessarily.


You do of course realize that this makes absolutely no sense, right? :|

On ATi hardware, when you use dynamic branching without the optimisations/bypasses you get a low framerate. When you bypass the branching with what Humus wrote, it speeds up. Conclusion: something is wrong with ATi's dynamic branching at the driver or hardware level.

On nVidia, when you use dynamic branching without the optimisations it is faster than when the optimisation is enabled. In other words, the hardware is already efficient at dynamic branching.
 
Ruined said:
If it was, there was obviously no attempt to work around it or test it, or get it working etc... His last demo also had problems on Nvidia cards. This furthers the idea of being ATI biased ;)

And you can shove this BS where it belongs too.

Instead of looking for evil explanations for everything, why don't you look for a logical one? Such as that I don't have any nVidia hardware to try it on. If you read my website, it says that it's supposed to run on their hardware too. It's been my intent all the way that it should run on all capable hardware. Either there's a driver bug, or there's a bug somewhere in my framework that causes the problem. If I had access to any nVidia hardware, I wouldn't hesitate to spend a night debugging it to find the problem AND report any driver bug I find to nVidia.
 
Alstrong said:
Humus, do you have an nv4x? If so, could you code the same demo using SM3.0 for comparison? Or am I just talking nonsense and the workaround code you wrote would appear the same for SM3.0... :?:

No I don't. I would make a more in-depth comparison if I had one. This technique should work on any hardware, but you'd only see a performance increase if the hardware can reject fragments early based on the stencil test; otherwise you'll see a small performance loss.
 
radar1200gs said:
On ATi hardware, when you use dynamic branching without the optimisations/bypasses you get a low framerate. When you bypass the branching with what Humus wrote, it speeds up. Conclusion: something is wrong with ATi's dynamic branching at the driver or hardware level.

On nVidia, when you use dynamic branching without the optimisations it is faster than when the optimisation is enabled. In other words, the hardware is already efficient at dynamic branching.
You're just soooo darned close to the right answer, Radar, that I won't even laugh at you; it's just like you got a positive/negative sign reversed somewhere. :rolleyes:
 
Mr. Travis said:
holy crap... :oops:

9500 Pro - 1024x768

false - 20-30 fps avg
true - 70-100 fps avg


million dollar question... could this be applied to current games' lighting, like Far Cry or the forthcoming Doom 3, for a speedup?

Possibly. I don't know if Far Cry or Doom 3 use range-limited lights, and I don't know if there are other forms of early-out possibilities in their shaders. And I guess Doom 3 could have some issues since it uses stencil for shadows, depending on how Carmack has structured his code. Also important to remember is that advanced game engines cull large portions of the scene in software prior to rendering, so performance boosts won't be as high as shown in this demo. But that applies equally regardless of whether you use this technique or ps3.0.
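To illustrate what such an early-out looks like for a range-limited light, here's a hypothetical cheap marking shader (GLSL embedded in a C++ string; all identifiers are invented):

```cpp
// Hypothetical condition shader for a range-limited light: fragments
// outside the light's radius are discarded, so they never receive the
// stencil mark and the expensive lighting pass skips them entirely.
const char* lightRangeMaskFS =
    "uniform vec3  lightPos;                                     \n"
    "uniform float lightRadius;                                  \n"
    "varying vec3  worldPos;       // interpolated from the VS   \n"
    "void main() {                                               \n"
    "    if (distance(worldPos, lightPos) > lightRadius)         \n"
    "        discard;              // out of range: no mark      \n"
    "    gl_FragColor = vec4(0.0); // color writes masked anyway \n"
    "}                                                           \n";
```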
 
People, stop bashing Humus, will ya?
He just showed something interesting to us. Yeah, I know the idea has been there for years, but no one has mentioned it lately; we nearly forgot there's such an optimization trick. Humus reminds us, and I think it's a good thing. No matter what his purpose is, I respect his coding skill and creativity. As long as this is a valid optimization technique, who cares who invented it or implemented it? For me, knowing more is always better than knowing less.
 
I am not bashing Humus, I am mostly curious as to his comments on whether or not the demo is bugged on NV40 hardware, and why NV40 users are experiencing a performance deficit from this method. :)
 
ChrisRay said:
Humus, do you have any explanation as to why your technique is actually running slower on NV40-variant graphics cards, and do you intend to investigate the issue?

I would investigate if I could. But the most likely explanation is that they simply don't support top-of-pipe stencil rejection. In that case they run the whole lighting shader, then perform the stencil test and reject the pixels that are unlit. But at that point all the work has already been done (and wasted), so you end up with some additional cost from doing it in two passes, but no performance saving from culling fragments early.
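A bit of invented arithmetic shows why that gives exactly a small loss rather than a gain (all numbers made up for illustration):

```cpp
// Toy per-pixel cost model for early vs. late stencil reject.
// All numbers are invented for illustration only.
const double unlitFraction = 0.7;    // pixels outside every light's range
const double expensiveCost = 100.0;  // lighting shader cost (instr-slots)
const double cheapCost     = 5.0;    // condition/marking pass cost

// No trick, single pass: every pixel pays the full price.
double onePass = expensiveCost;                                   // 100.0

// Early (top-of-pipe) stencil reject: unlit pixels skip the shader.
double early = cheapCost + (1.0 - unlitFraction) * expensiveCost; //  35.0

// Late stencil reject: unlit pixels run the whole shader and are only
// thrown away afterwards; two passes for nothing, slightly worse.
double late = cheapCost + expensiveCost;                          // 105.0
```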
 
Humus said:
ChrisRay said:
Humus, do you have any explanation as to why your technique is actually running slower on NV40-variant graphics cards, and do you intend to investigate the issue?

I would investigate if I could. But the most likely explanation is that they simply don't support top-of-pipe stencil rejection. In that case they run the whole lighting shader, then perform the stencil test and reject the pixels that are unlit. But at that point all the work has already been done (and wasted), so you end up with some additional cost from doing it in two passes, but no performance saving from culling fragments early.

Thank you. I'll see if I can find out anything :)
 