Old 02-Jul-2004, 04:06   #101
Humus
Crazy coder
 
Join Date: Feb 2002
Location: Stockholm, Sweden
Posts: 3,216
Default

Quote:
Originally Posted by Zeno
I second what Ruined said. I'd also like to add that this isn't dynamic branching and it isn't a generic replacement for such. It is a special case which requires switching a bunch of render states and sending all the geometry through the pipe again. It's a hack.

I'm not saying it's not a nice technique, but why do you try to mislead people and say that it is more than it is, and that NVIDIA is "pwned"? If this is true then I expect ATI will be pushing this technique in the future and scrapping development of any SM 3.0 hardware now that you've single-handedly made it obsolete, right?
I respect your opinion Zeno, but this is actually not just a hack. It is that generic. It is a well-known fact that with only minor extensions to the GL you can implement a full shading language. That of course doesn't mean you shouldn't improve hardware. I'm not saying ps3.0 is useless, as nVidia has done in the past with so many features, only to find themselves in the awkward situation of lacking a valid reason for supporting those features in a later generation when they were supposedly useless. It's one thing to say that something is useless (which I'm not going to do), and something entirely different to say "here's an alternative".

This technique can replace most common dynamic code paths, or at least those that we will see in the near future. It won't do a while-statement in the general case. But it will do any kind of if-statement, including nesting, and only run the correct path. For performance optimization, this is the most common use of dynamic branching. So a technique that can replace that is IMHO significant.
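
For readers who want to see the idea in code, here is a minimal CPU-side sketch of an if-statement emulated with stencil passes. It is not taken from the demo source: the function name `render_with_stencil_branch`, the list standing in for the stencil buffer, and the distance values are all assumptions made for illustration. On real hardware the saving depends on fragments being rejected before the shader runs.

```python
# CPU-side simulation of stencil-based branching. Illustrative only;
# none of these names come from the demo's source code.

def render_with_stencil_branch(fragments, condition, cheap_path, expensive_path):
    """Shade each fragment with exactly one branch and count branch runs."""
    # Pass 1: evaluate only the condition and tag each fragment in a
    # stand-in "stencil buffer" (1 = expensive branch, 0 = cheap branch).
    stencil = [1 if condition(f) else 0 for f in fragments]

    runs = {"cheap": 0, "expensive": 0}
    output = [None] * len(fragments)

    # Pass 2: draw with the stencil test set to "equal 1". Fragments that
    # fail the test are skipped before the shader would run.
    for i, f in enumerate(fragments):
        if stencil[i] == 1:
            output[i] = expensive_path(f)
            runs["expensive"] += 1

    # Pass 3: draw again with the stencil test set to "equal 0".
    for i, f in enumerate(fragments):
        if stencil[i] == 0:
            output[i] = cheap_path(f)
            runs["cheap"] += 1

    return output, runs


# Example: a range-limited light. The distances are made-up values.
distances = [0.5, 3.0, 0.9, 7.5, 12.0, 0.2]
light_radius = 1.0

output, runs = render_with_stencil_branch(
    distances,
    condition=lambda d: d < light_radius,             # "if (inside light range)"
    cheap_path=lambda d: 0.0,                         # unlit: no lighting math
    expensive_path=lambda d: 1.0 - d / light_radius,  # stand-in for lighting
)
```

In this sketch each fragment runs exactly one of the two paths, which is the property claimed for the if-statement case. The price is the extra condition pass and the resubmitted geometry.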
__________________
[ Visit my site ]
I speak for myself and only myself.
Old 02-Jul-2004, 04:06   #102
Richteralan
Member
 
Join Date: Jan 2004
Location: Oberlin, OH
Posts: 258
Default

Quote:
Originally Posted by radar1200gs
Quote:
Originally Posted by Mr. Travis
holy crap...

9500 pro - 1024 x768

false - 20-30 fps avg
true - 70-100 fps avg


million dollar question... could this be applied to current games' lighting like far cry, or forthcoming doom 3 for a speed up?
IMO this only demonstrates how poor ATi drivers still are.
The reason nVidia cards don't see such an improvement is because they can already handle the more complex case more efficiently and clumsy optimizations such as this only slow them down unnecessarily.
ugh....I don't think so.. :?
Old 02-Jul-2004, 04:09   #103
Humus
Crazy coder
 
Join Date: Feb 2002
Location: Stockholm, Sweden
Posts: 3,216
Default

Quote:
Originally Posted by Ruined
And now that Humus works for ATI, no matter what he or anyone else says, that makes him biased towards them.
Of course I am. Who do you work for yourself? Since you got so angry that you wrote half of your total of 62 posts in this thread in a couple of hours it makes you wonder. In any case, you seem to be firmly seated on the green side.
Old 02-Jul-2004, 04:12   #104
Dio
Senior Member
 
Join Date: Jul 2002
Location: UK
Posts: 1,758
Default

Quote:
Originally Posted by Zeno
It is a special case which requires switching a bunch of render states and sending all the geometry through the pipe again.
Remember that there is no point in optimising if the optimisation doesn't help you.

For example, if you're running a 200-instruction pixel shader, there's no point jumping through hoops to avoid using floating point buffers, because the pixel shader takes so long to execute that the extra bandwidth taken is irrelevant.

Similarly, if you are limited by your pixel shader, then it's a lot less likely to matter if you send all the geometry through many times (if they don't share resources).

In contrast, if your average polygon size is 2-3 pixels, then it doesn't really matter if you run a hundred instruction pixel shader on it. It won't cost anything.

It's all about pipeline balances.
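
A rough way to picture this, as a toy model rather than a measurement: assume the vertex and pixel stages overlap fully, so the slower stage sets the frame time. The function `frame_time` and all cost numbers below are invented for illustration.

```python
def frame_time(vertex_cost, pixel_cost):
    """Toy model: stages overlap fully, so the slowest stage sets the time."""
    return max(vertex_cost, pixel_cost)


# Pixel-bound scene: sending the geometry twice doubles the vertex cost,
# but the pixel stage is still the bottleneck.
pixel_bound_once = frame_time(vertex_cost=2.0, pixel_cost=10.0)
pixel_bound_twice = frame_time(vertex_cost=4.0, pixel_cost=10.0)

# Vertex-bound scene (tiny polygons): the same doubling shows up in full.
vertex_bound_once = frame_time(vertex_cost=8.0, pixel_cost=3.0)
vertex_bound_twice = frame_time(vertex_cost=16.0, pixel_cost=3.0)
```

Under that assumption, resubmitting geometry is free in the pixel-bound case and doubles the frame time in the vertex-bound case. Real pipelines that share resources between stages will sit somewhere in between.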
Old 02-Jul-2004, 04:13   #105
Humus
Crazy coder
 
Join Date: Feb 2002
Location: Stockholm, Sweden
Posts: 3,216
Default

Quote:
Originally Posted by Ruined
I could ask the same thing - do you have proof that a game as complex as Far Cry would work? Simple offset mapping demos have been made for SM2.0 but nothing on the scale of Far Cry.
It would as long as stencil doesn't need to be preserved (does Far Cry even use stencil?). And in that case it could be done by creating another depth-stencil surface.
Old 02-Jul-2004, 04:17   #106
Humus
Crazy coder
 
Join Date: Feb 2002
Location: Stockholm, Sweden
Posts: 3,216
Default

Quote:
Originally Posted by Zeno
The technique is fine, Dig, and I would congratulate him for it if it were presented as "New technique to reduce per-fragment calculation while rendering multiple lights under SM 2.0".
That's just one way to show off this technique. It's not limited to this particular application of range-limited lights. It can do any combination of ifs and still do only the same amount of shader work as if ps3.0 dynamic branching had been used.
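
To illustrate the "any combination of ifs" point, here is a small CPU-side sketch that treats the stencil value as the id of the one leaf branch a fragment should run. The helper names `classify` and `shade_nested`, the fragment fields, and the three leaf shaders are hypothetical. On real hardware the classification would itself be built up by stencil-writing passes instead of one Python function.

```python
def classify(fragment):
    """Map a fragment to the id of the single leaf branch it should run."""
    if fragment["in_range"]:
        if fragment["facing_light"]:
            return 2  # in range and facing the light: full lighting
        return 1      # in range but facing away
    return 0          # out of range


def shade_nested(fragments, leaf_shaders):
    """Run one pass per leaf; each pass only shades matching stencil ids."""
    stencil = [classify(f) for f in fragments]
    runs = [0] * len(leaf_shaders)
    output = [None] * len(fragments)
    for leaf_id, shader in enumerate(leaf_shaders):
        for i, f in enumerate(fragments):
            if stencil[i] == leaf_id:  # stencil test "equal leaf_id"
                output[i] = shader(f)
                runs[leaf_id] += 1
    return output, runs


fragments = [
    {"in_range": True, "facing_light": True},
    {"in_range": True, "facing_light": False},
    {"in_range": False, "facing_light": True},
    {"in_range": False, "facing_light": False},
]
leaf_shaders = [
    lambda f: "skipped",
    lambda f: "ambient",
    lambda f: "full lighting",
]
output, runs = shade_nested(fragments, leaf_shaders)
```

The total number of shader runs equals the number of fragments, so no fragment pays for a branch it does not take. The number of passes grows with the number of leaves, which is the trade-off to weigh.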
Old 02-Jul-2004, 04:21   #107
digitalwanderer
Dangerously Mirthful
 
Join Date: Feb 2002
Location: Winfield, IN USA
Posts: 15,292
Default

Quote:
Originally Posted by radar1200gs
IMO this only demonstrates how poor ATi drivers still are. The reason nVidia cards don't see such an improvement is because they can already handle the more complex case more efficiently and clumsy optimizations such as this only slow them down unnecessarily.


You do of course realize that this makes absolutely no sense, right?
Old 02-Jul-2004, 04:22   #108
Humus
Crazy coder
 
Join Date: Feb 2002
Location: Stockholm, Sweden
Posts: 3,216
Default

Quote:
Originally Posted by Ruined
You consider releasing a simple graphical demo and having no game in existence showing any support whatsoever of this technique a "turnaround"?
It's hard to turn something around that hasn't started yet, such as the flow of ps3.0 games.
Old 02-Jul-2004, 04:28   #109
Humus
Crazy coder
 
Join Date: Feb 2002
Location: Stockholm, Sweden
Posts: 3,216
Default

Quote:
Originally Posted by Ruined
The branching effect also exhibits the 10% slowdown on 6800 series cards. Nvidia cards get a performance *hit* when Humus' "branching" is enabled, not a gain. Has nothing to do with the card, but the coding.
Well you can just put that BS right back where the sun don't shine.

Do you have any proof of a flaw in the coding? It's open source. I challenge you to find any flaw or anything dishonest in there. I got nothing to hide, but you got something to prove.

In fact, I believe that this IS the card. It makes a lot of sense. That's the amount of slowdown I would expect for a card that doesn't perform top-of-the-pipe stencil rejection. I didn't know if they did or not; now I know they don't. Either that, or there's a problem with their drivers that leaves it disabled.
Old 02-Jul-2004, 04:30   #110
radar1200gs
Guest
 
Join Date: Nov 2002
Posts: 900
Default

Quote:
Originally Posted by digitalwanderer
Quote:
Originally Posted by radar1200gs
IMO this only demonstrates how poor ATi drivers still are. The reason nVidia cards don't see such an improvement is because they can already handle the more complex case more efficiently and clumsy optimizations such as this only slow them down unnecessarily.


You do of course realize that this makes absolutely no sense, right?
On ATi hardware, when you use dynamic branching without the optimisations/bypasses you get a low framerate. When you bypass the branching with what Humus wrote it speeds up. Conclusion - Something is wrong with ATi's dynamic branching at the driver or hardware level.

On nVidia when you use dynamic branching without the optimizations it is faster than when the optimization is enabled. In other words the hardware is already efficient at dynamic branching.
Old 02-Jul-2004, 04:32   #111
Humus
Crazy coder
 
Join Date: Feb 2002
Location: Stockholm, Sweden
Posts: 3,216
Default

Quote:
Originally Posted by Ruined
If it was there was obviously no attempt to work around it or test it, or get it working etc... His last demo also had problems on Nvidia cards. This furthers the idea of being ATI biased
And you can shove this BS where it belongs too.

Instead of looking for evil explanations for everything, why don't you look for a logical one? Such as that I don't have any nVidia hardware to try it on. If you read my website it says that it's supposed to run on their hardware too. It's been my intent all the way that it should run on all capable hardware. Either there's a driver bug, or there's a bug somewhere in my framework that causes the problem. If I had access to any nVidia hardware, I wouldn't hesitate to spend a night to debug it and find the problem AND report any driver bug I find to nVidia.
Old 02-Jul-2004, 04:35   #112
Humus
Crazy coder
 
Join Date: Feb 2002
Location: Stockholm, Sweden
Posts: 3,216
Default

Quote:
Originally Posted by Alstrong
Humus, do you have an nv4x? if so, could you code the same demo using SM3.0 for comparison? or am I just talking nonsense and the workaround code you wrote would appear the same for SM3.0...
No I don't. I would make a more in-depth comparison if I did. This technique should work on any hardware though, but you'd only see a performance increase if the hardware can reject fragments early based on the stencil test; otherwise you'll see a small performance loss.
Old 02-Jul-2004, 04:36   #113
digitalwanderer
Dangerously Mirthful
 
Join Date: Feb 2002
Location: Winfield, IN USA
Posts: 15,292
Default

Quote:
Originally Posted by radar1200gs
On ATi hardware, when you use dynamic branching without the optimisations/bypasses you get a low framerate. When you bypass the branching with what Humus wrote it speeds up. Conclusion - Something is wrong with ATi's dynamic branching at the driver or hardware level.

On nVidia when you use dynamic branching without the optimizations it is faster than when the optimization is enabled. In other words the hardware is already efficient at dynamic branching.
You're just soooo darned close to the right answer Radar that I won't even laugh at you, it's just like you got a positive/negative sign reversed somewhere.
Old 02-Jul-2004, 04:39   #114
Richteralan
Member
 
Join Date: Jan 2004
Location: Oberlin, OH
Posts: 258
Default

So still nobody answered me why. :?
Old 02-Jul-2004, 04:40   #115
Humus
Crazy coder
 
Join Date: Feb 2002
Location: Stockholm, Sweden
Posts: 3,216
Default

Quote:
Originally Posted by Mr. Travis
holy crap...

9500 pro - 1024 x768

false - 20-30 fps avg
true - 70-100 fps avg


million dollar question... could this be applied to current games' lighting like far cry, or forthcoming doom 3 for a speed up?
Possibly. I don't know if far cry or doom 3 uses range limited lights, and I don't know if there are other forms of early-out possibilities in their shaders. And I guess doom 3 could have some issues since it uses stencil for shadows, depending on how Carmack has structured his code. Also important to remember is that advanced game engines cull large portions of the scene in software prior to rendering, so performance boosts won't be as high as shown in this demo. But that applies equally regardless of whether you use this technique or ps3.0.
Old 02-Jul-2004, 04:41   #116
991060
Member
 
Join Date: Jul 2003
Location: Beijing
Posts: 640
Default

People, stop bashing Humus, will ya?
He just showed something interesting to us. Yeah, I know the idea has been there for years, but no one mentioned it lately, and we nearly forgot there's such an optimization trick. Humus reminded us, and I think it's a good thing. No matter what his purpose is, I respect his coding skill and creativity. As long as this is a valid optimization technique, who cares who invented it or implemented it? For me, knowing more is always better than knowing less.
Old 02-Jul-2004, 04:47   #117
ChrisRay
R.I.P. 1983-2010
 
Join Date: Nov 2002
Posts: 2,234
Default

I am not bashing Humus, I am mostly curious as to his comments on whether or not the demo is bugged on NV40 hardware, and why NV40 users are experiencing a performance deficit from this method.
__________________
Nzone
SLI Forum Administrator

NVIDIA User Group Members receive free software and/or hardware from NVIDIA from time to time to facilitate the evaluation of NVIDIA products. However, the opinions expressed are solely those of the members
Old 02-Jul-2004, 04:48   #118
Humus
Crazy coder
 
Join Date: Feb 2002
Location: Stockholm, Sweden
Posts: 3,216
Default

Quote:
Originally Posted by ChrisRay
Humus do you have any explanation as to why your technique is actually running slower on NV40 Variant graphic cards, and if you intend to investigate the issue.
I would investigate if I could. But the most likely explanation is that they simply don't support top of pipe stencil rejection. So they simply run the whole lighting shader through, then perform the stencil test and reject pixels that are unlit. But at that point all the work has already been done (and wasted), so you end up getting some additional cost of doing it in two passes, but no performance saving from culling fragments early.
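
A back-of-the-envelope cost model of that explanation, as a sketch only: the functions `single_pass_cost` and `two_pass_cost` and every number below are made up, costs are in arbitrary units, and the geometry cost of the second pass is ignored.

```python
def single_pass_cost(num_fragments, light_cost):
    """Every fragment runs the full lighting shader once."""
    return num_fragments * light_cost


def two_pass_cost(num_fragments, lit_fraction, cond_cost, light_cost,
                  early_stencil):
    """Shader cost of the condition pass plus the stencil-tested pass."""
    # Pass 1: the cheap condition shader runs on every fragment.
    pass1 = num_fragments * cond_cost
    if early_stencil:
        # Top-of-pipe rejection: unlit fragments never reach the shader.
        pass2 = num_fragments * lit_fraction * light_cost
    else:
        # Late stencil test: the shader runs first, then the result is
        # thrown away, so every fragment still pays the full cost.
        pass2 = num_fragments * light_cost
    return pass1 + pass2


baseline = single_pass_cost(1000, light_cost=20)
with_early = two_pass_cost(1000, 0.25, cond_cost=2, light_cost=20,
                           early_stencil=True)
with_late = two_pass_cost(1000, 0.25, cond_cost=2, light_cost=20,
                          early_stencil=False)
```

With these invented numbers the early-rejection case costs 7000 units against a baseline of 20000, while the late-test case costs 22000, slightly worse than not using the technique at all.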
Old 02-Jul-2004, 04:49   #119
digitalwanderer
Dangerously Mirthful
 
Join Date: Feb 2002
Location: Winfield, IN USA
Posts: 15,292
Default

Quote:
Originally Posted by 991060
For me, knowing more is always better than knowing less.
For some that ain't been a good thing lately, it tended to sell less FX cards for 'em...
Old 02-Jul-2004, 04:51   #120
ChrisRay
R.I.P. 1983-2010
 
Join Date: Nov 2002
Posts: 2,234
Default

Quote:
Originally Posted by Humus
Quote:
Originally Posted by ChrisRay
Humus do you have any explanation as to why your technique is actually running slower on NV40 Variant graphic cards, and if you intend to investigate the issue.
I would investigate if I could. But the most likely explanation is that they simply don't support top of pipe stencil rejection. So they simply run the whole lighting shader through, then perform the stencil test and reject pixels that are unlit. But at that point all the work has already been done (and wasted), so you end up getting some additional cost of doing it in two passes, but no performance saving from culling fragments early.
Thank you. I'll see if I can find out anything
Old 02-Jul-2004, 04:51   #121
Humus
Crazy coder
 
Join Date: Feb 2002
Location: Stockholm, Sweden
Posts: 3,216
Default

Quote:
Originally Posted by radar1200gs
IMO this only demonstrates how poor ATi drivers still are.
The reason nVidia cards don't see such an improvement is because they can already handle the more complex case more efficiently and clumsy optimizations such as this only slow them down unnecessarily.
LOL, I guess that was an alternative explanation ... but it's waaaay off. No, all it demonstrates is how poor nVidia's hardware is at stencil rejection.
Old 02-Jul-2004, 04:52   #122
991060
Member
 
Join Date: Jul 2003
Location: Beijing
Posts: 640
Default

Quote:
Originally Posted by ChrisRay
I am not bashing Humus, I am mostly curious as to his comments on whether or not the demo is bugged on NV40 hardware, and why NV40 users are experiencing a performance deficit from this method.
The performance drop in NV40/NV3x is most likely due to the lack of early stencil kill ability. And I'm not criticising you, Chris, I know you're much better than that.
Old 02-Jul-2004, 05:14   #123
Jack_Tripper
Member
 
Join Date: Sep 2003
Location: Asslanta
Posts: 180
Default

X800XT
*edit...whoops...didn't have AA/AF on at all...sorry*
16*12
ON 140
OFF 43

10*7
ON 315
OFF 106

Jack
__________________
Jack be nimble
Old 02-Jul-2004, 05:14   #124
digitalwanderer
Dangerously Mirthful
 
Join Date: Feb 2002
Location: Winfield, IN USA
Posts: 15,292
Default

Humus, did ATi give ya a pat-on-the-head or a bonus or anything weird like that for this? (Sorry, the curiosity bug bit me and I had to ask.)
Old 02-Jul-2004, 05:46   #125
trinibwoy
Meh
 
Join Date: Mar 2004
Location: New York
Posts: 9,809
Default

Outside of the fact that we have the usual people hyping the ATI side of things and the usual people brandishing Nvidia's guns, like Chris Ray I am leery of the extreme difference in the impact of the optimizations on different hardware.

I don't believe that Humus intentionally made the optimization ineffective for Nvidia hardware but I thought the optimization was supposed to be generic? Humus, being the coder you are in the best position to at least make an informed guess at a reason for the discrepancy. Any ideas?

Also, Nvidia is a corporation that has an opportunity to take advantage of features that are unique to their product and they are doing so. The argument that it is Nvidia's 'fault' that Crytek didn't implement these new features in SM2.0 is infantile. It is actually ATI's 'fault' for not implementing SM3.0 support in the first place. Any sensible person would acknowledge that.

Yes it is possible and feasible, but what incentive does Crytek really have to forego SM3.0 and implement fall backs for SM2.0? ATI sure as hell doesn't seem to be giving them any. So Crytek's decision is $$$+3.0 or nothing+2.0 assuming equal effort for either. Yeah, tough decision.

Fact is, Far Cry is a completed retail game that runs perfectly on ATI hardware. Crytek would never have added these new features without pressure/incentive from Nvidia in the first place, and nobody would have seen them on either IHV's hardware. Unless some of you conspiracy theorists think that they intentionally dumbed down their SM2.0 features in anticipation of the SM3.0-only add-ons.

The same people that were saying SM3.0 is worthless are the same ones now saying that Crytek/Nvidia are evil for using it. Where does it stop? This is getting ridiculous.
__________________
What the deuce!?

All times are GMT +1.