New dynamic branching demo

Discussion in 'Architecture and Products' started by Humus, Jul 1, 2004.

  1. JD

    JD
    Newcomer

    Joined:
    Dec 15, 2002
    Messages:
    122
    Likes Received:
    0
    I think Humus should have removed that nv line from his page and presented the demo as is, i.e. as a technical demo instead of a marketing demo. I think this is what caused the hostility in the first place. Other than that, the demo is cool and gives those who don't have an sm3-capable card a way to duplicate some of sm3 through different means. I think Ati will have sm3-capable hw in the next 4 to 6 months, as they stated. By that time there might be more sm3 games available than there are now.
     
  2. Mariner

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,600
    Likes Received:
    237
    If this technique is a viable option to improve performance/compatibility on cards which don't support SM3.0 (in some cases, at least) then it will be up to ATI developer relations to assist developers in implementing such a technique.

    No games featuring SM3.0 are currently available (unless you consider the as yet unavailable FarCry 1.2 patch and beta DirectX 9.0c as a currently available game). Therefore, we simply don't know if any of the SM3.0 games will support the technique as espoused by Humus. Let's wait and see if it is used before trying to claim it is worthless, shall we? :wink:
     
  3. Mr. Travis

    Newcomer

    Joined:
    May 28, 2004
    Messages:
    25
    Likes Received:
    0
    "For a more positive update, after a discussion with CryTek about the new rendering path, we have learned that the lighting model implemented in the SM3.0 Path is exactly the same as was used in the SM2.0 Path. The only exception is that they used the conditional rendering (branching in the pixel shader) to emulate multipass lighting in a single pixel shader. The performance gains we see actually indicate that PS3.0 branching does not have as significant a performance hit as previously thought (and proves to be more efficient than using multiple pixel shaders in a scene)."

    anyway it sounds like far cry is using 3.0 for a similar purpose as humus, so I wonder if his technique could be applied? I'll look through the shaders but I'm not much of a coder
     
  4. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,886
    Likes Received:
    211
    Location:
    Seattle, WA
    It sounds to me more like FarCry had multipass lighting (which is what Humus is doing), and moved to a single-pass technique, which resulted in a performance increase.
     
  5. kihon

    Newcomer

    Joined:
    Dec 4, 2002
    Messages:
    53
    Likes Received:
    0
    Re: programming

    OH NO YOU DON'T! I'm not falling for that one again :evil:

    Sorry - couldn't resist.
     
  6. Humus

    Humus Crazy coder
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,217
    Likes Received:
    77
    Location:
    Stockholm, Sweden
    You mean splitting the first depth+ambient pass into two different passes? Then no, it'll be slightly slower. The ambient shader is pretty cheap, so it'll be a loss doing another pass just for that.
     
  7. Humus

    Humus Crazy coder
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,217
    Likes Received:
    77
    Location:
    Stockholm, Sweden
    Fair enough, but let me take this in chronological order.

    Once upon a time ... :p maybe 2 or 3 weeks ago, I had a discussion with my co-worker Guennadi at lunch. Since nVidia were making a big deal out of dynamic branching he suggested that I should make an SDK sample as a response to that, maybe I could do early-out by using some clever tricks with stencil? Well, I thought it sounded like it might be possible, but didn't expect much of a performance gain, or any at all, but I put that in my TODO list anyway. On Tuesday this week I began working on it, solved the practical problem of how to control stencil from a shader by using alpha test, got it up and running and began to see that there was a huge performance boost potential. So I got pretty excited about all this, especially when I had all the lighting and everything working and saw like 2-3x performance boost, and I still hadn't tweaked the thing.

    Then came the Far Cry thread, and the topic of dynamic branching. So of course I brought this technique up. At this point I thought it was new, especially since I came up with it without reading any papers on the topic or anything, just developed it through my own work, so of course I presented it sort of as "my technique". Some people pointed out that it wasn't new. A guy mentioned two papers, which I then googled up. I went like "duh, of course", because I was aware of both these papers, but never read them or thought about how they implemented it. Quick check and indeed, they used the same technique. Now, big deal. So it was old, but the whole point was that it works, does the job and is fast. But some people got stuck on the "my technique" part. From the point those papers were posted I changed to referring to it as "this technique" so as not to steal any credit for inventing it, but it seems people never noticed and continued to make a big deal about it not being new or that I've been dishonest and what not. I mean, the "this is not new" comment keeps popping up everywhere, even on other forums and in comments made on my site. There was never any intent to rebrand existing techniques. I'm just excited that it works so well and I'm more interested in the technical pros and cons than in who invented it and when.
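
For readers who want to see the mechanics, here is a minimal software model of the stencil early-out described above. It is a sketch, not the demo's actual code: the function names (`cheap_condition`, `expensive_lighting`, `render_light`) and the light-radius test are assumptions made purely for illustration, and the stencil buffer is just a Python dict. Pass 1 stands in for the cheap shader whose alpha-test result decides which pixels get their stencil set; pass 2 stands in for the expensive shader, which never runs where the stencil test fails.

```python
# Software model of stencil-based "branching". No real GPU API is used;
# every name below is hypothetical and exists only for this sketch.

def cheap_condition(pixel, light):
    """The 'if' expression: is the pixel inside the light's radius?"""
    dx = pixel[0] - light["pos"][0]
    dy = pixel[1] - light["pos"][1]
    return dx * dx + dy * dy < light["radius"] ** 2


def expensive_lighting(pixel, light):
    """Stand-in for the costly per-pixel lighting shader."""
    dx = pixel[0] - light["pos"][0]
    dy = pixel[1] - light["pos"][1]
    dist2 = dx * dx + dy * dy
    return max(0.0, 1.0 - dist2 / light["radius"] ** 2)


def render_light(width, height, light):
    pixels = [(x, y) for y in range(height) for x in range(width)]

    # Pass 1: a cheap shader evaluates the condition. Fragments that fail
    # are killed (alpha test), so only passing fragments write stencil = 1.
    stencil = {p: 1 if cheap_condition(p, light) else 0 for p in pixels}

    # Pass 2: the stencil test rejects fragments before the expensive
    # shader would run, so its cost is only paid where stencil == 1.
    image = {}
    expensive_runs = 0
    for p in pixels:
        if stencil[p] != 1:
            continue  # early stencil rejection: shader never executes
        image[p] = expensive_lighting(p, light)
        expensive_runs += 1
    return image, expensive_runs


light = {"pos": (4, 4), "radius": 2}
image, runs = render_light(16, 16, light)
print(f"expensive shader ran on {runs} of {16 * 16} pixels")
```

The saving in this model comes entirely from how few pixels pass the condition, which matches the point made in the post: the gain depends on the hardware rejecting on stencil before the shader runs.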
     
  8. pocketmoon66

    Newcomer

    Joined:
    Mar 31, 2004
    Messages:
    163
    Likes Received:
    9
    Oops! missed a bit out - wasn't clearing the stencil buffer between passes.
    Demo now runs :


    1280x960
    FALSE: 51
    TRUE: 185 ish <- better
    DB PS2 (cmp) 54
    DB PS3 (if then else) 65 ish

    I'll work on combining the individual light passes into a single SM2/3 pass and see how that works out...
     
  9. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,886
    Likes Received:
    211
    Location:
    Seattle, WA
    Now you just need to see what the performance characteristics would be like in a real game. PS 3.0 shaders in Far Cry increase performance by reducing the number of passes. Your shader increases the number of passes, but reduces total pixel shader processing (over "normal" PS 2.0).

    So, if more geometry was used (a quick hack would be to tessellate the geometry that's there), or longer shaders (more code per branch), the balance might shift back towards PS 3.0.
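
The geometry half of that trade-off can be put into a toy cost model. This is only a sketch under invented assumptions: the cost weights, the `lit_fraction` parameter, the two-passes-per-light count, and both function names are hypothetical and are not measurements from the demo or from Far Cry. It also assumes the single-pass branch skips work perfectly per pixel, which real hardware may not do.

```python
# Toy cost model: multipass stencil early-out vs. single-pass shader branching.
# All weights are made-up units, chosen only to illustrate the trade-off.

def stencil_multipass_cost(vertices, pixels, lights, lit_fraction,
                           vertex_cost=1, test_cost=2, light_cost=40):
    # Assumed: two geometry passes per light (set stencil, then light).
    geometry = 2 * lights * vertices * vertex_cost
    # The cheap test runs on every pixel; lighting only where stencil passes.
    shading = lights * pixels * (test_cost + lit_fraction * light_cost)
    return geometry + shading


def single_pass_branch_cost(vertices, pixels, lights, lit_fraction,
                            vertex_cost=1, test_cost=2, light_cost=40,
                            branch_overhead=6):
    # Assumed: one geometry pass in total; each light costs a test plus
    # a fixed branch overhead, and lighting only on the taken branch.
    geometry = vertices * vertex_cost
    shading = lights * pixels * (
        test_cost + branch_overhead + lit_fraction * light_cost
    )
    return geometry + shading


for vertices in (100_000, 10_000_000):
    args = dict(vertices=vertices, pixels=1_000_000, lights=3, lit_fraction=0.25)
    multipass = stencil_multipass_cost(**args)
    branching = single_pass_branch_cost(**args)
    winner = "stencil multipass" if multipass < branching else "single-pass branch"
    print(f"{vertices:>10} vertices: {winner} is cheaper in this model")
```

With these made-up weights the extra geometry passes are cheap for a light scene and dominate once the vertex count is high enough, which is the direction of the shift suggested above. Where the crossover sits on real hardware would have to be measured.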
     
  10. Humus

    Humus Crazy coder
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,217
    Likes Received:
    77
    Location:
    Stockholm, Sweden
    It's possible to do with any shader capable hardware, or even on fixed function hardware. As long as it does top of the pipe stencil rejection. On fixed function hardware you may be limited in what if-expressions you can implement, and of course what lighting or effects you can do, but for something like this demo you could actually render the if-statement with any dot3 bumpmapping capable hardware.
    But in general, for it to be practical you'll want to have shaders.
     
  11. pocketmoon66

    Newcomer

    Joined:
    Mar 31, 2004
    Messages:
    163
    Likes Received:
    9
    Well I *KNOW* I invented shadow buffers, image imposter, volumetric shadows and surround sound, all in my younger days :)

    It's only when you find out that someone else actually thought of your latest 'idea' in 1981 that you get depressed ;)

    I take it as a sign of intelligence that I reinvented so much in ignorance!
     
  12. Humus

    Humus Crazy coder
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,217
    Likes Received:
    77
    Location:
    Stockholm, Sweden
    Just because it's basic functionality doesn't mean this usage of it is obvious. How long has hardware done early stencil rejection? R100 was first with early Z rejection, but I don't know if it also did early stencil rejection. Using clipping planes or scissor rectangles is basic functionality too; that doesn't mean hardware does it fast, or that developers are using it. In fact, clipping planes used to be a pain on GF3/4 since they were implemented with TEXKILL in the fragment shader, which didn't improve things much and in fact often forced it into software since each TEXKILL took up one TMU. I don't know if this is still true on GFFX and up or if they are using real geometry clipping planes now.
     
  13. euan

    Newcomer

    Joined:
    Jun 7, 2003
    Messages:
    136
    Likes Received:
    0
    Location:
    Scotland, UK
    The thing about being a software developer is that you get a huge buzz when you write something that wasn't immediately apparent, that actually works, let alone works really well. Even if it's as simple as getting a stupid splitter window to work in MFC, it's still the same feeling of accomplishment. That's why people do it, and will continue to do it. If you don't think you've invented something, you'll never get anything out of your job, or hobby. :D
     
  14. Humus

    Humus Crazy coder
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,217
    Likes Received:
    77
    Location:
    Stockholm, Sweden
    It's still the card or the drivers. It just doesn't do early-stencil rejection if you write the stencil buffer. Either that's a hardware limitation, or something with the drivers. But you can work around it, and I sure will update the demo to support it, so you can ditch all your evil conspiracy theories.

    Now your so-called logic is pretty flawed. There are plenty of techniques out there, old, very old and fairly new, that are very useful, yet never used in practice, because relying only on basic functionality doesn't make them immediately obvious.

    Real world example: The "Carmack's reverse" shadow volume technique.
    Carmack spent a good deal of time developing it. He was all excited about it when he got it working and found out that stencil shadows indeed were practical for real games with this technique. I recall he got a lengthy text file with his thoughts and ideas and some insight into how he came up with the idea posted on nVidia's website.
    The thing got out on the net, lots of people on opengl.org got excited, and suddenly everyone wanted to implement it. For some time stuff like "how do I get Carmack's reverse working" was the most common topic found on developer forums. Before that everyone used the regular shadow volume techniques.

    Then it's found out that this isn't really a new technique. There's a paper from like a year or so before it that implemented it the same way. So Carmack wasn't first.

    Now the thing is, when you know about "Carmack's reverse", or "depth fail stencil shadows" as it's often referred to in order to reflect the fact that Carmack didn't invent it, it's very simple and quite intuitive. But if you have never heard about the technique, it's certainly not the first thing you'll come up with. It's very basic, the amount of code is comparable to this dynamic branching technique, but no one implemented it until Carmack made it popular.
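
To illustrate how little code the depth-fail idea needs, here is a one-ray software model comparing it with the regular depth-pass counting. It is a sketch under simplifying assumptions, not code from any engine: shadow-volume faces are reduced to `(depth, facing)` pairs along a single view ray, anything at or behind the eye (depth <= 0) is treated as never rasterized, and the names `rasterized`, `depth_pass_count` and `depth_fail_count` are hypothetical.

```python
# One-ray model of stencil shadow counting. Faces of a closed shadow volume
# are (depth, facing) pairs along the view ray; the eye sits at depth 0.

def rasterized(faces):
    """Faces behind the eye are never drawn (simplified near-plane clip)."""
    return [(d, f) for d, f in faces if d > 0]


def depth_pass_count(faces, surface_depth):
    """Regular technique: count faces in FRONT of the visible surface."""
    count = 0
    for depth, facing in rasterized(faces):
        if depth < surface_depth:  # depth test passes
            count += 1 if facing == "front" else -1
    return count


def depth_fail_count(faces, surface_depth):
    """Depth-fail technique: count faces BEHIND the visible surface."""
    count = 0
    for depth, facing in rasterized(faces):
        if depth > surface_depth:  # depth test fails
            count += 1 if facing == "back" else -1
    return count


# Eye outside the volume: both methods agree (non-zero means shadowed).
outside = [(3, "front"), (10, "back")]
print(depth_pass_count(outside, 5), depth_fail_count(outside, 5))

# Eye inside the volume: the front face lies behind the eye and is lost.
inside = [(-2, "front"), (10, "back")]
print(depth_pass_count(inside, 5), depth_fail_count(inside, 5))
```

In this model the two counts agree while the eye is outside the volume, but with the eye inside it the depth-pass count for a shadowed surface at depth 5 comes out as zero (wrongly lit), while the depth-fail count is still non-zero.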
     
  15. Humus

    Humus Crazy coder
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,217
    Likes Received:
    77
    Location:
    Stockholm, Sweden
    That's why it's so good to write demos. :)

    I can spend a day and implement something cool, and get all the buzz and lots of positive feedback and raise a lot of discussions.
    A university professor may spend several months of research tweaking and tuning his work, writing down proofs and derivations, ensuring it works in every possible corner case, and then publish a paper that nobody reads.

    With this technique I simply think that's the problem. The papers mentioned in the Far Cry thread are of an academic nature, rather than practical "how to tune your game or demo". They are concerned with how to implement some widely generic stuff, not with how to boost gaming performance. I don't think the average game developer has read them.
     
  16. Warrick

    Newcomer

    Joined:
    Jun 5, 2003
    Messages:
    17
    Likes Received:
    14
    Location:
    Hong Kong
    I'm pretty much amazed by some of the reactions of people in this thread. Well dismayed mostly as poor Humus has been getting some bizarre uncalled for verbal beatings - so much so I felt the need to write this post. He has _just_ made another pretty cool demo that I am sure lots of developers and would be developers alike will find most educational.

    As a developer I've known a number of other developers who possess nowhere near the amount of knowledge that Humus does in these areas, so I think certain people may be overestimating the knowledge of the professional game development community at large. nVidia, ATI, and PowerVR's developer relations teams, plus prominent developers, are doing a large amount to improve this situation - and so too are enthusiasts (such as Humus was originally) and discussions on websites such as this, which are a great thing.

    Of course there are lots of developers who have extremely bright employees who are certainly up to or above speed on these areas but they also find benefit from these kind of demos as they simply usually don't have the time or assign enough priority to spend exploring all avenues of research - as such is the real world.

    Also it doesn't matter so much to me who invented or reinvented a technique but more if it can be shown to have practical value to the work I am doing.

    I personally have found this demo most interesting as it's an option I've considered exploring in the past to help speed up various deferred rendering junk I play with, but I simply haven't got around to seeing if it would make enough of a difference to be worthwhile. So from my point of view it's great that Humus has provided a reasonably practical example (that performs better than I expected, I might add) that proves to me that it is worth spending the time to investigate further - and indeed initial results for me show it has paid off.

    As for this weird 3.0 v 2.0 shader nonsense argument, it's not making any sense to me. From my point of view I am only initially bothered with shader 1.1 & shader 2.0 so that the games I work on get to address the majority of the shader-capable market. Then if I have the time (or have been instructed for reasons of company sponsorship marketing deals etc) I'll work on shader 1.4, 2.x and 3.0 optimised paths or whatever, but otherwise it doesn't make commercial sense. When we have PC shader 3.0 hardware that has npatch or similar support then I'll be more inclined to add that path as a standard (So 1.1, 2.0, 3.0).

    No more uncalled for Humus bashing please (unless he does something wrong of course :D ) - the other thread on this subject in the coding forum already has a lot more interesting potential compared to the way this thread has seemingly headed :shock:
     
  17. Proforma

    Banned

    Joined:
    Feb 23, 2004
    Messages:
    86
    Likes Received:
    0
    It is, what it is.

    I am not an Nvidiot. I am a person who has tried both cards
    from both ATI and Nvidia.

    Apparently, there are people on here who are biased towards
    ATI and think that SM 3.0 is useless and basically a feature for
    Nvidia and that's not true.

    So humus makes a hack to get around one of the features of
    SM 3.0, well that's great and all, but it's just a hack that's not
    a really good substitute for the real thing.

    When I program for something, I don't want to use hacks because
    there could always be problems down the road.

    This hack for good or bad seems to me a trick to make up for
    the fact that ATI made a mistake and that mistake was not
    keeping up with technology. Do I want a hack to give me one feature
    that I could get without a hack and still get more features (ie object instancing?). No, I want the real thing. The real extension to 2.0+,
    which is SM3.0, which is a real DirectX 9 feature and it's officially not
    a hack or something for ONLY Nvidia.

    I will wait until the end of this year. If ATI does not bring out
    SM 3.0, then I will be forced to go back to Nvidia. It's pretty
    much that simple.

    Now, according to reports on this very Forum, ATI will not bring
    out a chip this year with SM30 support and will not do so
    until 2005, at which point I will have already switched back to
    Nvidia.

    The sad fact is that a lot of you call me an Nvidiot, which is really
    a stupid thing to do since I am far more objective than that.

    However, when I pay $500 US dollars, I want the latest technology
    so that I can use it with my programming. ATI didn't this generation
    and really let me down. I want to upgrade from my 9800 Pro, so I am
    looking for the next best bet and that currently would be the Geforce 6800 Ultra.

    It is what it is guys. I mean come on, ATI added 16 pipelines at the last minute when they found out Nvidia had put 16 pipelines in their product, and then on top of that Nvidia also had 128 bit color and SM 3.0 and ATI doesn't, and ATI was supposed to be the technology leader here instead of follower. What is going on??????

    So, yes. In my mind, ATI f*cked up this generation.
     
  18. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,079
    Likes Received:
    648
    Location:
    O Canada!
    You don't "add" 16 pipelines at the drop of a hat. It takes 6 months from design layout to actual chip production these days - the decision to make R420 capable of 16 pipelines was taken in late 2003, not when people were learning about NV40.
     
  19. digitalwanderer

    digitalwanderer Dangerously Mirthful
    Legend

    Joined:
    Feb 19, 2002
    Messages:
    17,228
    Likes Received:
    1,747
    Location:
    Winfield, IN USA
    Re: It is, what it is.

    No, you are an nVidiot in the truest sense of the word and to say otherwise is a denial of reality....it's kind of why I stopped responding to your nonsense. :roll:
     
  20. g__day

    Regular

    Joined:
    Jun 22, 2002
    Messages:
    580
    Likes Received:
    1
    Location:
    Sydney Australia
    It saddens me to see such an extremely interesting algorithm for mapping a lot of SM 3.0 techniques to even better SM 2.0 equivalent code receive such venom!

    As I understand it...

    Humus has shown there is an OPEN source way for both NVidia and ATi card owners to execute a fairly large set of SM 3.0 techniques in SM 2.0 and get two major benefits:

    1) A massively bigger user base including many NVidia and ATi owners; and

    2) A significant speed increase - about 3x - 4x increase in FPS it has been shown ON BOTH NVidia and ATi cards!!!

    * * *

    The second point was a big surprise as one reader found:

    * * * * * * * * * * *

    So the technique is excellent, broad and open source. It doesn't replace all need for SM 3.0 loops - but it surpasses a lot of its uses - and it's available for all SM 2.0 capable cards today! So call it a very excellent and legitimate bridging technique / algorithm.

    The heat is on....

    NVidia has pushed TWIMTBP game developers not to have any bridging techniques at all for its SM 3.0 showcase patches - let alone bridging techniques that run at 3 times the speed on SM 2.0 code paths. This is dynamite for them. It opens SM 3.0 effects for the mass market and it shows SM 2.0 running extremely fast on NVidia or ATi cards - compared to its SM 3.0 equivalent.

    Now folk are asking: is it fair to pressure game developers to bypass SM 2.0 when 1) there are very few owners of SM 3.0 3d cards (the user base of SM 2.0 cards must be millions of times larger than that of cards with full SM 3.0 support) and 2) bridged SM 2.0 actually runs significantly faster than SM 3.0 code?


    * * *

    Well done Humus for this extremely useful technique!
     