[H]'s take on SM3 pros/cons

Discussion in 'Architecture and Products' started by Sanctusx2, Apr 26, 2004.

  1. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    It doesn't necessarily need that many vertices, and you can combine displacement mapping with parallax/offset mapping. HL2 is using DM for deformable terrain and they don't use anywhere near that tessellation level.
     
  2. Laa-Yosh

    Laa-Yosh I can has custom title?
    Legend Subscriber

    Joined:
    Feb 12, 2002
    Messages:
    9,568
    Likes Received:
    1,452
    Location:
    Budapest, Hungary
    But I suppose that the terrain isn't displaced by a 1K or even 512*512 texture; what's more, the texel-to-pixel ratio is surely closer to the lightmaps' resolution than to the color/bump maps. I've used 256*256 maps for displacing terrain on some of my early 3ds max images as well, but I consider that level of use to be a bit different :)
     
  3. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    The problem with all of the SM3 discussions is that people expect the phrase "impossible to do without SM3", and in the majority of cases that's not true. In fact, the vast majority of shaders being used in PS2.0 games can be done with PS1.1-1.4 in 1-2 passes. The major difference is better precision. Early PS2.0 games show the same trade-off: they still normalize with a cube map lookup rather than 3 arithmetic instructions, since you get 2x the throughput.
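    A rough sketch of the two normalization options, in ps_2_0-style assembly (register and sampler names here are illustrative, not from any particular game):

    Code:
    // math normalize: 3 arithmetic instructions
    dp3 r1.w, r0, r0        // r1.w = dot(N, N)
    rsq r1.w, r1.w          // r1.w = 1 / sqrt(dot(N, N))
    mul r0.xyz, r0, r1.w    // N = N / |N|

    // cube map normalize: 1 texture lookup
    // s0 is bound to a small cube map whose texels store normalize(direction)
    // (if the texels are stored biased to 0..1, a *2 - 1 expand is needed afterwards)
    texld r1, t0, s0
    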

    SM3, like SM2, boils down to a) being more efficient at some operations, b) being easier to program, and c) making a subset of new algorithms "feasible".

    Nothing is strictly impossible. With multipass, even a DX7 card can compute any mathematical function that SM3.0 can do.
     
  4. Zeross

    Regular

    Joined:
    Jun 3, 2002
    Messages:
    280
    Likes Received:
    11
    Location:
    France
    I don't know... "exposed" is a big word, but I think that this is the technique that ATI will use internally to compile long shaders. But that is basically the same thing, isn't it? In fact I can't even imagine how the F-Buffer would be exposed in an API; I see it as a transparent layer for the programmer.

    It is not Sweeney's fault; I've heard the same kind of nonsense about Doom III. It comes from PolyBump™... oops, sorry, it comes from the Detail Preserving Simplification technique.

    Offset bump mapping, parallax mapping (the official name AFAIK), virtual DM: three names for the same technique. You can't blame Sweeney; it seems everyone has their own nickname for it ;)
     
  5. Laa-Yosh

    Laa-Yosh I can has custom title?
    Legend Subscriber

    Joined:
    Feb 12, 2002
    Messages:
    9,568
    Likes Received:
    1,452
    Location:
    Budapest, Hungary
    I'm quite sure there was no mention of displacement mapping regarding Doom3... The only thing id said was 'renderbump', which isn't technical enough either, though.

    But you're right that 3 different names for the same technique is already too much, and we're surely going to see more :)
     
  6. Zeross

    Regular

    Joined:
    Jun 3, 2002
    Messages:
    280
    Likes Received:
    11
    Location:
    France
    No need to check, you're right for sure ;). I was just saying that it's not the first time you can see people being misled: I remember seeing posts on non-technical forums stating that Doom III models are 250,000 triangles, when in fact those high-res meshes are just used to build the normal maps for the in-game models, which are between 2,500 and 5,000 triangles. But people tend to forget the part about the normal map, and all they remember is "Doom 3 characters... 250,000 polygons" :D
     
  7. Scarlet

    Newcomer

    Joined:
    Mar 31, 2004
    Messages:
    54
    Likes Received:
    0
    Wanna bet?
     
  8. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    Bet what? I don't see him making a bettable claim.

    One example is that any conditionals in shaders compiled with 3.0 can use predicates, which can save 1-2 instructions on average for a guaranteed performance win.

    For example, take a very simple conditional

    Code:
    z = x > y ? a + b : c + d;
    
    translation (without registers assigned)

    under SM3.0

    Code:
    setp_gt p0, x, y
    (p0) add z, a, b
    (!p0) add z, c, d
    
    under SM2.0

    Code:
    sub t, x, y
    add z1, a, b
    add z2, c, d
    cmp z, t, z1, z2
    
    Savings: one instruction out of four here, which works out to a 33% throughput boost for this snippet (4/3 ≈ 1.33). On a 10-instruction shader it would be about 11%. On larger shaders, you'd most likely switch to dynamic branches.

    This means any PS2.0 HLSL shaders (say, from HL2) with conditionals, when compiled with the PS3.0 profile (and no other code changes by the developer, just a recompile), can get roughly a 10%-33% performance boost.

    So you don't think developers are going to use the D3DX effects framework to compile 3.0 versions?
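
    To illustrate what that recompile looks like, here is a minimal .fx sketch: the same HLSL entry point compiled under both profiles, with the app picking a technique at load time. (The variable, function, and technique names are made up for the sketch.)

    Code:
    float4 x, y, a, b, c, d;   // set from the application

    float4 MainPS() : COLOR
    {
        // the same conditional as above; the compiler emits CMP under ps_2_0
        // and can use predication or branching under ps_3_0
        return (x.x > y.x) ? a + b : c + d;
    }

    technique TechSM2 { pass P0 { PixelShader = compile ps_2_0 MainPS(); } }
    technique TechSM3 { pass P0 { PixelShader = compile ps_3_0 MainPS(); } }
    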
     
  9. aaronspink

    Veteran

    Joined:
    Jun 20, 2003
    Messages:
    2,641
    Likes Received:
    64
    I'm sure you know that the number of static instructions is generally not a good way to estimate performance...

    Predicated 3-instruction sequence:
    1 IPC Arch: 3+ cycles
    2 IPC Arch: 2+ cycles
    2+ IPC Arch: 2+ cycles (assuming the predicate can't be used in the same cycle, which is generally correct for all microarchitectures I am aware of)

    CMP 4-instruction sequence:
    1 IPC Arch: 4 cycles
    2 IPC Arch: 3 cycles
    3 IPC Arch: 2 cycles

    So it really depends on the architecture which one is going to be faster. I can tell you that predication adds more than insignificant complications to an architecture. Predication is not a panacea, which has been proven by numerous studies. The concept is great, but the implementations leave a lot to be desired.

    Aaron Spink
    speaking for myself inc
     
  10. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    Yes, but it still saves work. On a 2-or-more IPC architecture, you've saved a shader unit cycle which is freed up to schedule other non-dependent ops.

    Apples to apples, it's still a win vs CMP.

    Let's look at how they would be scheduled

    Code:
    SU1: 1. SETP  2. ADD   3. ADD
    SU2: 1. ****  2. ****  3. ****
    
    * = can dual issue another operation from your shader

    Code:
    SU1: 1. SUB   2. ADD   3. CMP
    SU2: 1. ADD   2. ****  3. ****
    
    In the worst case, predication would be equivalent, but in the average or best case you have 3 slots on SU2 you can fill with other ops, whereas in the bottom (CMP) case you have only 2 open dual-issue opportunities.

    Predication has been an issue on other architectures in comparison to *real branches*. It has been an issue for compilers, such as on the Itanium, to use it appropriately in conjunction with branch prediction. The studies to which you refer are studies of predication on ILP CPUs, in the context of compiler issues and real branches.

    But on the GPU we are not comparing predication to real branches; in this thread, we are comparing it to the CMP operator, which is just a CMOV instruction. I fail to see how the issues of write disablement vs conditional move are covered by these "numerous studies".
     
  11. aaronspink

    Veteran

    Joined:
    Jun 20, 2003
    Messages:
    2,641
    Likes Received:
    64
     
  12. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    Can you give a specific example? I just don't see how a write disable is going to cause a huge issue on the GPU. It's no different than write masking on destination registers; the only difference is that it's equivalent to a NULL mask and is data dependent.

    Predicates on the GPU can be treated as just conditional write masks, and as such, the performance implications should be about the same vs the CMOV case or destination write masks.
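
    A quick sketch of that analogy in ps_3_0-style assembly (registers are illustrative):

    Code:
    // static destination write mask: only .xyz of r0 is ever written
    add r0.xyz, r1, r2

    // predicate: the same kind of write disable, but set from data at runtime
    setp_gt p0, r3, r4      // p0 = (r3 > r4), per component
    (p0) add r0, r1, r2     // components where p0 is false keep their old value
    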

    p.s. I was also talking about IPC per pipe, that's why I demonstrated two shader units in my example.
     
  13. Scarlet

    Newcomer

    Joined:
    Mar 31, 2004
    Messages:
    54
    Likes Received:
    0
    lol, you guys got really carried away. My bet challenge was on the very last line, which was something to the effect that developers have a clear target for 3.0 (implying the world will therefore be rosy post-SM3).

    Seeing how everyone loves to differentiate, I am merely observing the obvious.

    Whatever nV does, ATI will do them one differently (no, I didn't mean better, just differently).

    And of course the corollary:

    Whatever ATI does, nV will do them one differently (no, I didn't mean better, just differently).

    To think the 3D developer world will be united around SM3 does not reflect past history and ignores the fundamental motivations these IHVs have to be different.
     
  14. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    Scarlet, you misunderstood the meaning of "clear target". The point is, SM3.0 is a *known quantity* because it requires many features which were optional in PS2.x, so developers don't have to write shaders against a combinatorial set of different card capabilities. It's much clearer to develop for, vs trying to target 2.x where we have 2.a (NVidia) and 2.b (ATI) and potentially a bunch of other variations.
     
  15. Maintank

    Regular

    Joined:
    Apr 13, 2004
    Messages:
    463
    Likes Received:
    2
    This article is a good laugh on a Monday morning.

    Of course it is Tuesday so I thank you guys even more. Tuesdays are such a pain in the arse :)
     
  16. hstewarth

    Newcomer

    Joined:
    Apr 13, 2004
    Messages:
    99
    Likes Received:
    0
    Are you stating that an HLSL 2.0 shader can be recompiled on the fly to 3.0 and get the benefits of 3.0 without the developer writing directly to it?

    If so, this is significant - it means that new hardware like the 6800 that supports 3.0 can speed up shaders written for 2.0. I assume that for users to have this ability they have to wait for DX9.0c. Any dates on that? I am assuming it will be available when 6800 Ultras are in stores on Memorial Day (according to an interview with NVidia).
     