SM3 vs dynamic branching

mapel110

Newcomer
We had to wait one and a half years until both IHVs had SM3. Now there is one IHV with good dynamic branching performance, and everyone is so happy about it, claiming that this feature (which is only one part of SM3) will bring us superb, brand-new, enormous effects (only for ATI, of course).

So how likely is it that we will actually see that? Is the rest of SM3 so unimportant that only ATI users, thanks to that superior dynamic branching performance, will see "superb, brand-new, enormous effects"?

(sry for crappy English)

greetz
 
There are certain 3D techniques that rely on good branching performance to run at usable speed, so it comes down to the developer deciding whether to implement them, based on a whole bunch of considerations, not least time, money and market share.

It's not like a D3DPS30CAPS_RLYFCKNFASTBRANCHING bit exists, though, which makes the developer's task harder.
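To make that concrete, here's a rough ps_3_0 sketch of the kind of early-out such techniques depend on. All the names and numbers are invented for illustration; note tex2Dlod is used because implicit-gradient fetches aren't allowed inside dynamic flow control.

Code:
sampler2D normalMap;
float3 lightPos;

float4 main(float3 worldPos : TEXCOORD0,
            float2 uv       : TEXCOORD1) : COLOR
{
    float3 toLight = lightPos - worldPos;
    float  att     = saturate(1.0 - dot(toLight, toLight) * 0.01);

    float4 result = float4(0, 0, 0, 1);
    [branch]
    if (att > 0.0)   // dynamic branch: out-of-range pixels skip all of this
    {
        float3 n = normalize(tex2Dlod(normalMap, float4(uv, 0, 0)).xyz * 2 - 1);
        result.rgb = att * saturate(dot(n, normalize(toLight)));
    }
    return result;
}

Whether that branch actually saves any time is entirely down to the hardware, which is exactly the problem: the caps bits can't tell you.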
 
Rys said:
There are certain 3D techniques that rely on good branching performance to run at usable speed, so it comes down to the developer deciding whether to implement them, based on a whole bunch of considerations, not least time, money and market share.

It's not like a D3DPS30CAPS_RLYFCKNFASTBRANCHING bit exists, though, which makes the developer's task harder.
But first: are there more "SM3 effects" that can now be implemented, given that two IHVs have SM3 even without fast dynamic branching?
Or is dynamic branching so important that the other SM3 features (predication, static branching, arbitrary swizzle, gradient instructions, vPos, vFace, etc.) are nearly useless?
 
It's all poppycock imo.

SM2 is still able to run 99% of what developers are doing at good speed.

I don't think hardware will be a big deal until it reaches a point where fully dynamic texturing becomes a reality.
 
mapel110 said:
But first: are there more "SM3 effects" that can now be implemented, given that two IHVs have SM3 even without fast dynamic branching?
Or is dynamic branching so important that the other SM3 features (predication, static branching, arbitrary swizzle, gradient instructions, vPos, vFace, etc.) are nearly useless?

None of those things are "effects"; they are just processing capabilities. All they do is allow for quicker computations or shortcuts.

Really nothing has changed for what artists and game designers can do since the introduction of SM2.

No one is doing anything really interesting with all this anyway. Everyone is just using shaders to make "rusty pipes" shiny and bumpy. :rolleyes:

What John Carmack is working on is the only really interesting use of some of the modern technology.
 
mapel110 said:
But first: are there more "SM3 effects" that can now be implemented, given that two IHVs have SM3 even without fast dynamic branching?
Or is dynamic branching so important that the other SM3 features (predication, static branching, arbitrary swizzle, gradient instructions, vPos, vFace, etc.) are nearly useless?
The others aren't nearly useless. Predication is a potential low-level optimization for limited branching. Static branching is excellent for ease of writing some code. Arbitrary swizzle is a minor optimization that can save an instruction here or there. vPos and vFace are excellent for certain rendering optimizations (particularly in relation to shadow volume generation).
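As a rough illustration of two of those (everything here is made up for the example, not taken from any real engine):

Code:
bool useDetail;                 // set once per draw call by the app
sampler2D baseMap;
sampler2D detailMap;

float4 main(float2 uv : TEXCOORD0, float face : VFACE) : COLOR
{
    float4 c = tex2D(baseMap, uv);
    if (useDetail)              // static branch on a constant bool:
        c *= tex2D(detailMap, uv * 8) * 2;  // resolved per draw, not per pixel
    if (face < 0)               // vFace is negative for back faces
        c.rgb *= 0.5;           // e.g. dim them for two-sided lighting
    return c;
}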

These are all relatively minor, and are mostly going to just give relatively small performance increases (though vPos and vFace can give larger increases).

The big one among the above is the gradient instructions. Like dynamic branching, they enable a new class of algorithms. The most obvious use for gradient instructions lies in implementing texture filtering within the shader, which is highly important for anti-aliasing some effects properly. Without gradient instructions you have to make use of the built-in texture filtering hardware, and if that hardware doesn't do the job you need it to do, then you're pretty much SOL (i.e. surfaces that use your shader will be prone to aliasing).
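As a rough sketch of the idea (the checker pattern and the fade factor are invented purely for illustration):

Code:
float checker(float2 uv)
{
    // ddx/ddy report how fast uv changes per screen pixel
    float2 width = abs(ddx(uv)) + abs(ddy(uv));
    float2 f = frac(uv);
    float  c = ((f.x < 0.5) != (f.y < 0.5)) ? 1.0 : 0.0;
    // fade toward the pattern's average as the squares shrink toward
    // pixel size, instead of letting them alias
    return lerp(c, 0.5, saturate(max(width.x, width.y) * 2));
}

The texture units can't filter a pattern that's computed in the shader, so without the gradients there's nothing to stop it sparkling.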

But gradient instructions typically aren't considered "SM3" instructions, because they are optional in SM2.x and already implemented by the GeForce FX series. So that just leaves dynamic branching as allowing significantly new types of shaders. The rest of the improvements, while good, pretty much only bring minor gains in programming ease and execution efficiency.
 
Chalnoth said:
The others aren't nearly useless. Predication is a potential low-level optimization for limited branching. Static branching is excellent for ease of writing some code. Arbitrary swizzle is a minor optimization that can save an instruction here or there. vPos and vFace are excellent for certain rendering optimizations (particularly in relation to shadow volume generation).

These are all relatively minor, and are mostly going to just give relatively small performance increases (though vPos and vFace can give larger increases).

The big one among the above is the gradient instructions. Like dynamic branching, they enable a new class of algorithms. The most obvious use for gradient instructions lies in implementing texture filtering within the shader, which is highly important for anti-aliasing some effects properly. Without gradient instructions you have to make use of the built-in texture filtering hardware, and if that hardware doesn't do the job you need it to do, then you're pretty much SOL (i.e. surfaces that use your shader will be prone to aliasing).

But gradient instructions typically aren't considered "SM3" instructions, because they are optional in SM2.x and already implemented by the GeForce FX series. So that just leaves dynamic branching as allowing significantly new types of shaders. The rest of the improvements, while good, pretty much only bring minor gains in programming ease and execution efficiency.
Ah, thanks for that detailed answer.
 
I'd say Chalnoth summarized it quite well, but I think we should make the distinction between SM3 and PS3, since nearly everything mentioned here was about the latter. Personally, I don't think a shader should be called PS3 if it doesn't have dynamic branching, because that's the really big leap that PS3 makes. If a game used the Doom3 engine but didn't use any bump mapping or dynamic shadows, it would be pretty pointless.

SM3 also includes VS3, which has the very important feature of vertex texturing. Unfortunately, vertex texturing is also very slow on NV40/G70 (I've heard it's less so on the latter, but have nothing quantitative to back that up), so in a sense its usability is similar to that of dynamic branching: nice to have the feature for development, but it'll rarely benefit gamers.
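For reference, a minimal vs_3_0 sketch of what vertex texturing looks like (the sampler, scale and everything else here are illustrative assumptions):

Code:
sampler2D heightMap;        // needs a vertex-texture-capable format
float4x4 worldViewProj;
float heightScale;

float4 main(float4 pos : POSITION,
            float3 nrm : NORMAL,
            float2 uv  : TEXCOORD0) : POSITION
{
    // the vertex shader has no gradients, so the LOD is given explicitly
    float h = tex2Dlod(heightMap, float4(uv, 0, 0)).r;
    pos.xyz += nrm * h * heightScale;
    return mul(pos, worldViewProj);
}

It's that one unfiltered, high-latency fetch per vertex that NV40/G70 handle so slowly.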

In the end it doesn't really matter that only ATI users will see new effects from real PS3 shaders. NV users already got their advantage over the PS2 generation with FP blending. Sometimes I wish ATI didn't put dynamic branching in their architecture, and just did R420 times 2 or even 3, tacked on FP blending, and used their new memory controller. But on the other hand, the path they did choose will hopefully advance the field and spur the creation of new effects.
 
Mintmaster said:
I'd say Chalnoth summarized it quite well, but I think we should make the distinction between SM3 and PS3, since nearly everything mentioned here was about the latter. Personally, I don't think a shader should be called PS3 if it doesn't have dynamic branching, because that's the really big leap that PS3 makes. If a game used the Doom3 engine but didn't use any bump mapping or dynamic shadows, it would be pretty pointless.

SM3 also includes VS3, which has the very important feature of vertex texturing. Unfortunately, vertex texturing is also very slow on NV40/G70 (I've heard it's less so on the latter, but have nothing quantitative to back that up), so in a sense its usability is similar to that of dynamic branching: nice to have the feature for development, but it'll rarely benefit gamers.

In the end it doesn't really matter that only ATI users will see new effects from real PS3 shaders. NV users already got their advantage over the PS2 generation with FP blending. Sometimes I wish ATI didn't put dynamic branching in their architecture, and just did R420 times 2 or even 3, tacked on FP blending, and used their new memory controller. But on the other hand, the path they did choose will hopefully advance the field and spur the creation of new effects.
But some developers say that even on the R520, dynamic branching is quite useless: there are only rare situations where it can be used for acceleration (and on NV40/G70 nearly none).
 
mapel110 said:
But some developers say that even on the R520, dynamic branching is quite useless: there are only rare situations where it can be used for acceleration (and on NV40/G70 nearly none).
If they're talking about accelerating their current shader library, then yeah, it's pretty useless.

But there are a lot of other techniques that benefit immensely from it. Soft shadows through shadow maps is one that will be very widely used within a year, I think. Good parallax mapping (with occlusion & shadowing) is another. There are plenty more as well, but they're not likely to be adopted by developers anytime soon.
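A rough sketch of the soft shadow idea (tap counts, helper names and patterns are all invented; real implementations vary):

Code:
sampler2D shadowMap;
float2 taps4[4];                // tap offset patterns, filled in by the app
float2 taps28[28];

float softShadow(float3 sc)     // sc.xy = shadow map coords, sc.z = depth
{
    float lit = 0;
    for (int i = 0; i < 4; i++) // a few cheap taps to detect edges
        lit += tex2Dlod(shadowMap, float4(sc.xy + taps4[i], 0, 0)).r
               < sc.z ? 0.0 : 1.0;

    [branch]
    if (lit == 0 || lit == 4)   // fully lit or fully shadowed:
        return lit * 0.25;      // most pixels take this early out

    for (int j = 0; j < 28; j++)        // penumbra pixels only
        lit += tex2Dlod(shadowMap, float4(sc.xy + taps28[j], 0, 0)).r
               < sc.z ? 0.0 : 1.0;
    return lit / 32.0;
}

With slow branching the hardware effectively pays for all 32 taps on every pixel, which is why the technique is only practical where the branch is cheap.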

Like I said before, I see lots of room for realtime graphics advancement even without dynamic branching, so I still think it's a bit too pricey in terms of the transistor cost. Those particular developers (I'd like some links to those quotes if you have them) may be thinking along the same lines as I am.
 
I think most of the interesting things are about fleshing out the idea behind shader models: offering developers a more general-purpose approach to positioning and coloring the right pixels the right way. The thing that makes dynamic branching interesting at the moment is simply more complex shaders that perform faster. But in the longer run, it allows things like libraries, procedural effects/textures and middleware to be used, giving developers not only many more options, but a much easier time implementing them as well.

But we have to wait for DX10/SM4 for the really nice things, like being able to create geometry, virtual memory, and the removal of the state change/texture fetch bottlenecks. That will really enable developers to have the GPU create the objects you see fully procedurally.

Then again, that would require a different approach to creating games as well: you don't create finished, fully independent artwork up front and only ask the engine to display it as intended; instead you describe that artwork and have the engine create it. That won't be easy for the artists, and it will require some very good new tools.
 
But what about the speed of dynamic branching on R520? Is it fast enough that you can save a lot of GPU power? Or, to put it another way, can you use it many times in one shader?

I read that dynamic branching in the vertex shader is very slow on R520. Is that less important than branching in the pixel shader?
 
Wouldn't dynamic branching help efficiency in a unified shader design, much like what ATI is doing with Xenos (more or less)? And in ATI's future unified shader designs (a progression of the enhanced memory controller etc.), not to mention some of the DX10/SM4 things DiGuru mentioned?

Although dynamic branching isn't used heavily right now, as stated above, it might be a different story when, say, DX10/SM4 arrives. NV and ATI will probably approach it in different ways in their future architectures, though.
 
jpr27 said:
Wouldn't dynamic branching help efficiency in a unified shader design, much like what ATI is doing with Xenos (more or less)?
I don't think you can say that. Rather, the things that would make a unified shader design highly efficient are very similar to the things that would make a dynamic branching design very efficient. So, if you already have an architecture that supports one or the other efficiently, then the leap to the other is smaller.
 
Dynamic branching is "more efficient" in R520 by virtue of the fact that the threads contain fewer pixels: 16 in R520 versus 64 in Xenos.

Xenos's USA efficiency gains have nothing to do with branching, per se, and everything to do with balancing vertex and pixel shader loads, and ensuring that as few execution pipelines as possible sit idle.

It is notable that R520 and Xenos use very similar scheduling techniques: the fully decoupled texture pipes combined with the ability to arbitrarily sequence execution of threads from a large pool of threads. Obviously unification in Xenos makes thread scheduling more complicated because pixels can't be shaded without completed vertices etc.

Again, R520 would appear to have the upper hand in terms of the pixel shaders: it supports a pool of 512 threads whilst Xenos supports only 64 (for peeps keeping track: I'm now convinced that 63 is an error in the documentation, as I've found another document that states 64 - and 63 always seemed like a silly number).

In other words, R520 has twice as many pixels in flight, 8192, as Xenos, meaning it has more flexibility in scheduling.
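Working that out from the numbers above: 512 threads x 16 pixels per thread = 8192 pixels in flight on R520, against 64 threads x 64 pixels per thread = 4096 on Xenos.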

Jawed
 
The main problem with all these approaches is that you really want each individual pixel pipeline (not fragments/quads) to be able to run an independent thread/program, rather than all of them running the same one at any given time. Ideally, you would want a single, massive shader program with branches, and have every pixel carry its own program counter and registers.
 
Why would you want that? The hardware penalty would be ridiculous, and even with algorithms like raytracing on kd-trees you get pretty reasonable coherence.
 
Yes.

But if you want to have a more general programming model, to enable the developers to do all the nice things they want, that's the direction you have to go. Which is also what DX10 does.
 
DiGuru said:
But if you want to have a more general programming model, to enable the developers to do all the nice things they want, that's the direction you have to go. Which is also what DX10 does.
There's no way that DX10 specifies that each pixel needs its own instruction counter and whatnot. This is a hardware-specific implementation detail. I don't expect any graphics hardware to have single-pixel granularity on branches, ever.
 