Is this a good summarization of shader model 3?

Caftain

Newcomer
Quoted from Azaroth11, who himself says he gathered it from this forum, from Dave Baumann here at Beyond3D, and from Demiurg at 3dcenter.de.

But the thing is: do you think this is a good summarization of the use and importance of Shader Model 3 in today's video cards (NV40 and R420)?

To me it made a lot of sense but what do you guys think?

Post 1

PS3 and VS3 (aka Shader Model 3, aka SM3) allow for loops, so you can potentially execute an almost unlimited number of shader instructions. Right. The problem is, your card won't be able to run long shaders in realtime. It won't be FPS but SPF. The X800 will run any shader of a length that is reasonable performance-wise. nV's 65k instruction length limit is utterly pointless for many years to come; even the X800's current limits are hopelessly overfeatured. So, instruction count is not it.

What is it then? Well, loops are implemented by branching conditionally. A SM3 shader may first calculate whether it is reasonable to compute the pixel value at all (it may be hidden or in deep shadow) and, based on that, either do the whole complex calculation or simply use e.g. black. OTOH, branches don't come for free. They have the potential to flush the whole pipeline, and if not used carefully, SM3 shaders may severely decrease performance.
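
For illustration, a rough ps_3_0 sketch of that early-out idea (untested; the sampler and constant names are invented):

```hlsl
sampler2D shadowMap;     // hypothetical precomputed shadow term
sampler2D diffuseMap;
float3 lightDir;

float4 main(float2 uv : TEXCOORD0, float3 normal : TEXCOORD1) : COLOR
{
    float shadow = tex2D(shadowMap, uv).r;

    // Dynamic branch: fully shadowed pixels skip the expensive math below.
    // This is also where the cost hides -- a divergent branch can stall
    // the pipeline instead of saving work.
    if (shadow < 0.01)
        return float4(0, 0, 0, 1);            // simply use black

    // ...the "whole complex calculation" would go here...
    float3 n = normalize(normal);
    float ndotl = saturate(dot(n, -lightDir));
    return tex2D(diffuseMap, uv) * ndotl * shadow;
}
```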

SM3 is SM2 plus some instructions and capabilities, plus some limits relaxed. There is no visual difference between PS2 and PS3. You may create any effect in both models, and if you know what you are doing very well, you may make it run faster in SM3, at least in some cases. From a programmer's point of view SM3 is certainly more elegant, though.

And regarding displacement mapping: that's a Vertex Shader 3 feature. It makes a difference, but it is not used in any game, not even the Unreal Engine 3 demo. Real displacement mapping generates additional geometry (i.e. triangles) to be textured, filtered, etc. Even NVIDIA's new cards are only a first shot at it, because they lack a tessellation unit. That means they can shift vertex positions depending on texture lookups, but they cannot create vertices out of thin air. As far as I understand, on 6800 cards you may displace the vertices of an already fine but flat mesh. So all the geometry must already be there.

What FarCry and UnrealE3 do is called Virtual Displacement Mapping. That's perfectly possible on any SM2 machine, and it doesn't even need many PS instructions. In fact, that's the only reason why it can be done in realtime at all. The illusion it creates is quite good as long as you don't look at the surface from a very flat angle. If you do, you notice that it's really flat but "painted".
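
The core of the trick fits in a few ps_2_0 instructions. A minimal sketch (untested; the names and the scale/bias constants are invented):

```hlsl
sampler2D heightMap;
sampler2D diffuseMap;

float4 main(float2 uv : TEXCOORD0,
            float3 eyeTS : TEXCOORD1) : COLOR   // eye vector in tangent space
{
    // Shift the texture coordinates toward the eye by the stored height.
    // The mesh stays flat; only the lookup moves, hence "virtual".
    float height = tex2D(heightMap, uv).r;
    float2 offset = (height * 0.04 - 0.02) * normalize(eyeTS).xy;
    return tex2D(diffuseMap, uv + offset);      // one dependent read
}
```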

The only reason why we might not see Virtual Displacement Mapping on Radeons could be that Crytek implements VDM via SM3 instructions only because they get paid to do so. OTOH, they want to sell their engine, so why should they purposefully p*** off ATI customers? Does not make sense to me.

Hope things are clearer now.

Post 2

Sorry for the "oh my goodness", but, while following the technical discussions on www.beyond3d.com and www.3dcenter.de avidly, I see that NVIDIA's propaganda already works all too well.

As regards instruction count, you will certainly have seen pictures of ATI's Ruby demo. Now, that is code that won't run on R300/R350 any more, at least not without a wrapper that replaces some shaders with slightly simpler ones. Same with nV's Mermaid. Still, those demos use shaders of not much over 100 instructions. With about 1000 instructions you can make "Finding Nemo" and "Final Fantasy" (the movie), and that takes hours per frame on a rendering farm. That's why 65k instructions are pointless in realtime.

Some more thoughts about real SM3 displacement mapping:

Have you ever tried to model something in Maya or 3DSmax? Say, a rock? I have. Do you know how many triangles it takes to make something look round? Well, let's suppose we need 100 triangles for a rock like those in the foreground of jakup73's second shot. How many of them do you have on a beach? 10000? Now multiply. That's what real VS3 displacement mapping would have to generate on the fly. That's what would have to be textured, filtered, lit, etc. And with NVIDIA's current implementation, all of that would already have to be there, because the hardware can't create new vertices!!! Impossible.

OK, you could do some LOD magic, i.e. only do it in the foreground, but I guess you would have to do that on the CPU, thereby making an already CPU-bound game even more so. And you wouldn't even be able to do it that way on current NVIDIA hardware, because, you guessed it, "no new vertices"!!!

From this point of view nV's VS3 seems pretty pointless as well, because it doesn't save triangles, not even in the non-displaced case. That's not entirely true, though. While I can't think of things that can't be done in normal VS2 programs, it may be much easier for the developer. I mean, it's a difference whether you have to specify a relief as mathematical operations or as an image you can paint. The latter may even be done by an artist. Again, I think the main advantage of SM3 (at least in nV's restricted implementation) is convenience for the developer, not that effects won't run otherwise.

All in all it boils down to the question whether developers will use what is most convenient for them or what reaches the biggest group of customers. And that's where money comes in. I can't imagine a publisher not pressing developers to use SM2 when that's what the masses have and that's where the money is.

In other words: I strongly expect to see SM3-only code in some nV demos, but not in any game before, say, 2006, maybe later. And by that time games will crawl on even the most advanced current hardware anyway.
 
The stuff about VS3 is basically wrong; while it's not possible to remove triangles, it doesn't really matter.

Reducing LOD can be very cheap (as cheap as just rendering fewer triangles and changing a few constants). For cutscenes with close-ups, offset bump mapping (AKA, in Unreal speak, virtual displacement mapping) breaks down badly, and displacement mapping is a good alternative.

The other VS3 feature is geometry instancing; this is a major advance in allowing much more detailed work. In the rock case, all the rocks can be rendered via one call with little overhead in memory. Since the big problem with rendering lots of little objects is the cost per call, object instancing is a god-send.

So VS3 helps both the small details (rocks, trees, etc.) and the extreme close-ups needed for truly cinematic rendering.

Whether that will make any difference to any game released this year or the next is another matter.
 
Nice one-sided polemic summary perhaps. If this was a summary written to teach a developer or consumer about the pixel shader, it's a poor one. It glosses over whole areas.

The point of loops isn't to run infinite-length shaders, it's to run variable-length algorithms. For example, a dynamic number of lights in the scene. Without loops, you need to compile/write shaders that do 1 light, 2 lights, 3 lights, etc. (remember NVShaderLinker?)
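
For example, a single vs_3_0 shader could handle any light count the application sets, instead of one compiled shader per count. A minimal sketch (untested; all names invented):

```hlsl
int numLights;                  // set per draw call by the application
float3 lightDir[8];             // directional lights
float3 lightColor[8];
float4x4 worldViewProj;

struct VS_OUT { float4 pos : POSITION; float3 diffuse : COLOR0; };

VS_OUT main(float4 pos : POSITION, float3 normal : NORMAL)
{
    VS_OUT o;
    o.pos = mul(pos, worldViewProj);
    float3 accum = 0;
    for (int i = 0; i < numLights; i++)   // trip count decided at runtime
        accum += lightColor[i] * saturate(dot(normal, -lightDir[i]));
    o.diffuse = accum;
    return o;
}
```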

Other uses could be to calculate values which need to be iteratively summed, for example, until they reach a certain threshold.

And of course, the biggest use of loops is the fact that the code is much easier and simpler to read.

There is no visual difference between PS2 and PS3. You may create any effect in both models, and if you know what you are doing very well, you may make it run faster in SM3, at least in some cases. From a programmer's point of view SM3 is certainly more elegant, though.

There are some shaders which can't be done as easily or cheaply in SM2 (or not at all): those that require gradients, texldd, the vFace register (not exposed by 2.0b), or the vPos register, for example.
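
To make the gradient point concrete, a hypothetical snippet for a profile that exposes ddx/ddy (untested): it antialiases a procedural stripe pattern by measuring how fast the coordinate changes per pixel, something the plain 2.0/2.0b profiles can't express.

```hlsl
float4 main(float2 uv : TEXCOORD0) : COLOR
{
    float coord = uv.x * 20;                        // procedural stripes
    float w = abs(ddx(coord)) + abs(ddy(coord));    // screen-space filter width
    float s = smoothstep(0.5 - w, 0.5 + w, abs(frac(coord) - 0.5) * 2);
    return float4(s.xxx, 1);
}
```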

1. the tessellation problem isn't that big. Pre-tessellated meshes work well, and will most likely be used on 90-degree surfaces (terrain, water, walls, etc.), not "rocks"

2. it's not entirely accurate to say that vertices can't be created. You can use render-to-vertex-buffer to achieve it. In fact, subdivision surfaces have been implemented on the GPU using this technique. It wouldn't make sense to use this on everything.

3. It's more likely that real displacement mapping would be used in addition to virtual displacement mapping, not instead of it.

4. Vertex texturing is very nice to have for simulations, like cloth/wave physics, etc.
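
As a sketch of what points 1 and 4 look like in practice, here is a hypothetical vs_3_0 shader (untested; names invented) that displaces a pre-tessellated grid by a heightmap, e.g. one updated each frame by a wave simulation:

```hlsl
sampler2D heightMap;            // e.g. an R32F texture written by the simulation
float4x4 worldViewProj;
float displaceScale;

struct VS_OUT { float4 pos : POSITION; float2 uv : TEXCOORD0; };

VS_OUT main(float4 pos : POSITION, float3 normal : NORMAL, float2 uv : TEXCOORD0)
{
    VS_OUT o;
    // Vertex texture fetch: explicit LOD, since there are no pixel gradients here.
    float h = tex2Dlod(heightMap, float4(uv, 0, 0)).r;
    float4 displaced = float4(pos.xyz + normal * h * displaceScale, 1);
    o.pos = mul(displaced, worldViewProj);
    o.uv = uv;
    return o;
}
```

Note that all the vertices must already exist in the mesh; the shader only moves them.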

If you want a "proper summarization", why not gather information from pro and con sources, and ask developers themselves?
 
Most developers are in the TWIMTBP or GITG programmes. And I get great replies from here ;)

And how do I know whom to trust? If I knew all about it I wouldn't need to ask. And of course programmers probably love Shader Model 3, since it may make life easier for them, but for me as a gamer, how big of an advance is it with today's games and upcoming ones?

I am not talking about Unreal 3, because that is 2006, and though it looks absolutely stunning, the NV40 in its current state cannot run it very well, and the X800XT even worse, I guess, because of its lack of Shader Model 3 support...

But all the stuff you mention here: how much of a performance hit does it take in the games we have today, which are based around PS 1.1 and 2.0?
It sometimes sounds like you can run mega-advanced shaders without a performance penalty, but somehow I find that hard to believe.

And though it can and will run more efficiently than PS 2.x hardware, does it still have the power for it?
 
DemoCoder said:
1. the tessellation problem isn't that big. Pre-tessellated meshes work well, and will most likely be used on 90-degree surfaces (terrain, water, walls, etc.), not "rocks"
Why wouldn't you use "rocks"? I could definitely see a situation where a displacement map is used in combination with a low-resolution mesh that is then tessellated by the CPU for later displacement mapping. This way a game could support dynamic displacement mapping with a little bit of CPU work.

Of course, it would really require some nice analysis to decide whether or not this would be beneficial in terms of performance. That is, does the game really have the CPU time to spare to tessellate the mesh?

3. It's more likely that real displacement mapping would be used in addition to virtual displacement mapping, not instead of it.
But how would you manage this? The lighting of virtual displacement mapping is dependent upon the initial positioning of the surface triangle, so if this is not predictable, the lighting may end up looking wrong. Would it be possible to send the "unperturbed" triangle data to the pixel shaders for virtual displacement mapping?
 
Chalnoth said:
But how would you manage this? The lighting of virtual displacement mapping is dependent upon the initial positioning of the surface triangle, so if this is not predictable, the lighting may end up looking wrong. Would it be possible to send the "unperturbed" triangle data to the pixel shaders for virtual displacement mapping?

Umm, are we talking about different things? I'm talking about offset mapping/parallax mapping, not View Dependent Displacement Mapping. All you need for parallax/offset mapping is the eye vector in tangent space, which is calculated in the vertex shaders. I was using the phrase "Virtual Displacement Mapping" to refer to this, not to the other VDM technique.

Or maybe you thought by "real displacement mapping" I meant on-GPU tessellation? I'm talking about pre-tessellated meshes + vertex textures.
 
Ughhh!
Is this a good summarization of shader model 3?
I'm going to be accused of being overly pedantic, but the noun from "summarise" is "summary".

As for the actual content, well, 3.0 is definitely better. The author is trying too hard to criticise it.
 
As regards instruction count, you will certainly have seen pictures of ATI's Ruby demo. Now, that is code that won't run on R300/R350 any more, at least not without a wrapper that replaces some shaders with slightly simpler ones. Same with nV's Mermaid. Still, those demos use shaders of not much over 100 instructions. With about 1000 instructions you can make "Finding Nemo" and "Final Fantasy" (the movie), and that takes hours per frame on a rendering farm. That's why 65k instructions are pointless in realtime.

65536 instructions is maybe overkill, but what about the restrictions of the PS2b profile? As far as I know, the limit of 4 texture indirections still persists, and it's already a problem for some of my shaders, and I'm not even coding games. Nevertheless, this kind of restriction is frustrating and tough to understand, especially when you're coding in some kind of HLSL and don't have deep knowledge of low-level hardware limits.

Even I, a regular Beyond3D reader ;) who was fully aware of the R3x0 limits, was confused the first time I tested my program and saw it fail: "What the heck :? ... Hmm, oh yeah, I'm exceeding the number of texture indirections." It's not something you can see easily in your code.
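
To show why it's invisible, here is a sketch (untested; invented names) where nothing in the HLSL looks wrong, yet each fetch that uses the result of a previous fetch as its coordinates adds one indirection level:

```hlsl
sampler2D map0;
sampler2D map1;
sampler2D map2;
sampler2D map3;
sampler2D map4;

float4 main(float2 uv : TEXCOORD0) : COLOR
{
    float2 t0 = tex2D(map0, uv).rg;   // direct read from an interpolator
    float2 t1 = tex2D(map1, t0).rg;   // dependent read: indirection level 1
    float2 t2 = tex2D(map2, t1).rg;   // level 2
    float2 t3 = tex2D(map3, t2).rg;   // level 3
    return     tex2D(map4, t3);       // level 4: around here the 2.x
                                      // profiles refuse to compile
}
```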
 
DemoCoder said:
Chalnoth said:
But how would you manage this? The lighting of virtual displacement mapping is dependent upon the initial positioning of the surface triangle, so if this is not predictable, the lighting may end up looking wrong. Would it be possible to send the "unperturbed" triangle data to the pixel shaders for virtual displacement mapping?

Umm, are we talking about different things? I'm talking about offset mapping/parallax mapping, not View Dependent Displacement Mapping. All you need for parallax/offset mapping is the eye vector in tangent space, which is calculated in the vertex shaders. I was using the phrase "Virtual Displacement Mapping" to refer to this, not to the other VDM technique.

Or maybe you thought by "real displacement mapping" I meant on-GPU tessellation? I'm talking about pre-tessellated meshes + vertex textures.
No. What I mean is that when you attempt to combine displacement mapping and parallax mapping, you're going to be combining two techniques which (effectively) displace the position of the pixels on the screen. Displacing a pixel twice will cause some visual artifacts (unless an efficient way is found to compensate).
 
The "pixels" are already being displaced by the vertex shader. Whether it is because of vertex skinning, or other animation in the VS, or because the vertex position is between perturbed by a vertex texture lookup, it makes no difference. Any problems you have with parallax mapping because of displaced vertices (via vertex texture) will also apply to vertices displaced by regular ALU vertex shaders.

All you need is the correct texture coordinates and the eye vector in tangent space. Maybe I'm missing something, but I don't see the problem. (BTW, the vertex shader and pixel shader can share the same heightmap data.)
 
DemoCoder said:
The "pixels" are already being displaced by the vertex shader. Whether it is because of vertex skinning, or other animation in the VS, or because the vertex position is between perturbed by a vertex texture lookup, it makes no difference. Any problems you have with parallax mapping because of displaced vertices (via vertex texture) will also apply to vertices displaced by regular ALU vertex shaders.

All you need is the correct texture coordinates and the eye vector in tangent space. Maybe I'm missing something, but I don't see the problem. (BTW, the vertex shader and pixel shader can share the same heightmap data.)
The assumption is that any polybump technique will take into account the displacement of the vertices. I can see how this would be relatively easy for static models, but there will likely be problems for any technique that involves vertex perturbation.

For character models, where vertex skinning would be common, I don't see it as much of a problem: the model is already pretty chaotic (and frequently in motion), so the artifacts won't be as noticeable.

But for a world model, which is much more likely to be regular and static, these problems are much more likely to crop up.
 
DeanoC said:
The other VS3 feature is geometry instancing; this is a major advance in allowing much more detailed work. In the rock case, all the rocks can be rendered via one call with little overhead in memory. Since the big problem with rendering lots of little objects is the cost per call, object instancing is a god-send.

FYI
Instancing will also be supported for VS 2.x and 1.x shaders (on SM 3.0 hardware). As of right now MS is claiming that you can only use this feature on hardware that supports SM 3.0, but there is no reason this feature needs to be tied to SM3 hardware. Plenty of 2.0 hardware could handle this if MS were to expose it as a separate cap bit. I hope they see the light and expose it.
 
What SM2 hardware, specifically? As far as I know, no current SM2 hardware could support it, even if Microsoft allowed.
 
DrawPrim said:
Instancing will also be supported for VS 2.x and 1.x shaders (on SM 3.0 hardware). As of right now MS is claiming that you can only use this feature on hardware that supports SM 3.0, but there is no reason this feature needs to be tied to SM3 hardware. Plenty of 2.0 hardware could handle this if MS were to expose it as a separate cap bit. I hope they see the light and expose it.
Will be supported? Another change in DX9.0c?
 
The whole instancing API has been overhauled for DX9.0c. A very new thing is that if the device supports VS3.0, then you can use the frequency API on all draw prims regardless of the vertex shader version you're actually using (including fixed function).

I also have never heard of a non-VS3.0 device that could do frequency division; if there were one, I'd imagine the relevant IHV would have (a) an OpenGL extension and (b) argued pretty hard for a new cap bit in DX9.0c.
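
For reference, the shader side of the frequency API is plain: stream 0 carries the mesh, stream 1 carries per-instance data, and the vertex shader just sees extra inputs. A rough sketch (untested; the names and semantic assignments are invented):

```hlsl
float4x4 viewProj;

struct VS_IN
{
    float4 pos  : POSITION;    // per-vertex, from stream 0
    float4 row0 : TEXCOORD4;   // per-instance world matrix rows, from stream 1
    float4 row1 : TEXCOORD5;
    float4 row2 : TEXCOORD6;
    float4 row3 : TEXCOORD7;
};

float4 main(VS_IN v) : POSITION
{
    // Rebuild this instance's world transform and place the vertex with it.
    float4x4 world = float4x4(v.row0, v.row1, v.row2, v.row3);
    return mul(mul(v.pos, world), viewProj);
}
```

The application then tags the streams with SetStreamSourceFreq (D3DSTREAMSOURCE_INDEXEDDATA plus the instance count on stream 0, D3DSTREAMSOURCE_INSTANCEDATA on stream 1) and issues one DrawIndexedPrimitive for all the rocks.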
 
I prefer the summation recently supplied by the nVidia chief JHH for ps3.0:

"[ps3.0 support] cost us 60 millions of transistors."

Then there's also what he didn't say about it, but which may fairly be assumed, I think:

"It's hurting our yield picture for the first decent ps2.0-capable chip we've been able to design, as well."

So far, these are the tangible, observable effects of ps3.0 in the 3d-chip markets to date...;)
 
WaltC said:
I prefer the summation recently supplied by the nVidia chief JHH for ps3.0:

"[ps3.0 support] cost us 60 millions of transistors."
And this is the sort of advancement we should expect from a next-generation graphics chip. Not the crappy "turbocharged R300" we've gotten from ATI. Of course you can always build a higher-performance part that is based upon the previous generation's programming features, but this holds technology back.

After all, if graphics chip manufacturers had gone for speed first in the past, we could by now easily have a 32-pipeline chip with fewer transistors than the R420 or NV40. Heck, you could even make it a TBDR to keep those pipelines full.

But games wouldn't look nearly as good.

It's the advancements in programming flexibility that keep games moving forward.
 
Chalnoth said:
What SM2 hardware, specifically? As far as I know, no current SM2 hardware could support it, even if Microsoft allowed.

You might get some benefits from software support.

And this is the sort of advancement we should expect from a next-generation graphics chip. Not the crappy "turbocharged R300" we've gotten from ATI. Of course you can always build a higher-performance part that is based upon the previous generation's programming features, but this holds technology back.

You might have an argument if we were running at full instruction lengths with full speed, but we're not. With shaders, even just providing more speed will promote their uptake and use, especially since we're not close to maxing them out.
 
IMO the most important feature of SM3.0 over SM2.0 is simply dynamic branching. Dynamic branching allows you as a programmer to write shaders in an intelligent way, without the extra work of breaking shaders into multiple parts for multiple optional effects, or waiting to do an extra pass so you can use modifiable booleans with static branching.

With dynamic branching you can take your multipass shaders and combine them into a single complete shader in which you can toggle the optional parts on and off in a single pass. In addition, you can turn off parts of the shader that are fringe cases and don't apply, rather than writing to an alpha channel.
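
A sketch of that idea (untested; all names invented): the old base, detail and fog passes folded into one ps_3_0 shader, one part toggled by the application and one skipped per pixel when it doesn't apply.

```hlsl
bool useDetail;                 // set per material by the application
sampler2D baseMap;
sampler2D detailMap;
float3 fogColor;

float4 main(float2 uv : TEXCOORD0, float fogAmt : TEXCOORD1) : COLOR
{
    float3 color = tex2D(baseMap, uv).rgb;

    if (useDetail)                        // optional part, toggled on/off
        color *= tex2D(detailMap, uv * 8).rgb * 2;

    if (fogAmt > 0.001)                   // fringe case: most pixels skip fog
        color = lerp(color, fogColor, saturate(fogAmt));

    return float4(color, 1);
}
```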

Simply put, SM3.0 is a superset of SM2.0 with performance-improving features, and it makes writing shaders a lot easier on the programmer.

http://msdn.microsoft.com/library/default.asp?url=/nhp/Default.asp?contentid=28000410
 
DaveBaumann said:
And this is the sort of advancement we should expect from a next-generation graphics chip. Not the crappy "turbocharged R300" we've gotten from ATI. Of course you can always build a higher-performance part that is based upon the previous generation's programming features, but this holds technology back.

You might have an argument if we were running at full instruction lengths with full speed, but we're not. With shaders, even just providing more speed will promote their uptake and use, especially since we're not close to maxing them out.
That's sort of the point. Think about it. At the end of the life of the GeForce2, were its features fully used? What about at the end of the life of the GeForce4?

What I'm saying is that at each point in the past where a new architecture generation was released, that company could have put out a higher-performing part by not supporting as many new features. Instead, what we have seen is that at each such juncture, up until now, the highest-performing new part has also included technology improvements (at a new generation).

And so now we have a part that is higher-performing almost exclusively due to the higher clock speeds allowed by using fewer transistors, because it supports fewer programmability features (and also due to the more mature drivers afforded by not changing the architecture significantly). If this turns out to be a win for ATI, why in the world would nVidia or ATI bother to increase the technology of their parts again? Why not just fall back to PS 2.0 and never bother to improve upon it? After all, if games don't use PS 3.0, why would we want hardware that supports it? Never mind that games can't be made using new programmability features until that hardware is available.
 