malficar said:
I keep getting conflicting answers on a question of mine I would like answered. This is purely from a gamer/user perspective, NOT a coder or developer. I am not interested in things happening on the driver level.
So here's the question. Is there any visible effect in a game that SM 3.0 can produce that SM 2.0 cannot, even with a little more work? So if I do go with the X800XT over the 6800GT, will I be missing 'neat effects', as some have put it?
Thanks in advance.
There's a significant difference between your thread title and your actual question.
The X800 series is not limited to shader version 2.0. It can do shader version "2.x", and is a "2_b" shader compiler target.
Version "2.x" in itself doesn't say much, because it is sufficient to exceed a single "2.0" requirement to claim "2.x". You need to look at where and how a chip exceeds version "2.0".
The X800 can execute shaders up to the maximum length DX9 allows for "2.x", that is somewhere between 511 and 1532 operations, depending on instruction mix. This is a significant step up from version "2.0". X800 is still limited to four levels of indirection ("dependent texture fetches"), which is the same as R3xx class hardware, and may be its biggest overall flaw.
(Pixel) Shader version "3.0"'s most significant benefit is dynamic branching, which is primarily useful for skipping over computations that aren't necessary for the current fragment; this obviously saves computational resources (= effective fill rate). Dynamic branches can also, in theory, lower CPU load, because fewer shader changes are needed per frame (state changes in general are expensive in DX9).
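To make that concrete, here's a minimal C++ sketch of what a fragment effectively goes through. It's conceptual, not real shader code; the function names and the cheap/expensive split are made up for illustration:

[code]
// Conceptual sketch: mimics what the hardware effectively does per
// fragment. Not real shader code; names are hypothetical.
struct Color { float r, g, b; };

Color cheapAmbient()      { return {0.1f, 0.1f, 0.1f}; } // few ALU ops
Color expensiveLighting() { return {1.0f, 0.9f, 0.8f}; } // stands in for many ALU ops

// SM 3.0: a real per-fragment branch, so shadowed fragments skip the
// expensive path entirely. SM 2.x has no dynamic branch: the compiler
// must evaluate both sides for every fragment and select the result,
// so nothing is saved.
Color shadeFragment(bool inShadow)
{
    if (inShadow)
        return cheapAmbient();
    return expensiveLighting();
}
[/code]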
How useful this really is depends on the implementation. There currently appear to be significant penalties for dynamic branching on NV40. I'm not claiming that this makes it counterproductive in general, but it's again important to note the difference between the theoretical value of a shader model and the actual value of its implementations.
Otherwise I'd just like to point out that a shader that must execute 511 instructions per fragment is going to run slow as a dog, with fill rates in the "less than a Voodoo 1" ballpark, no matter what hardware or shader model you use. This also means that a very long pixel shader can be multipassed without much penalty: multipassing consumes bandwidth and geometry resources, and both of these are heavily underused by very long pixel shaders. You'd get a multipass split basically for free, so for pixel shaders without lots of conditionals, you don't really need SM3.
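For a rough feel of the "slow as a dog" claim, here's the arithmetic. The clock and pipe count below are assumed round numbers for illustration, not any vendor's spec sheet:

[code]
#include <cstdio>

int main()
{
    // Assumed, illustrative figures:
    const double clockHz      = 500e6; // ~500 MHz core clock
    const double pixelPipes   = 16.0;  // parallel pixel pipelines
    const double instructions = 511.0; // shader length, 1 instr/pipe/clock

    // Peak fill rate collapses by a factor of the shader length.
    double fillRate = clockHz * pixelPipes / instructions;
    std::printf("effective fill rate: %.1f Mpixels/s\n", fillRate / 1e6);
    // ~15.7 Mpixels/s -- a Voodoo 1 managed roughly 45-50 Mpixels/s.
    return 0;
}
[/code]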
Re VS3.0: a pure nicety, and only if filtering were supported. Unfortunately NV40 does not support filtering of vertex textures, and this makes vertex texturing useless IMO. You can just as well use another vertex attribute stream, or indexed constant storage, in lieu of a vertex texture; you'd pick one of the two depending on the amount of data in the vertex texture and its fetch locality.
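As a sketch of the indexed-constant-storage route (the register range and table size here are arbitrary, picked for illustration): upload a small table into vertex shader constants once, and let the vertex shader index into it with relative addressing instead of sampling a vertex texture.

[code]
#include <d3d9.h>

// Stash a small lookup table in vertex shader constants instead of a
// vertex texture. Register range (c16..c31) and size are arbitrary.
void uploadLookupTable(IDirect3DDevice9* device)
{
    float table[16][4] = {}; // 16 float4 entries, e.g. displacement data
    device->SetVertexShaderConstantF(16, &table[0][0], 16);
}
[/code]

This only makes sense for small tables with decent fetch locality; a large dataset would go into a second vertex attribute stream instead.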
Another vertex level capability of NV40 is the stream frequency divider, marketed as "geometry instancing". Most useful for rendering many objects with low polygon counts. Draw calls are expensive in DX, so this allows the chip to, in a nutshell, generate the draw calls itself internally, under restricted circumstances.
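In DX9 terms this shows up as SetStreamSourceFreq. A sketch of the setup (buffers, strides and counts assumed to be created elsewhere); one DrawIndexedPrimitive call then covers all instances:

[code]
#include <d3d9.h>

// Draw numInstances copies of one mesh with a single draw call.
void drawInstanced(IDirect3DDevice9* dev,
                   IDirect3DVertexBuffer9* meshVB, // per-vertex data
                   IDirect3DVertexBuffer9* instVB, // per-instance data
                   IDirect3DIndexBuffer9* ib,
                   UINT meshStride, UINT instStride,
                   UINT numVerts, UINT numTris, UINT numInstances)
{
    // Stream 0 repeats for every instance; stream 1 steps once per instance.
    dev->SetStreamSourceFreq(0, D3DSTREAMSOURCE_INDEXEDDATA  | numInstances);
    dev->SetStreamSourceFreq(1, D3DSTREAMSOURCE_INSTANCEDATA | 1u);
    dev->SetStreamSource(0, meshVB, 0, meshStride);
    dev->SetStreamSource(1, instVB, 0, instStride);
    dev->SetIndices(ib);

    dev->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, numVerts, 0, numTris);

    // Restore default (non-instanced) stream behaviour.
    dev->SetStreamSourceFreq(0, 1);
    dev->SetStreamSourceFreq(1, 1);
}
[/code]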
I don't think this is very useful, but I may be wrong.
In summary, NV40 cannot render effects that are outright impossible on X800, but it may very well be more efficient at rendering the same effects once they
a) reach a certain complexity,
b) contain many conditionals, and/or
c) require many levels of texture indirection.
Re 3Dc (just because it has been brought up): it just (de)compresses two-channel textures in a format suitable for normal maps; it does not enable any new effects. The compression gain is fixed at 50% (2:1). While it does save storage space and bandwidth, it is unlikely to enable higher resolution normal maps to fit on the same card. That would be a sensible claim to make for 4:1 (or better) compression, but IMO not for "3Dc".
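The 2:1 ratio falls straight out of the block layout (each of the two channels is stored as an 8-byte block, DXT5-alpha style); a quick sanity check:

[code]
#include <cstdio>

int main()
{
    // 3Dc compresses 4x4 texel blocks of two-channel data (normal map X/Y).
    const int texelsPerBlock   = 4 * 4;
    const int rawBytesPerBlock = texelsPerBlock * 2; // two 8-bit channels
    const int compressedBytes  = 2 * 8;              // 8 bytes per channel

    std::printf("ratio %d:1\n", rawBytesPerBlock / compressedBytes); // -> 2:1
    // Doubling normal map resolution quadruples the texel count, so a
    // 2:1 format alone can't make the next resolution step fit.
    return 0;
}
[/code]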