Finally read NVIDIA's comments (at xbitlabs). It has some interesting information -- for example that tests 2 and 3 are very vertex shader bound. Although I'd say that their complaint about that isn't really justified if we look at 3DMark03 as a test of the graphics card. The number of times skinning is done does seem excessive if NVIDIA is right, although it's indeed the way developers *would like* to work. The alternative that NVIDIA suggests means doing skinning once -- but on the CPU!
I checked them out as well, and then went back to the 3DMark03 white paper to see how they justified the decision to skin in the vertex shaders. Here are the relevant quotations:
Nvidia (http://xbitlabs.com/news/story.html?id=1045073804) said:
These two tests [games 2 and 3] attempt to duplicate the “Z-first” rendering style used in the upcoming first-person shooter game, “Doom 3”. They have a “Doom-like” look, but use a bizarre rendering method that is far from Doom 3 or any other known game application. This method makes for an interesting demo, but is so inefficient that no game would ever employ it. This is best exemplified by the shadow calculation method used in these tests. These tests attempt to use shadow technique used in Doom 3 called stencil shadow volumes. This is a multiple pass algorithm that is done for all objects in the scene. The passes in 3DMark03 look like this:
Code:
For every object:
    Pass 1 (Early Z)
        Skin Object in Vertex Shader
        Pixel Shader writes Z, RGB = ambient, and Alpha = perspective Z
For every light:
    For every object:
        Pass 2 (Stencil Shadow Volume calculation)
            Set stencil to increment/decrement
            Skin Object in Vertex Shader
            Stencil extrusion calculation
            No Pixel Shader
        Pass 3 (Lighting)
            Skin Object in Vertex Shader
            Pixel Shader (lighting) write RGB = color
The portion of this algorithm labeled “Skin Object in Vertex Shader” is doing the exact same skinning calculation over and over for each object. In a scene with five lights, for example, each object gets re-skinned 11 times. This inefficiency is further amplified by the bloated algorithm that is used for stencil extrusion calculation. Rather than using the Doom method, 3DMark03 uses an approach that adds six times the number of vertices required for the extrusion. In our five light example, this is the equivalent of skinning each object 36 times! No game would ever do this.
This approach creates such a serious bottleneck in the vertex portion of the graphics pipeline that the remainder of the graphics engine (texturing, pixel programs, raster operations, etc.) never gets an opportunity to stretch its legs.
It’s unfortunate that 3DMark03 does not truly emulate Doom or any other game by skinning each object only once per frame, caching the skinned result, and using that cached result in the multiple passes required for shadows. This would have been a balanced approach that allows both the vertex and pixel/raster portions of the graphics engine to run at full speed. Designing hardware around the approach used in 3DMark03 would be like designing a six lane on ramp to a freeway in the freak case that someone might drive an earthmover on to it. Wasteful, inefficient benchmark code like 3DMark03 force these kinds of designs that do nothing to benefit actual games.
3DMark03 White Paper said:
When using this kind of stencil shadowing, the developer is left with some options on the implementation. 3DMark03 does as much work as possible in the vertex shaders, since the goal of 3DMark03 is to measure vertex and pixel shader performance in 3D games. Also it is expected that many games with similar technology will have a heavy workload for the CPU doing physics (including collision detection), artificial intelligence and visibility optimizations for example. It is therefore desirable to perform as much as possible on the graphics card in order to offload the CPU.
An alternative implementation would be to give some of the graphics tasks to the CPU, and thereby offloading [sic] the graphics card. The skinning could be done on the CPU, which would reduce the amount of vertex shader tasks. Also, when pre-skinning on the CPU, the characters would not need to be re-skinned for each rendering pass. Then again, skinning is a fairly light vertex shader operation, and with as few characters as in this game test, there should not be much benefit. Also, if there are many characters on screen, more pre-skinned characters would need to be transferred over the AGP bus.
The first thing to note is that Nvidia seems to be missing a PS1.1 pass, namely "light fall-off to alpha buffer," in their analysis; the White Paper says the PS1.1 path requires (1 + 3-per-light) passes while PS1.4 requires (1 + 1-per-light). I don't know enough to say whether the alpha buffer pass just doesn't require vertex skinning or whether Nvidia left it out (perhaps to understate how inefficient emulating PS1.4 with PS1.1 is?).
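For anyone who wants to play with the pass arithmetic, here is a quick Python sketch of the counts taken straight from the two quotes. The function names are mine and purely illustrative. I am not assuming anything about how the White Paper's pass counts map onto skinning operations, since neither quote spells that out.

```python
# Counts come from the Nvidia and White Paper quotes above.
# All function names are hypothetical, made up for this illustration.

def skins_per_object_nvidia(lights: int) -> int:
    """Nvidia's listing: one skin in the early-Z pass, then one in the
    stencil pass and one in the lighting pass for every light."""
    return 1 + 2 * lights


def skins_per_object_cached(lights: int) -> int:
    """Nvidia's suggested alternative: skin once per frame and reuse the
    cached result in every later pass, whatever the light count."""
    return 1


def passes_ps14(lights: int) -> int:
    """White Paper figure for the PS1.4 path: 1 + 1 per light."""
    return 1 + lights


def passes_ps11(lights: int) -> int:
    """White Paper figure for the PS1.1 path: 1 + 3 per light."""
    return 1 + 3 * lights


if __name__ == "__main__":
    for lights in (1, 3, 5):
        print(
            f"lights={lights}: "
            f"skins (Nvidia listing)={skins_per_object_nvidia(lights)}, "
            f"skins (cached)={skins_per_object_cached(lights)}, "
            f"PS1.4 passes={passes_ps14(lights)}, "
            f"PS1.1 passes={passes_ps11(lights)}"
        )
```

With five lights the first function gives Nvidia's figure of 11 skins per object, against a single skin if the result were cached.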
Next, it's worth noting that Futuremark identifies and discusses the exact issue Nvidia goes to such lengths to "expose" and ridicule, in their White Paper, which was of course released before Nvidia's complaint. (OTOH they could have expected such an argument from Nvidia and been preempting it.)
Next let's take note of Nvidia's interesting rhetorical trick: explicitly identifying the dynamic shadowing technique in games 2 and 3 with Doom3--as if Doom3 is the only game that will be using similar techniques!--and thus building the implication that any test using a different means to achieve the same result is invalid; after all, if the point of the test is to simulate Doom3, then you should use the same algorithm! Of course, while Doom3 will be the first major game to use this technique for the entire game world, it will obviously not be the last, and 3DMark03 is presumably targeted to simulate the performance tradeoffs that might be used in a game being released a bit later than Doom3.
But really it all comes down to two contradictory assertions:
Nvidia said:
This approach creates such a serious bottleneck in the vertex portion of the graphics pipeline that the remainder of the graphics engine (texturing, pixel programs, raster operations, etc.) never gets an opportunity to stretch its legs.
vs.
Futuremark said:
Then again, skinning is a fairly light vertex shader operation, and with as few characters as in this game test, there should not be much benefit.
So the question is: are games 2 and 3 mainly vertex-shader limited on current cards? And, if so, how much of this is due to the extra skinning (after all, PS1.1 will still require more passes and thus more geometry ops, even if the skinning is done by CPU and cached--I think)?
Well, we don't have the answers directly, but we do have this wonderful comparison of the 9700 Pro with PS1.4 turned on and off in drivers. (EDIT: I should give credit to Ichneumon for running this very interesting comparison.)
Game2:
PS1.4 - 30.5
PS1.1 - 24.9
diff - 22.4%
Game3:
PS1.4 - 28.2
PS1.1 - 22.8
diff - 23.6%
Meanwhile, as we know, the number of vertex-skinning operations goes up by around 100% when moving from PS1.4 to PS1.1. And obviously the extra skinning is only a small part of the performance hit you get from running extra passes.
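To put rough numbers on that, here is a small Python sketch using only the scores posted above. It rests on two loud assumptions: the "around 100%" increase in skinning work is my own estimate, not a measurement, and the costs are treated as simply adding up, which a pipelined GPU does not strictly obey. Under those assumptions the extra skinning cannot cost more than the whole slowdown, so the slowdown caps skinning's share of the PS1.4 frame time. The names are made up for illustration.

```python
import math

# Scores as posted above: (PS1.4 enabled, PS1.4 disabled, i.e. PS1.1 path).
RESULTS = {"Game2": (30.5, 24.9), "Game3": (28.2, 22.8)}

# ASSUMPTION from the post: skinning work roughly doubles on the PS1.1 path.
SKINNING_INCREASE = 1.0  # +100%, an estimate, not a measured value


def extra_frame_time(score_ps14: float, score_ps11: float) -> float:
    """Extra time per PS1.1 frame, as a fraction of the PS1.4 frame time."""
    return score_ps14 / score_ps11 - 1.0


def max_skinning_share(score_ps14: float, score_ps11: float,
                       increase: float = SKINNING_INCREASE) -> float:
    """Upper bound on skinning's share of the PS1.4 frame time.

    ASSUMPTION: costs are additive, so the added skinning work
    (increase * share) is at most the total extra frame time.
    """
    return extra_frame_time(score_ps14, score_ps11) / increase


def truncate1(percent: float) -> float:
    """Truncate (not round) to one decimal place."""
    return math.floor(percent * 10) / 10


if __name__ == "__main__":
    for game, (ps14, ps11) in RESULTS.items():
        diff = extra_frame_time(ps14, ps11) * 100
        bound = max_skinning_share(ps14, ps11) * 100
        print(f"{game}: diff {truncate1(diff)}%, "
              f"skinning at most about {bound:.0f}% of the PS1.4 frame")
```

Truncated to one decimal, this reproduces the 22.4% and 23.6% differences listed above. The bound is generous, since the extra passes also add pixel and raster work that this model charges entirely to skinning.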
In other words: I don't buy it. If Futuremark says "there should not be much benefit," I'd tend to believe them. That said, the fact that Doom3 has the CPU do the skinning indicates to me that this is the higher performance method on current/near future hardware. 3DMark03 is targeted at hardware a bit farther out, and they seem to be saying that, in their opinion, vertex shader power at that point will be able to gobble up the extra skinning with no problem.
Especially as that hardware won't be stuck with PS1.1...