1. Why does 3DM2K3 render characters 11 times/frame when no game in the world will do this?
First, the question is slightly confused: geometry is rendered, while "characters" (or more specifically, anything that uses skeletal animation) are skinned. If we're just talking about rendering geometry, then unless this is a corny way of saying "Doom3 will be out of this world", the premise of the question is incorrect. (Incidentally, that may just have been the worst joke I've ever made.)
Doom3, when using its PS 1.1-equivalent path, will also render geometry 11 times in any scene with 5 lights. That's what happens when you have to multipass: you re-render the geometry on every pass. One z-buffer pass plus two passes per light equals 11 for a 5-light scene. That's how it works.
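The pass arithmetic can be sketched directly (assuming, as described above, one z-buffer fill pass plus two rendering passes per light on a PS 1.1-class path):

```python
def ps11_passes(lights):
    """Number of geometry passes on a PS 1.1-class path:
    one z-buffer fill pass, then two rendering passes per light."""
    return 1 + 2 * lights

# 5 lights -> 11 passes, matching the figure in the quote below
print(ps11_passes(5))
```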
Second, it doesn't render or skin anything 11 times/frame under normal conditions. Here's the Nvidia quote being referred to:
[quote=Nvidia marketing, http://www.xbitlabs.com/news/story.html?id=1045073804]In a scene with five lights, for example, each object gets re-skinned 11 times.[/quote]
Unfortunately, the example is basically meaningless, since the average scene is much closer to 2 lights than 5. A simpler way to gauge the real impact is to look at the rendered poly counts, which Futuremark has provided in the White Paper. GT2 renders an average of 250,000 polys/frame using the PS 1.1 path, and 150,000/frame with the PS 1.4 path; the numbers for GT3 are 580,000 and 240,000 respectively. These figures are averages: some scenes will have more lights (and thus a larger disparity in the number of passes), and some fewer. But in general, because multipassing means rendering the same geometry more times, the amount of geometry rendered goes up 66-100% when moving from PS 1.4 to PS 1.1.
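Where that 66-100% range comes from can be sketched as follows, assuming (as with Doom3's R200 path) that PS 1.4 collapses the two per-light passes into one, so a PS 1.4 path runs 1 + L passes for L lights versus 1 + 2L on PS 1.1:

```python
def geometry_increase(lights):
    """Fractional increase in geometry rendered when moving from a
    PS 1.4 path (1 z pass + 1 pass per light) to a PS 1.1 path
    (1 z pass + 2 passes per light). Assumes Doom3-style per-light
    pass structure; illustrative sketch only."""
    ps14_passes = 1 + lights
    ps11_passes = 1 + 2 * lights
    return (ps11_passes - ps14_passes) / ps14_passes

# 2 lights -> ~66.7% more geometry; the increase approaches (but
# never reaches) 100% as the light count grows.
for n in (2, 5, 20):
    print(n, round(geometry_increase(n) * 100, 1))
```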
As for the amount of vertex skinning, it will also increase 66-100%, but from smaller numbers. After all, skinning is only needed for skeletally animated characters, not world geometry. And skinning is a light-workload vertex shader operation, comparable to transformation, which is done on all vertices anyway. Doom3 does use the CPU for skinning, and this presumably improves performance, particularly on hardware that isn't capable of PS 1.4. But, as Futuremark points out, it takes up CPU time that could better be spent on AI, physics, etc.
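To see why skinning is such a light per-vertex operation, note that it's just a weighted blend of bone transforms applied to each vertex. A minimal two-bone matrix-palette sketch (illustrative only; the matrices, weights, and function name here are made up, not Futuremark's or id's implementation):

```python
def skin_vertex(v, bones, weights):
    """Matrix-palette skinning: transform vertex v = (x, y, z) by each
    bone's 3x4 matrix (rotation columns + translation column) and blend
    the results by the per-vertex bone weights."""
    out = [0.0, 0.0, 0.0]
    for matrix, w in zip(bones, weights):
        for i in range(3):
            row = matrix[i]
            out[i] += w * (row[0]*v[0] + row[1]*v[1] + row[2]*v[2] + row[3])
    return out

# Two example bone matrices: identity, and a translation of +2 in x.
identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
shifted  = [[1, 0, 0, 2], [0, 1, 0, 0], [0, 0, 1, 0]]

# Equal weights -> vertex lands halfway between the two transforms.
print(skin_vertex((1.0, 0.0, 0.0), [identity, shifted], [0.5, 0.5]))
# -> [2.0, 0.0, 0.0]
```

Per vertex that's a handful of multiply-adds per bone, which is why it's comparable in cost to the transform every vertex gets anyway.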
In the end the only way to really answer this question is to examine Nvidia's assertion:
[quote=Nvidia marketing, http://www.xbitlabs.com/news/story.html?id=1045073804]This approach creates such a serious bottleneck in the vertex portion of the graphics pipeline that the remainder of the graphics engine (texturing, pixel programs, raster operations, etc.) never gets an opportunity to stretch its legs.[/quote]
Are GT2 and GT3 really seriously vertex bottlenecked? Not really, no.
One quick way to see this is to look at the graphs in the new 3DMark03 performance writeup here at B3D. The graphs will seem a bit unusual if you're used to looking at fps graphs: instead of using framerate as the y-axis variable, they use achieved pixel fillrate, i.e. fps * resolution. (The x-axis variable is resolution.) This style of graph is extremely useful for finding out at a glance what the bottleneck is in any given situation.
First let's see what being vertex shader bottlenecked looks like. As you can see, we get a bunch of straight lines radiating out from the origin. That's because vertex shader workload doesn't change with resolution, so if it's always the bottleneck, your framerate stays constant no matter what you do to the resolution, and thus fillrate rises linearly with resolution.
Conversely, if you're completely pixel fillrate limited, you get a horizontal straight line; that's because increasing the resolution won't increase your fillrate because, well, you're already fillrate limited like I said. This effect can be seen with the 9500 at high resolutions on several tests including GT2.
Normally, if you lower resolution enough you eventually end up geometry limited (i.e. diagonal straight line), and if you raise resolution enough you eventually end up fillrate limited (horizontal straight line), so a "well-balanced" game or benchmark is one that follows an arc: steeper in the low resolutions, flatter in the high resolutions. In the middle of such an arc, you're neither exclusively vertex nor fillrate limited.
Having said that, let's look at the results for GT2. All the cards follow a nice arc, except the aforementioned 9500, which becomes fillrate limited above 1024*768. The GF4 cards in particular (and that's what we're really concerned with here: whiny GF4 owners) scale very nicely, although the arc can be difficult to see since the scale is smaller down there.
Let's break out the numbers a little bit by comparing each card to its 640*480 performance. The percentages represent the framerate at the given resolution as a percent of framerate at 640*480. Remember that if GT2 were completely vertex shader limited as Nvidia charges, all the numbers would be 100%. (As they are, more or less, if you do this analysis on the Vertex Shader test results.)
Code:
% of 640*480 fps
         800    1024   1280   1600
9700P:   78.6   56.3   38.4   28.6
9700:    78.9   56.8   38.4   28.7
9500P:   75.3   52.7   34.4   24.9
4600:    76.2   57.8   41.1   30.8
4200:    82.0   62.5   43.0   31.3
In general, the 4600 is hardly more vertex shader limited than the 9700 Pro, which is to say, not very much at all. The 4200 is a bit more vertex limited, just as the 9500 Pro is a bit less so. Then again, this is to be expected, as the GF4s have to process more geometry on account of having to run more passes. But all 5 cards are pretty close in scaling characteristics, and none of them is anywhere near approaching a situation where, as Nvidia puts it, "the remainder of the graphics engine (texturing, pixel programs, raster operations, etc.) never gets an opportunity to stretch its legs."
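The same conclusion falls out if you run the numbers from the table above: a purely vertex-limited card would sit at 100% at every resolution, and none of these come anywhere close.

```python
# Scaling data from the table above: % of 640*480 fps retained at
# 800*600, 1024*768, 1280*1024, and 1600*1200.
scaling = {
    "9700P": [78.6, 56.3, 38.4, 28.6],
    "9700":  [78.9, 56.8, 38.4, 28.7],
    "9500P": [75.3, 52.7, 34.4, 24.9],
    "4600":  [76.2, 57.8, 41.1, 30.8],
    "4200":  [82.0, 62.5, 43.0, 31.3],
}

for card, pcts in scaling.items():
    # Every card has lost well over half of its 640*480 framerate by
    # 1600*1200, so none of them is pinned against a vertex bottleneck.
    print(card, "retains", pcts[-1], "% of its 640*480 fps at 1600*1200")
```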
As regards the skinning issue, remember that the skinning workload represents only a portion of the overall geometry workload. And note that the test appears quite bandwidth limited (look at how much the 9700 beats the 9500 Pro by; they're exactly the same except for bandwidth), which further hurts the GF4s because multipassing also takes quite a bit more bandwidth (an extra write to and read from the framebuffer).
For more evidence of the same thing, we can look at this comparison of a 9700 Pro with PS 1.4 enabled/disabled in the drivers. GT2 and GT3 performance each go up about 23% with PS 1.4 over PS 1.1. That's significant, of course, but consider: first, the geometry load increases 66-100% with PS 1.1 only; second, only part of that performance hit is due to geometry (the rest comes from the extra bandwidth multipassing requires); and third, only part of the geometry hit is due to skinning. We're talking about a pretty minor effect here.
All in all, if skinning were moved to the CPU, the GF4 might see perhaps a 5-10% performance increase on GT2/3 relative to PS 1.4 capable cards (but probably on the lower end of that). Meanwhile, you'd be making 3DMark03 less of a GPU and more of a CPU benchmark--which is contrary to one of its stated aims with the new version--and you'd probably be hurting GT2/3 performance on future GPUs.
2. Why do processor scores differ when the exact same setups are used, with the only difference being the graphics card? Case in point: GFFX Ultra CPU scores vs. 9700 Pro CPU scores, anywhere from 100-200 points of difference.
First off, the difference is 40-50 points, not 100-200. Anyway, the most likely reason for this is that the GFfx's drivers are not as efficient. (Which is somewhat to be expected; after all, it is a new architecture.) Remember, drivers run on the CPU, so they're competing with the software vertex shading and everything else for CPU time.
3. Granted, PS 1.4 is DX8 and a subset of DX9, but when the only games that use it are those that ship with ATI cards, why use it?
Doom3 is shipping with ATI cards? Kickass!!
(Note: technically D3 doesn't use PS 1.4 or PS 1.1 because it is written in OpenGL; however, the R200 path uses exactly PS 1.4 functionality, and the NV20 path uses exactly PS 1.1 functionality.)
More generally, per-pixel unified bump-mapped specular and diffuse lighting with stencil shadows (a la Doom3) cannot be done in one pass with PS 1.1, PS 1.2, or PS 1.3. Any game with D3-style lighting is going to use PS 1.4 functionality.
Why not use PS 2.0? Because it's not necessary for the effect, and the installed base of PS 1.4-capable cards is a superset of the installed base of PS 2.0-capable cards. And while almost any PS 1.4 effect can be replicated using PS 1.1-1.3 and 2 or 3 rendering passes, a PS 2.0 effect generally can't be replicated with any PS 1.x shader. In this case, the only benefit to moving to PS 2.0 is the use of higher-precision floating point for some of the lighting calculations; indeed, Doom3 makes use of this, offering "minor quality improvements" in exchange for "a slight speed [dis]advantage".
Surely many similar games will also offer PS 2.0 versions of the effect, but since that amounts to running what is essentially a PS 1.4 shader with FP precision for a couple of calculations, the performance and image quality will differ only slightly from straight PS 1.4. More to the point, all such games will offer a PS 1.4 path (and presumably a fall-back PS 1.1 path) until consumers with DX8 cards aren't worth supporting at all (probably not for 2.5+ years).
Considering 3DMark03 is meant to simulate games released ~1.5 years from now, rather than games available today, the choice to feature unified per-pixel lighting with stencil shadowing on 2 of 4 tests seems very sensible: it is very likely to be the most important rendering technique used in the next generation of graphics-intensive games. Once that choice has been made, the decision to heavily use PS 1.4 has also already been made.
If someone is more concerned with how cards perform running games that are available today, they should benchmark them with those games. Duh.