What I find so fascinating about all this isn't whether the NV30-specific codepath uses this or that shader precision mode, or whether ATI or Nvidia has the more powerful shader implementation... No! I was expecting the NV30 to eat the R300 alive purely on the basis of its vast fillrate advantage and Doom3's heavy use of stencil volumes! (Nvidia's claim of a 40% advantage on their own proprietary map played some part in that too, I might add.)
By the time the game actually lays down any pixel shader stuff at all, it's already done a Z-only pass, so there'll be no overdraw. (Except for transparencies of course, but those seem to be handled differently from opaque surfaces, in a way that's probably less costly.) So I didn't expect pixel shaders to be much of a limit on performance at all, and anyway, Doom3's pixel shaders aren't very complicated. PS1.4 can do them in one pass after all, so with a 2.0 implementation I figured it would be a breeze.
On the other hand, we've seen what an enormous performance hit stencil shadows bring even in a game like Q3A, where only characters have such shadows (and they don't have many polys at all, comparatively). Doom3 uses stencils *everywhere*! NV30 with its high clock speed should burn through those stencils like there's no tomorrow, I assumed. Maybe it does, except obviously they're not as big a factor in overall performance as pixel shader speed.
Guess that's because the game redraws the entire screen with pixel shaders at least once per light source (more with pre-PS1.4 shaders or, god forbid, no shaders at all), and with many light sources that adds up to quite a bit of shading. (Probably the reason why the game's so bloody dark, heh heh.) The stencils, on the other hand, seem to be rendered just once (well, front and back side of models, but I think I read somewhere that DX9 cards can do both at once).
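To put a rough number on that, here's a back-of-the-envelope sketch of how shading work scales with the light count. It's only my own toy model, not anything taken from the engine: the function name, the resolution, the light count and the passes-per-light values are all made up for illustration, and it assumes the Z-only pass really does remove all overdraw and that every light touches the whole screen (the worst case).

```python
def shaded_pixels_per_frame(width, height, lights, passes_per_light=1):
    """Rough count of pixel shader invocations on opaque surfaces per frame.

    Assumptions (mine, not from the engine): a Z-only prepass has removed
    all overdraw, and every light covers the full screen, so each light
    shades each screen pixel passes_per_light times.
    """
    return width * height * lights * passes_per_light


# Hypothetical scene: 1024x768 with 4 full-screen lights.
single_pass = shaded_pixels_per_frame(1024, 768, lights=4)

# Same scene if the shader has to be split into 2 passes per light
# (the "2" is a placeholder, the real pass count isn't stated anywhere).
multi_pass = shaded_pixels_per_frame(1024, 768, lights=4, passes_per_light=2)

print(single_pass, multi_pass)
```

Even without any overdraw, the shaded pixel count grows linearly with the number of lights, which would explain why shader speed ends up mattering more than I expected.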
So in the end, it turns out the game is limited in performance in completely different ways than I had anticipated...! Who coulda thunkit!
So is it basically true then, what XBit labs wrote a while back, that NV30 issues just one PS fp op per clock (or two texture reads)? Or how else can this apparent deficiency in performance be explained?
*G*