Jawed seems to have an infatuation with overengineered, out-of-spec features as his metric for measuring the better architecture. I find this ironic, because for years Nvidia was criticized for putting forward-looking features in their architectures before the market needed them, features which seriously gimped their performance and gave no real benefit in games for years.
I agree that Jawed's reasoning in this thread is totally whack, but I don't agree with you on this point about NVidia. I think it's even weirder to say they were criticized for wasting die space (and thus performance) on these features. Personally, I may have understated the value of NV4x's SM3.0 support in terms of what can be done with it, but per clock and per mm² NV4x never looked wasteful. ATI had been targeting an earlier release for its T&L-equipped Radeon, too.
Nvidia made the first consumer card (besides 3dlabs' DCC stuff) that had a fixed-function T&L unit. It didn't help end users one iota, as most games were inlining vertex transforms as C preprocessor macros rather than going through library calls.
T&L helped them tremendously through the lifetime of the NV1x architecture. I'm pretty sure games were using it within a year. Don't forget 3DMark, either.
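As an aside on the inlining point being argued above, here is a minimal sketch in C, assuming a toy setup (submit_transformed, set_world_matrix, and submit_untransformed are hypothetical stand-ins, not any real driver API). If a game inlines the transform itself, as in the first path, the driver only ever sees pre-transformed vertices and a fixed-function T&L unit gets no work; only the second path lets the hardware do the per-vertex math.

/* Minimal sketch; the three submit/set functions below are hypothetical
 * placeholders (stubbed so the sketch compiles), not a real API. */
#include <stddef.h>

typedef struct { float x, y, z, w; } Vec4;
typedef struct { float m[4][4]; } Mat4;

static void submit_transformed(const Vec4 *clip_verts, size_t n) { (void)clip_verts; (void)n; }
static void set_world_matrix(const Mat4 *m) { (void)m; }
static void submit_untransformed(const Vec4 *model_verts, size_t n) { (void)model_verts; (void)n; }

/* The kind of thing games of the era inlined, often as a preprocessor macro. */
#define XFORM(m, v, out)                                                                          \
    do {                                                                                          \
        (out).x = (m).m[0][0]*(v).x + (m).m[0][1]*(v).y + (m).m[0][2]*(v).z + (m).m[0][3]*(v).w;  \
        (out).y = (m).m[1][0]*(v).x + (m).m[1][1]*(v).y + (m).m[1][2]*(v).z + (m).m[1][3]*(v).w;  \
        (out).z = (m).m[2][0]*(v).x + (m).m[2][1]*(v).y + (m).m[2][2]*(v).z + (m).m[2][3]*(v).w;  \
        (out).w = (m).m[3][0]*(v).x + (m).m[3][1]*(v).y + (m).m[3][2]*(v).z + (m).m[3][3]*(v).w;  \
    } while (0)

/* Path 1: CPU transform inlined into the game; hardware T&L sits idle. */
void draw_cpu_path(const Mat4 *world, const Vec4 *verts, Vec4 *scratch, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        XFORM(*world, verts[i], scratch[i]);
    submit_transformed(scratch, n);   /* driver only ever sees pre-transformed vertices */
}

/* Path 2: hand the matrix and raw model-space vertices to the driver, letting
 * the fixed-function T&L unit do the per-vertex multiply. */
void draw_hw_tnl_path(const Mat4 *world, const Vec4 *verts, size_t n)
{
    set_world_matrix(world);
    submit_untransformed(verts, n);
}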
Then there was the NV30 disaster: a chip designed to support shader features that went beyond PS2.0 in some respects (PS2.0a).
That wasn't what cost them, though. NV30's feature advantages were what: longer programs, unlimited dependent texturing (which it sucked at in shaders even a couple of levels deep), and predication? Those aren't expensive features. NVidia just messed up the architecture for whatever reason. NV43 blew away NV30 at the same size and bus width.
NV4x introduced VTF, PS3.0 dynamic branching (DB), and other stuff that games really didn't need, and a high-performance implementation of which would consume massive die space (see R5xx).
Exactly. NV4x didn't use much die space to implement those features. They ran like crap and were just checkbox items, because NVidia didn't want to devote much die space to an unused feature. The somewhat costly features it did implement were FP blending and FP filtering, and those were very valuable for sales, since PC devs were chasing "correctness" rather than visual effect when it came to HDR.
The NV3x had double-Z and z-scissor, and their practical consequence was a benefit in only one game: Doom 3.
Only Doom 3? Maybe for NV3x that didn't matter, since the game was released too late, but for NV4x the results from that single game probably cancelled out all the victories R4xx had in most other games (wins by smaller margins, but numerous). Riddick's and Doom 3's stencil shadows were responsible for nearly all of ATI's reputation for poor OpenGL performance.
I don't think NVidia sacrificed much of their transistor budget for unused features at all. They made very smart decisions about where transistors should be used, and placed a huge priority on performance, particularly where it impacted sales the most. Aside from R3xx/R4xx, I'd say ATI is more guilty of implementing features that cost die space, and thus performance, in most games at usual settings: free 32-bit color in Rage128, EMBM in R100 (a huge cost), Truform and PS1.4 in R200, DB in R5xx, and god knows what in R600.
As for Jaws' contention that G80 is a "blunt tool" design, I agree with you as to how senseless that claim is. Even more ridiculous is the claim that G80 is unbalanced. It's a hell of a lot more balanced than R600 in almost every respect, especially when you consider the marginal cost of the features it excels in.