DemoCoder said:
I'm waiting for a comprehensive set of comparisons (maybe Dave can run the tests with Det45 vs Det52), but it looks to me like it is legitimate. And it makes sense if you read their Unified Compiler Whitepaper, where they show before-and-after instruction schedules.
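For what it's worth, the "unified compiler" they describe is essentially an instruction scheduler living in the driver. Here's a minimal toy sketch (my own illustration in Python, with invented ops and latencies; it has nothing to do with nVidia's actual compiler) of the kind of reordering those before-and-after schedules show: texture fetches get hoisted so that later ALU work can hide their latency.

    from collections import namedtuple

    Op = namedtuple("Op", "text reads writes latency")

    # A toy PS 2.0-ish sequence; register names and latencies are invented.
    program = [
        Op("tex r0, t0",          ["t0"],             ["r0"],  4),
        Op("mul r1, r0, c0",      ["r0", "c0"],       ["r1"],  1),
        Op("tex r2, t1",          ["t1"],             ["r2"],  4),
        Op("mad r3, r2, c1, r1",  ["r2", "c1", "r1"], ["r3"],  1),
        Op("mov oC0, r3",         ["r3"],             ["oC0"], 1),
    ]

    def schedule(ops):
        """Greedy list scheduling: repeatedly pick a ready op (all of its inputs
        already produced), preferring long-latency ops such as texture fetches
        so that later ALU work can overlap with them."""
        produced = ({r for op in ops for r in op.reads}
                    - {r for op in ops for r in op.writes})   # shader inputs/constants
        pending, out = list(ops), []
        while pending:
            ready = [op for op in pending if all(r in produced for r in op.reads)]
            ready.sort(key=lambda op: -op.latency)   # issue texture fetches first
            chosen = ready[0]
            pending.remove(chosen)
            produced.update(chosen.writes)
            out.append(chosen)
        return out

    for op in schedule(program):
        print(op.text)

Run it and the second tex gets hoisted above the mul, which is exactly the sort of before/after difference the whitepaper illustrates; whether that accounts for the Det45-to-Det52 gains is what the comparisons should tell us.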
I've read a few "whitepapers" from nVidia over the years that I could not distinguish from thinly veiled PR jargon designed to promote various aspects of their products as a marketing exercise. The ones relating to 3dMk03, and their "critiques" of various "unnecessary" so-called DX9 hardware features, come to mind as recent examples.
When they spent all that time and money earlier this year figuring out how to thoroughly cheat 3dMk03 in their drivers, to the degree that the benchmark provided almost no meaningful information on their GF FX products (merely one of many such examples), evidently "the problem" *then*, according to the nVidia PR machine, was with "software developers using the wrong approach," and everything would be "fine" if only "software developers" would optimize for nV3x, etc. To me the current emphasis on compilers isn't new at all from a technological standpoint, nor particularly important at this stage of nV3x deployment; in terms of marketing it is just a new chapter in the same old tired PR story trumpeted all year long: an ongoing attempt to explain away the performance deficit of nV3x relative to the chips offered by their competitors (specifically, of course, R3x0).
Well, I have been discussing it since the NV3x launch, like a broken record, I might add, yet after many, many shader-benchmark tests people were coming to the conclusion that there was nothing more to be done because of the NV3x architecture. (And of course, with the usual disclaimer that a technical discussion of NV3x architecture and optimization opportunities does not constitute a "defense" of NVidia or an anti-ATI position for fanboys.)
For some reason you seem to think nVidia's remarks on the subject of nV3x compilers do not constitute mere marketing for nV3x, but are objective, unbiased statements of some sort of universally accepted principles. Considering that nVidia makes all of its statements in reference only to nV3x, and doesn't mention the competition unless it is to disparage the competition's products, I don't see how you could fail to reach the conclusion that nVidia itself is the biggest nVidia "fanboi" of them all. A large grain of salt, therefore, is required when self-interested companies obsessed with marketing write "whitepapers," IMO.
I'm not interested in the Det52s from the aspect of "well, I might want to buy a 5700 now." I am interested in them from the aspect of "What went wrong?"
It appears NVidia's problems are the result of trying to design a shader pipeline that is too flexible (resource sharing, extremely long program lengths, predicates, ddx/ddy, unlimited dependent textures, a complex instruction set, a pure stencil-fill mode, two TMUs, etc.). They spent transistors on complexity, which in itself led to poorer performance, but in doing so the added complexity also made it much harder for the drivers to translate DX9 instructions efficiently. Both these factors led to really crummy performance. Now it appears the latter issue has been resolved, but we're still left with hardware that is not up to snuff.
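To put one of those factors in concrete terms: the widely reported behaviour is that NV3x fragment throughput falls off as more temporary registers are live at once (fewer pixels can be kept in flight), so the driver compiler's job is largely to minimize peak register pressure when translating DX9 shaders. A toy sketch of that measurement, purely my own illustration with invented ops:

    def peak_live_temps(ops):
        """ops: list of (reads, writes) tuples over register names, in program
        order.  Returns the peak number of temporaries simultaneously live,
        counting an instruction's inputs and its output together."""
        last_use = {}
        for i, (reads, _writes) in enumerate(ops):
            for r in reads:
                last_use[r] = i
        live, peak = set(), 0
        for i, (reads, writes) in enumerate(ops):
            live.update(writes)                                   # output becomes live
            peak = max(peak, len(live))
            live -= {r for r in reads if last_use.get(r) == i}    # last read: value dies
            live -= {w for w in writes if w not in last_use}      # never read again
        return peak

    # Three loads, then two combines: all three inputs are live at once.
    naive = [
        ([],           ["r0"]),   # r0 = input a
        ([],           ["r1"]),   # r1 = input b
        ([],           ["r2"]),   # r2 = input c
        (["r0", "r1"], ["r3"]),   # r3 = a op b
        (["r3", "r2"], ["r0"]),   # result = (a op b) op c
    ]
    # Same computation interleaved so registers are reused.
    reordered = [
        ([],           ["r0"]),   # r0 = input a
        ([],           ["r1"]),   # r1 = input b
        (["r0", "r1"], ["r0"]),   # r0 = a op b
        ([],           ["r1"]),   # r1 = input c
        (["r0", "r1"], ["r0"]),   # result = (a op b) op c
    ]
    print(peak_live_temps(naive), peak_live_temps(reordered))    # prints: 4 2

If the Det52 compiler is doing this kind of rescheduling and register reuse automatically, that would go a long way toward explaining the shader gains without any change in the hardware itself.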
Certainly the hardware is not "up to snuff." That's the entire problem. I've had an answer to the question of "What went wrong?" which satisfied me for most of this year...

And, of course, it has little to do with compilers.
It looks to me like nVidia designed a dog of a chip architecture for nV30 on many levels. The cancellation of nV30, the replacement of nV35 with an IBM-fabbed nV38, increases in core and ram bus clocks, more driver optimizations than can be tabulated (not even touching on 3dMk03 and that particular can of worms), things like FX12 and fp16 alongside hardware fp32 support too slow to do anything but provide nVidia with technical DX9 compliance: it all adds up to a clear picture of a company struggling to improve a non-competitive architecture through every indirect means at its disposal. FAB changes, silicon respins, pcb revisions, driver and compiler optimizations, you name it; nVidia has done everything it can do except the one thing that is most needed and would make the most difference regarding its current problems: a brand new architecture. nV3x is simply a dog, and to say it is "overly complex" as marketing spin for "poorly designed" seems to advance little of worth to the topic, IMO.
Not really. ATI's architecture seems much more straightforward and tailor-made for DX9 input. You don't have register limitations to deal with. You don't have multi-precision. You have clear rules for how to use the separate vector and scalar units. They still have to do translation and scheduling, but the issues aren't as complex. If you listen to Richard Huddy explain how to hand-craft shaders, you'll see that it's much simpler. Hand-crafting for NV3x is more difficult.
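To illustrate the co-issue point Huddy makes (my own sketch, with invented instruction syntax, and treating whole registers rather than per-component write masks, so it's simpler than the real rules): R3x0 can execute a 3-component vector op and an independent scalar op in the same slot, so hand-tuning or compiling for it largely comes down to pairing such ops, which is a far more tractable problem than juggling NV3x's precisions and register limits.

    from collections import namedtuple

    Op = namedtuple("Op", "text kind reads writes")   # kind: "vec3" or "scalar"

    shader = [
        Op("mul r0.xyz, v0, c0",    "vec3",   ["v0", "c0"], ["r0"]),
        Op("rcp r1.w, v1.w",        "scalar", ["v1"],       ["r1"]),
        Op("add r2.xyz, r0, c1",    "vec3",   ["r0", "c1"], ["r2"]),
        Op("mul r3.w, r1.w, c2.w",  "scalar", ["r1", "c2"], ["r3"]),
    ]

    def independent(a, b):
        """True if b neither reads what a writes nor writes what a reads or writes."""
        return not (set(a.writes) & set(b.reads) or
                    set(a.writes) & set(b.writes) or
                    set(a.reads)  & set(b.writes))

    def pack(ops):
        """Greedy pass: pair an adjacent vec3 op and scalar op into one issue slot."""
        slots, i = [], 0
        while i < len(ops):
            if (i + 1 < len(ops)
                    and {ops[i].kind, ops[i + 1].kind} == {"vec3", "scalar"}
                    and independent(ops[i], ops[i + 1])):
                slots.append((ops[i].text, ops[i + 1].text))   # co-issued pair
                i += 2
            else:
                slots.append((ops[i].text,))                   # issued alone
                i += 1
        return slots

    for slot in pack(shader):
        print("  +  ".join(slot))

Four ops fit in two slots here. That's the sort of clear, mechanical rule a compiler, or a developer hand-crafting a shader, can apply, versus the NV3x case where precision choice and register count interact with performance in far less obvious ways.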
The way I read the above, it's simply an "overly complex" way of stating the obvious: that ATi designed a better architecture in every respect germane to a solid 3d design. What, I'm supposed to give nVidia kudos for FX12, fp16, and support for a 3d-gaming-useless fp32, simply because they couldn't, or wouldn't, design a fast fp24 pipeline to cover all the bases? Don't think so... Am I supposed to give them kudos for complexity which is wholly unneeded and contributes nothing to 3d-API support, when the architecture is targeted at a market comprised 98% of people who buy these products to run 3d games, and who have support for present and *upcoming* APIs in mind? I'll pass, thanks.
The upshot for me concerning the whole issue here is this: could you imagine ATi, or any other IHV at the moment, possibly, under any circumstances whatsoever, saying:
"Next generation for us the challenge will be to emulate as closely as we can the principles we see so well-implemented in nVidia's nV3x architecture because we see in it the future of 3d-chip architecture design. We are so impressed with nVidia's market success of nV3x, it's incredible performance, its powerful support of newer 3d API features, and the incredible image quality, and more, that we feel compelled to adopt what is obviously a new 3d-chip-design paradigm, and the wave of the future. We only hope we can be half as successful with our version of the revolutionary nV3x architecture paradigm."
Of course not, right?...

Only an idiot IHV would want to emulate the mess nVidia has created in nV3x, or to experience the joy nVidia has experienced over the last year because of it, and accordingly absolutely nobody is going to try to emulate them. The truth, as we all know, is that practically every claim made in the absurdity I fabricated above is false as it pertains to nV3x. And that is precisely why the "principles" you apparently see in nV3x, and believe are worthwhile and represent "design directions for the future," are in fact nothing of the sort, IMO. nV3x is simply a non-competitive 3d architecture, which explains everything and is the actual truth of the matter, in my opinion.
That's what bothers me about your assessments that nV3x represents some kind of "future" for 3d chip design, in a world where IHVs need a solid year to optimize compilers because the chips are *far more complex than they need to be to do the job.* The opposite seems much more likely to me: that IHVs would view nV3x as a prime example of how not to design a 3d architecture for the 3d-gaming market segment in the future. Sure, it's true that performance and IQ are coming to depend less on external factors like bandwidth and more on the vpu itself, through technologies such as pixel shading. But the R3x0 does all of that much better than nV3x; it's nV3x which is currently far more dependent on core and ram clocking than R3x0-based products, yet R3x0 still manages to outperform it, sometimes quite substantially, especially in core-dependent technologies like the ps2.0 support R3x0 provides for the DX9 API. So how on earth could anyone reach the conclusion that the design paradigm of nV3x is the "future" and the paradigm of R3x0 is not? The facts would seem to indicate the very opposite, seems to me.