WaltC said:
It certainly appears to me, though, that the ps2.0 shader performance improvement in the 5700 has much less to do with compiler tuning and much more to do with ripping out the integer units in the chip, if that is indeed the case.
It doesn't appear that way to me, since there are also large improvements on the 5600/5900 going from the Det45s to the Det52s. What's more, these performance improvements are across the board: not just in benchmarks, but in a whole boatload of games (Halo, FFXI, etc.).
I'm waiting for a comprehensive set of comparisons (maybe Dave can run the tests with Det45 vs Det52), but it looks to me like it is legitimate. And it makes sense if you read their Unified Compiler Whitepaper where they show before and after instruction schedules.
I don't think anyone would suggest that improving compilers isn't a worthwhile endeavor, and had we discussed this two weeks after the launch of nV30 I'd have agreed there's a lot of room for improvement.
Well, I have been discussing it since the NV3x launch, like a broken record, I might add, yet after many, many shader-benchmark tests people were coming to the conclusion that there was nothing more to be done because of the NV3x architecture. (And of course, with the usual disclaimer that a technical discussion of NV3x architecture and optimization opportunities does not constitute a "defense" of NVidia or an anti-ATI position for the fanboys.)
The problem with nVidia's approach to compiler optimization is that it has been entirely lopsided, held out as a panacea that will produce the desired results "given time," and more or less used as a marketing ploy to try and explain away very large performance deficits of nV3x relative to its competition, deficits pertinent to advanced API functionality and much less dependent on traditional factors like bandwidth, TMU's, etc.
I've never seen anyone hold out compiler optimization as an explanation for missing API functionality. But since optimization did, at last, produce good results, I'd say the folks championing it (NVidia) were correct.
Don't try to turn this into an ATI vs NVidia thread; it's not. Even if NV3x shaders ran faster than ATI's, it wouldn't make up for the AA, gamma, MRT, and other missing or inadequate features.
I'm not interested in the Det52s from the standpoint of "well, I might want to buy a 5700 now". I'm interested in them from the standpoint of "what went wrong?"
It appears NVidia's problems are the result of trying to design a shader pipeline that is too flexible (resource sharing, extremely long program lengths, predicates, ddx/ddy, unlimited dependent textures, a complex instruction set, a pure stencil-fill mode, two TMUs, etc.). They spent transistors on complexity, which in itself led to poorer performance, but in doing so, the added complexity made it much harder for the drivers to translate DX9 instructions efficiently. Both of these factors led to really crummy performance; now it appears the latter issue has been resolved, but we're still left with hardware that is not up to snuff.
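To make the "harder to translate" point concrete, here's a rough HLSL sketch of what register pressure means. It's my own made-up example (the sampler and light names are invented), not something from NVidia's whitepaper; NV3x is widely reported to slow down as the number of full-precision temporaries that stay live at once goes up, so the same math can behave very differently depending on how the translation orders it:

[code]
// Purely illustrative; baseMap/detailMap/lightMap/lightDir are made up.
sampler baseMap;
sampler detailMap;
sampler lightMap;
float3  lightDir;

// Naive shape: every intermediate is computed up front, so several
// full-precision temporaries are still alive at the final multiply.
float4 psNaive(float2 uv : TEXCOORD0, float3 n : TEXCOORD1) : COLOR
{
    float4 base   = tex2D(baseMap,   uv);
    float4 detail = tex2D(detailMap, uv * 8.0);
    float4 light  = tex2D(lightMap,  uv);
    float  ndotl  = saturate(dot(normalize(n), lightDir));
    return base * detail * light * ndotl;   // four values live here
}

// Same math, but each result is folded into a running product as soon
// as it exists, so only one or two temporaries are ever live at once.
float4 psFrugal(float2 uv : TEXCOORD0, float3 n : TEXCOORD1) : COLOR
{
    float4 c = tex2D(baseMap, uv);
    c *= tex2D(detailMap, uv * 8.0);
    c *= tex2D(lightMap, uv);
    c *= saturate(dot(normalize(n), lightDir));
    return c;
}
[/code]

Doing that kind of re-ordering automatically, instead of hoping developers write the second shape by hand, is exactly the job the Det52 compiler is supposed to have taken over.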
To illustrate the point consider how little ATi has talked about compiler optimization in the last year, and yet it's certain they're in no less need of optimized compilers than is nVidia or anybody else.
Not really. ATI's architecture seems much more straightforward and tailor-made for DX9 input. You don't have register limitations to deal with. You don't have multiple precisions to juggle. You have clear rules for how to use the separate vector and scalar units. They still have to do translation and scheduling, but the issues aren't as complex. If you listen to Richard Huddy explain how to hand-craft shaders, you'll see that it's much simpler; hand-crafting for NV3x is more difficult.
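For anyone who hasn't heard Huddy's talks, the hand-crafting "rule" for R3x0 really is that simple to state: as I understand it, each pixel pipe can issue a 3-component vector op and an independent scalar op in the same cycle, so you keep scalar work out of the vector path. A toy example (HLSL, names invented by me):

[code]
// Illustrative only; baseMap, lightDir, fogDensity are made-up names.
sampler baseMap;
float3  lightDir;
float   fogDensity;

float4 main(float2 uv   : TEXCOORD0,
            float3 n    : TEXCOORD1,
            float  dist : TEXCOORD2) : COLOR
{
    float4 c;
    // Vector work stays in .rgb for the 3-wide unit...
    c.rgb = tex2D(baseMap, uv).rgb * saturate(dot(normalize(n), lightDir));
    // ...while independent scalar work (a fog factor here) sits in .a,
    // where the scalar unit can pick it up alongside the vector op.
    c.a = saturate(exp2(-fogDensity * dist));
    return c;
}
[/code]

There's no equivalent one-liner for NV3x; there you're juggling register counts, precisions, and which unit a given instruction actually lands on.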
This all seems very lopsided to me and it certainly appears as if you might be expecting compilers to work miracles. I think "decent improvement" is a reasonable expectation, but I also think that each successive attempt at squeezing performance out of optimization will be a matter of greatly diminishing returns.
That depends on what the performance bottlenecks are and on what the driver is and is not doing currently. According to the Unified Compiler paper, they weren't doing much "pairing up" of instructions at all in previous Dets, which left functional units sitting idle. Moreover, they weren't re-allocating registers to balance them against other bottlenecks, which also leads to very bad results.
Until David Kirk explains the NV3x architecture to us in full, we have no clue what other hidden bottlenecks remain. Expect miracles? No. Hefty improvements, possibly 10-20%? It's possible.
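Just to spell out why the "pairing up" matters: a strict dependency chain gives a scheduler nothing to issue together, while independent operations can be interleaved so a second unit isn't left idle. A contrived HLSL illustration (mine, not from the whitepaper; whether NV3x can actually dual-issue any particular pair is exactly the detail we don't know):

[code]
// Contrived example; v0..v3 are made-up constants.
float3 v0, v1, v2, v3;

float4 chained() : COLOR
{
    // Each line needs the previous result, so nothing here can be
    // issued alongside anything else -- spare units just sit idle.
    float3 a = v0 * v1;
    float3 b = a * v2;
    float3 c = b * v3;
    return float4(c, 1.0);
}

float4 pairable() : COLOR
{
    // These two are independent, so a scheduler is free to issue them
    // together (say, vector unit plus scalar unit) before they meet
    // at the final multiply.
    float3 x = v0 * v1;
    float  s = dot(v2, v3);
    return float4(x * s, 1.0);
}
[/code]

The claim in the whitepaper, as I read it, is that the old Dets were leaving the second kind of opportunity on the floor.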
I'm looking forward to the end of the nV3x story, myself...

My sincerest prayer is that it is not merely continued with nV4x.
Well, if they can make their design work in conjunction with compilers on the NV4x, more power to them. Their overall approach to the NV3x shader pipeline is not necessarily wrong, since PS3.0 demands more flexible pipelines anyway. They need to fix the issues they currently have (allow more simultaneous live registers, add more full FP32 units, etc.).
The difference in this case is that nVidia's using the concept of compilers (old as the hills and twice as dusty) as a PR tool to try and frame the issue of its performance deficit in such a way as to have it appear less critical than it actually is. I'll bet you that if it was ATi behind, instead of nVidia, that you'd have heard scarcely a peep about compilers out of nVidia all year long.
People on this BBS were predicting compiler issues with the NV3x way before NVidia started talking about it. I don't even recall NVidia mentioning, a year ago, that compilers would fix their PS2.0 problems. Almost all NVidia statements were along the lines of "use ps1.4 instead, use lower precisions, use Cg, etc." They weren't saying "just you wait, we are going to deliver an optimizing compiler that will give a huge boost to PS2.0."
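"Use lower precisions" boiled down to this sort of thing, for anyone who missed it at the time: in HLSL you write half instead of float (the assembly carries it as the _pp modifier), and fp16 temporaries take half the register space of fp32 ones on NV3x. Made-up shader, purely to illustrate:

[code]
// Illustrative only; baseMap and lightDir are invented names.
sampler baseMap;
float3  lightDir;

float4 main(float2 uv : TEXCOORD0, float3 n : TEXCOORD1) : COLOR
{
    // half (fp16) is generally fine for colour math and halves the
    // register footprint on NV3x; anything needing real range or
    // precision (texture coordinates, positions) would stay float.
    half3 albedo = tex2D(baseMap, uv).rgb;
    half  ndotl  = saturate(dot(normalize((half3)n), (half3)lightDir));
    return half4(albedo * ndotl, 1.0);
}
[/code]

Note that this is work pushed onto developers, as opposed to the driver's compiler doing it for you.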
You are using the label "PR" as if it means "untrue" or "phony". There have been plenty of bad, bad NVidia PR statements, but this time, with respect to compilers, they are absolutely right.
The crux of the matter is that it is not for lack of a good compiler that nV3x suffers in comparison to R3x0, but because of many other things which are more important and fundamental than the compiler and which nVidia can do nothing about at the present time. The compiler is what they can change presently and so that is what they talk about.
Well, the crux of the matter is that they were suffering for two reasons, and one half of the suffering has now been eliminated. Software was a huge issue for the NV3x because of its architecture, and it is likely to be a large issue for all PS3.0 cards.
As far as I know, that has always been the case with 3d chip development. Nothing new to see here in that general regard.
It's far worse now because of the flexibility of modern cards. Not only do they have to make the fixed functions run fast, but they now have shaders which can use card resources in any order, rather than the fixed order that the old state-based pipeline implied.
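As a concrete (and simplified, made-up) example of "any order": the old multitexture cascade was essentially "sample the enabled stages in fixed order, then combine", whereas a DX9 shader can do arithmetic first and then fetch from an address it just computed:

[code]
// Illustrative only; normalMap and envMap are invented names.
sampler     normalMap;
samplerCUBE envMap;

float4 main(float2 uv : TEXCOORD0, float3 viewDir : TEXCOORD1) : COLOR
{
    // Arithmetic first: decode a normal and reflect the view vector...
    float3 nrm = normalize(tex2D(normalMap, uv).rgb * 2.0 - 1.0);
    float3 r   = reflect(-normalize(viewDir), nrm);
    // ...then a texture fetch whose address depends on that math.
    // The fixed-function cascade had no general way to express this.
    return texCUBE(envMap, r);
}
[/code]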
(sheesh, my posts are getting like demalion now. Help, it's infectious!)