demalion said:
Hmm...this is a demonstrated problem with anything recognized as a benchmark. It is a fault, but it seems a confusion to mention it only as a fault of the application (which is the exact phrasing you used), since the fault is universal among applications used in that fashion. The only specific entity that it seems reasonable to associate the fault with is the party performing the act of cheating.
Since it's you, demalion, I'll give you a long response.
Well, while the mechanism could be used on any benchmark, it is arguably more of a problem for 3DMark for two reasons.
First, and less important: 3DMark is synthetic - optimizations aimed at it bring no incidental performance benefits to any actual users.
Second, and most important: 3DMark has such a high profile and such longevity as a benchmarking standard that an effort to squeeze the highest possible scores out of it is sure to pay off in the market, both in terms of product/brand recognition and in revenue.
Neither of the above can be said to be a fault of Futuremark - it is hardly wrong of them to produce a benchmark with outstanding recognition and product review penetration. On the other hand, this fact also makes 3DMark particularly profitable to, let us say, "spend effort optimizing for".
Any such application specific optimization effort reduces the predictive value of the results, and application specific code in the driver only makes the problem worse.
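To make concrete what "application specific code" means in practice, here is a minimal hypothetical sketch in C (every name in it is invented for illustration; this is not anyone's actual driver source). The mechanism is usually nothing more exotic than detecting a well-known benchmark by its executable name and switching to a hand-tuned path that is only valid for the exact scenes that benchmark renders:

/* Hypothetical sketch: detect a benchmark by executable name and switch
 * to a special-cased render path. All names are invented. */
#include <stdio.h>
#include <string.h>

typedef enum { PATH_GENERAL, PATH_BENCHMARK_TUNED } render_path;

static render_path select_render_path(const char *exe_name)
{
    /* Crude detection: a substring match on the process name. */
    if (strstr(exe_name, "3DMark") != NULL)
        return PATH_BENCHMARK_TUNED;   /* valid only for the benchmark's known scenes */
    return PATH_GENERAL;               /* the path every real application gets */
}

int main(void)
{
    printf("%d\n", select_render_path("3DMark03.exe"));  /* prints 1: tuned path */
    printf("%d\n", select_render_path("Quake3.exe"));    /* prints 0: general path */
    return 0;
}

A real application will never hit the tuned path, which is exactly why any gains it produces say nothing about the performance a user will actually see.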
So it isn't strange that Futuremark has dealt with these things a bit furtively. If they point the finger at a sinner, they are also calling attention to a huge problem with the predictive value of their benchmark. They're in a bit of a bind, quite simply.
(You may remember the hubbub that resulted when it was discovered that disabling the splash screen between tests caused nVidia scores to drop? Futuremark already knew about this and even had a note about it on their site, where they just said that it had nothing to do with their code and that it was due to nVidia's drivers. But they called no attention to it, didn't say what was going on; they just made sure they washed their hands of it. Nor have I ever seen a word about the horrid image quality "optimizations" of the Xabres.)
I'm arguably more interested in benchmarking than 3D - I've been taking an active interest since the formation of SPEC. Before that, actually. And if there is one thing you can be sure of, it is that IHVs will do everything they can to make their product look good by whatever yardstick their market uses. The approach taken by nVidia vis-à-vis 3DMark is exactly the same as the one Sun took vs SPEC, and later TPC: if you can't compete, discredit the benchmark and direct attention towards something else. And sometimes this is actually valid - benchmarks do lose usefulness, and the industry should move on to something that better reflects where the focus lies.
One of the valid criticisms levied against SPEC is that it is as much a compiler benchmark as a hardware benchmark. The flip side of that coin is that probably nothing has been as important for pushing the quality of compiler-generated code forward as the existence of that well-defined and widely acknowledged base of target codes. I've always seen the same as potentially true for 3DMark - that the existence of a widely used benchmark application could help push driver performance generally. Of course, this would still make it less useful as a predictor, but the incidental benefit would far outweigh the fact that the masses are conned.
Note however that application specific code completely invalidates this potentially beneficial aspect of 3DMark.
Entropy
PS: Carmack's engines have been targets in the same way on the OpenGL side, though I haven't heard of anyone being able to bypass his routines and replace them with equivalent but faster ones. Avoiding inefficiencies, or worse, redundancies, is important in benchmarks. For instance, once compilers got smarter they simply optimised away entire inner loops of some older widely used benchmarks, because they figured out that the results weren't used globally.
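To illustrate that last point, here is a minimal C sketch of my own (not taken from any actual benchmark) of how this happens. In bench_dead() the loop's result is never observed, so an optimizing compiler is free to delete the whole loop and the timed region measures nothing; bench_live() keeps the result observable, which is the usual fix:

/* Sketch of the dead-code problem in old-style benchmark loops. */
#include <stdio.h>

#define N 100000000L

static void bench_dead(void)
{
    unsigned long long sum = 0;
    for (long i = 0; i < N; i++)
        sum += (unsigned long long)i * i;   /* sum is never used: the loop is dead code */
}

static unsigned long long bench_live(void)
{
    unsigned long long sum = 0;
    for (long i = 0; i < N; i++)
        sum += (unsigned long long)i * i;
    return sum;                             /* returned and printed, so the work must be done */
}

int main(void)
{
    bench_dead();                                /* may be compiled away entirely at -O2 */
    printf("checksum: %llu\n", bench_live());    /* printing the result keeps the loop alive */
    return 0;
}

Timing bench_dead() on a modern compiler will typically report close to zero, which is exactly the effect those older benchmarks fell victim to once optimizers learned to prove the results were unused.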