3dilettante said:
...
The way FutureMark has positioned 3dMark 2k3 is as a prediction of upcoming games. In that sense, it is rapidly approaching the point where the games it is predicting will come to pass, after which the game tests will be as redundant as the weatherman going outside, holding out his hand and telling you if it is raining.
Well, it doesn't serve only for prediction; it also serves as a reference point (for each of the tests in it, and what they measure) that works to isolate GPU hardware and DirectX API feature-set performance. Prediction was, and is, merely a temporary function of the lack of games utilizing the features; the measurement reference and the isolation are permanent, and still useful.
Working as a reference point of GPU-isolating metrics remains useful for gaining more information, including drawing additional meaning from other benchmark results, both game and synthetic.
In addition, it isn't guaranteed that those predictions will turn out to be true, or that it is correct in the relative significance of the factors it weighs more heavily.
As far as the factors and the API in question go, and actually doing the work requested: yes, it does, when we're talking about the GPU and the API-conformant driver behavior involved. This has already been verified and corroborated repeatedly.
...
Again, FutureMark has positioned and developed the benchmark as a prediction, which rapidly loses its freshness when its projected timeframe comes around.
When it was launched it did try to serve that role, because there was, effectively, nothing else. That doesn't mean prediction is actually the only function it serves; it just means that this is their marketing angle.
When these games actually do come about, they are far better at getting information on how they perform than a benchmark that tries to guess the outcome of the entire spectrum of graphics intensive games.
Of course games are better at representing themselves, but they are not better at representing games in general. Unless you isolate the information and the GPU stresses as specifically as a synthetic does...which just creates another data point, one that corresponds to the way another game runs only insofar as the GPU stresses match...unless...etc. A synthetic is just a head start of one data point, directed with forethought to isolate and provide such information. 3dMark still succeeds at this, and, like other synthetics that succeed, it continues to do so even alongside other data points.
The utility of a synthetic is tied to how well it can test certain variables and separate them in order to allow for analysis.
Yes. Prediction is a side effect of the variables not being widely used elsewhere.
This is something that the game tests are simply too coarse-grained to resolve.
Ah, perhaps, but you haven't yet addressed why they are too coarse-grained, only why their time of being used for prediction is passing, which is something else entirely.
AFAICS, actual shader performance characteristics for the hardware, and API conformance in the drivers, have been accurately represented, and the information was not "too coarse" for such measurements to be useful, and to continue to be a point of reference even if a unique role of prediction is falling by the wayside.
The complicating issue here is dealing with the lack of API conformance, but that is an issue that affects the other sources of information about hardware performance for API features as well. The way to address it is through more data points, and data points where the non-conformance is being extensively counteracted are rare and, it seems to me, have much less business being discarded when trying to form an accurate representation.
It could also be argued that the other tests in 3dmark that could provide greater probative value are too few to be exhaustive enough, and since there are still optimizations there, they are now suspect.
Indeed, I still agree that the PS 2.0 test is a significant problem, from the perspective of how useful they could have made that patch. Even information on the cheat could help here, such as whether it is a complete deception, or a deception of applicability but not performance of the workload (i.e., valid optimization introduced invalidly), etc.
...
I wasn't clear in delineating the two different parts of my thinking. The first is that, as a predictor using its game tests, 3dMark's target is becoming too close to the present to prevent its results from being redundant.
Well, redundant to what? One game doesn't predict other games, let alone try to. A game is a better indicator of its own performance by virtue of identity, but that doesn't replace synthetics for evaluating anything beyond that game's fps, which still leaves other tasks, like determining what role the hardware plays.
Meanwhile, the few more specific synthetics 3dMark has have not been patched as thoroughly as the game tests. The fact that Futuremark released the patch anyway, because those other tests don't contribute to the score, indicates that they also do not emphasize those other tests, nor whatever value those tests have in gathering anything specific beyond the broad, rough prediction allowed by the game tests.
Well, as long as you don't take that to a contradicted absolute like "they don't value it"...having coded those tests in the first place doesn't square with that statement, and they didn't themselves put in the cheats that compromise its usefulness.
What it does indicate is that they didn't emphasize 1) removing cheats in it, 2) yet again, 3) for one particular IHV, 4) as much as removing cheats in the score-contributing game tests. From that perspective, it is a policy mistake in responding to an IHV's cheats, one that leaves the usefulness of a part of the suite impaired, not something that changes the inherent usefulness of the suite and what it tries to measure.
Games still start out behind here, for evaluating hardware, because they don't generally even have a policy of this nature to preserve.
...
I agree it would be unreasonable to place blame on FutureMark for the potential weakness of reviewer methodology. The problem is, using 3dMark as a valid data point is becoming an increasingly questionable use of time, since it is equally unreasonable to expect anything useful to come from having to back-revision drivers just to run a redundant benchmark with very little "meat" to it.
The extra work is necessary for evaluating hardware, but it is being created by an IHV. Futuremark actually lessens it quite significantly, if you consider what back-revisioning the drivers is actually doing for you.
It isn't just reviewer methodology; it is a matter of what you label a "questionable use of time". What has become a questionable use of time is accurately evaluating nVidia hardware at all, because of what nVidia has done...if you actually decide to do so, doing less work than necessary is just failing at the task. The distinction for reviewer methodology is that an honest and competent reviewer should be informed enough to know better and to have accurate representation as their goal.
If it gave anything unique in that respect, perhaps it would be worthwhile to hold onto the old drivers a little longer, but it doesn't really strike me as justifying that enough for any real review to do so.
Are these "real" hardware reviews? AFAICS, you haven't actually illustrated why it is any less useful for evaluating
hardware.
Given the ease with which Nvidia circumvented the safeguards, there is little FutureMark can do for the current program; let's hope the next one is better.
Hmm...well, they could have done better, but they've done enough to retain some usefulness. They really should have addressed the PS 2.0 test issue, but there were a lot of things whose usefulness they restored alongside that particular failure, and more if they incorporate the lessons learned, as you mention.