3dilettante said:
...
The way FutureMark has positioned 3dMark 2k3 is as a prediction of upcoming games. In that sense, it is rapidly approaching the point where the games it is predicting will come to pass, after which the game tests will be as redundant as the weatherman going outside, holding out his hand and telling you if it is raining.
Well, it doesn't serve only for prediction; it also serves as a reference point (for each of the tests in it, and what they measure) that works to isolate GPU hardware and DirectX API feature-set performance. Prediction was, and is, merely a temporary function of the lack of games utilizing the features; the measurement reference and the isolation are permanent, and still useful.
Working as a reference point of GPU-isolating metrics remains useful for gaining more information, including drawing additional meaning from other benchmark results, both game and synthetic.
In addition, it isn't guaranteed that those predictions will turn out to be true, or that it is correct in the relative significance of the factors it weighs more heavily.
As far as the factors and the API in question go, and actually doing the work requested: yes, it does, when we're talking about the GPU and the API-conformant driver behavior involved. This has already been verified and corroborated repeatedly.
...
Again, FutureMark has positioned and developed the benchmark as a prediction, which rapidly loses its freshness when its projected timeframe comes around.
When it was launched it did try to serve that role, because there was, effectively, nothing else. That doesn't mean prediction is actually the only function it serves; it just means that this is their marketing angle.
When these games actually do come about, they are far better at getting information on how they perform than a benchmark that tries to guess the outcome of the entire spectrum of graphics intensive games.
Of course games are better at representing themselves, but they are not better at representing games in general. Unless you isolate the information and the GPU stresses as specifically as a synthetic does...which just creates another data point, one that corresponds to the way another game runs only insofar as the GPU stresses match...unless...etc. A synthetic is just a head start of one data point, directed with forethought to isolate and provide such information. 3dMark still succeeds at this, and, like other synthetics that succeed, it continues to do so even alongside other data points.
The utility of a synthetic is tied to how well it can test certain variables and separate them in order to allow for analysis.
Yes. Prediction is a side effect of the variables not being widely used elsewhere.
This is something that the game tests are simply too coarse-grained to resolve.
Ah, perhaps, but you haven't yet addressed why they are too coarse-grained, only why their time of being used for prediction is passing, which is something else entirely.
AFAICS, actual shader performance characteristics for the hardware, and API conformance in the drivers, have been accurately represented, and the information was not "too coarse" for such measurements to be useful, and to continue to be a point of reference even if a unique role of prediction is falling by the wayside.
The complicating issue here is dealing with the lack of API conformance, but that is an issue that affects the other sources of information about hardware performance for API features as well. The way to address it is through more data points, and data points where the non-conformance is being extensively counteracted are rare and, it seems to me, have much less business being discarded when trying to form an accurate representation.
It could also be argued that the other tests in 3dmark that could provide greater probative value are too few to be exhaustive enough, and since there are still optimizations there, they are now suspect.
Indeed, I still agree that the PS 2.0 test is a significant problem, from the perspective of how useful they could have made that patch. Even information on the cheat could help here, such as whether it is a complete deception, or a deception of applicability but not performance of the workload (i.e., valid optimization introduced invalidly), etc.
...
I wasn't clear in delineating the two different parts of my thinking. The first is that, as a predictor using its game tests, 3dMark's target is becoming too close to the present to prevent its results from being redundant.
Well, redundant to what? One game doesn't predict other games, let alone try to. A game is a better indicator of its own performance by virtue of identity, but that doesn't replace synthetics for evaluating anything beyond that game's fps, which still leaves other tasks, like determining what role the hardware plays.
Meanwhile, the few more specific synthetics 3dMark has have not been patched as thoroughly as the game tests. The fact that Futuremark released the patch anyway, because those other tests don't contribute to the score, indicates that they also do not emphasize those other tests, nor whatever value those tests have in gathering anything specific beyond the broad, rough prediction allowed by the game tests.
Well, as long as you don't take that to a contradicted absolute like "they don't value it"...having coded those tests in the first place doesn't square with that statement, and they didn't themselves put in the cheats that compromise its usefulness.
What it does indicate is that they didn't emphasize 1) removing cheats in it, 2) yet again, 3) for one particular IHV, 4) as much as removing cheats in the score-contributing game tests. From that perspective, it is a policy mistake in responding to an IHV's cheats, one that leaves the usefulness of a part of the suite impaired, not something that changes the inherent usefulness of the suite and what it tries to measure.
Games still start out behind here, for evaluating hardware, because they don't generally even have a policy of this nature to preserve.
...
I agree it would be unreasonable to place blame on FutureMark for the potential weakness of reviewer methodology. The problem is, using 3dMark as a valid data point is becoming an increasingly questionable use of time, since it is equally unreasonable to expect anything useful to come from having to back-revision drivers just to run a redundant benchmark with very little "meat" to it.
The extra work is necessary for evaluating hardware, but it is being created by an IHV. Futuremark actually lessens it quite significantly, if you consider what back-revisioning the drivers is actually doing for you.
It isn't just reviewer methodology; it is a matter of what you label a "questionable use of time". What has become a questionable use of time is accurately evaluating nVidia hardware at all, because of what nVidia has done...if you actually decide to do so, doing less work than necessary is just failing at the task. The distinction for reviewer methodology is that an honest and competent reviewer should be informed enough to know better and to have accurate representation as their goal.
If it gave anything unique in that respect, perhaps it would be worthwhile to hold onto the old drivers a little longer, but it doesn't really strike me as justifying that enough for any real review to do so.
Are these "real" hardware reviews? AFAICS, you haven't actually illustrated why it is any less useful for evaluating
hardware.
Given the ease with which Nvidia circumvented the safeguards, there is little FutureMark can do for the current program; let's hope the next one is better.
Hmm...well, they could have done better, but they've done enough to retain some usefulness. They really should have addressed the PS 2.0 test issue, but there were a lot of things whose usefulness they restored alongside that particular failure, and more if they incorporate the lessons learned, as you mention.