Cheating and its implications


Humus

We now have seen cheats from NVidia, ATi and Trident. The question it raises is, what can we do about it? Your thoughts?

Right now I feel I'm willing to spend some time writing a cheat-proof benchmarking app, a GL_EXT_reme 2.0 perhaps, the original is getting old. Suggestions to what tests you want to include are welcome.
 
It seems like it would be pretty easy to defeat the kind of cheats NVidia is doing in 3dmark:

Just have the camera take a randomly perturbed path through the benchmark.

Run the benchmark 10 or so times so the random variations get averaged out.

Takes a little longer, but cheating will be obvious.

(Make sure your random number generator has a reliable source of entropy; NVidia's driver folks are obviously trying really hard here)
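Something like this rough C++ sketch is what I have in mind; renderFrame, the path data and the jitter range are made-up placeholders for whatever the benchmark actually draws, so treat it as an outline rather than working benchmark code:

```cpp
#include <random>
#include <vector>

struct Vec3 { float x, y, z; };

// Stand-in for the real renderer: in the actual app this would draw the
// scene from the given camera position with OpenGL and time it. Here it
// just pretends every frame takes 10 ms so the sketch compiles on its own.
double renderFrame(const Vec3& /*camPos*/) { return 0.010; }

// Render the whole camera path once and return the frame rate for this run.
double runOnce(const std::vector<Vec3>& path, std::mt19937& rng)
{
    std::uniform_real_distribution<float> jitter(-0.5f, 0.5f);
    double totalTime = 0.0;
    for (const Vec3& key : path)
    {
        // Perturb each keyframe a little so the exact view is never the
        // same twice; a driver can't pre-bake anything for it.
        Vec3 p = { key.x + jitter(rng), key.y + jitter(rng), key.z + jitter(rng) };
        totalTime += renderFrame(p);
    }
    return path.size() / totalTime;   // frames per second
}

// Run the path several times with a reviewer-chosen seed and average,
// so the random variation washes out but the result stays repeatable.
double runBenchmark(const std::vector<Vec3>& path, unsigned seed, int runs = 10)
{
    std::mt19937 rng(seed);
    double sum = 0.0;
    for (int i = 0; i < runs; ++i)
        sum += runOnce(path, rng);
    return sum / runs;
}
```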
 
If you plan on anything more than simple polycount and fillrate tests, it may be even more interesting to randomly vary more significant portions of the benchmark. Some possibilities:

1. Randomly-generated shaders. Have portions of shaders made that the program can mix and match on the fly.

2. Randomly-generated polycounts. This could be used to test various polycount/fillrate ratios in performance. Easiest way would be to use some sort of software HOS (with optimized triangle-based models generated at the beginning of each run).

Of course, all this should be rooted in reality. The best way to do such things would be to investigate actual games, and attempt to have the weighting on the random generators such that the results would result in similar scenarios to a number of different games.

This may be just too much for one person to do in his/her spare time, but, hey, I like to dream :)
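For (1), a rough sketch of what mixing and matching shader snippets could look like; the GLSL fragments here are invented for illustration and would need to be weighted against real game shaders, as noted above:

```cpp
#include <random>
#include <string>
#include <vector>

// A few self-contained GLSL snippets; each one reads and writes 'color'.
// Purely illustrative: a real test suite would mirror operations actually
// seen in game shaders.
static const std::vector<std::string> kOps = {
    "    color = color * texture2D(tex0, uv);\n",
    "    color = mix(color, vec4(uv, 0.0, 1.0), 0.5);\n",
    "    color = color + 0.25 * sin(color * 7.0);\n",
    "    color = normalize(color + vec4(0.1));\n",
};

// Build a fragment shader by picking 'ops' snippets at random from the
// pool above. With a fixed seed the same shader comes back every time,
// but a driver can't recognize it ahead of time.
std::string buildRandomShader(std::mt19937& rng, int ops)
{
    std::uniform_int_distribution<size_t> pick(0, kOps.size() - 1);
    std::string src =
        "uniform sampler2D tex0;\n"
        "varying vec2 uv;\n"
        "void main()\n"
        "{\n"
        "    vec4 color = vec4(1.0);\n";
    for (int i = 0; i < ops; ++i)
        src += kOps[pick(rng)];
    src += "    gl_FragColor = color;\n"
           "}\n";
    return src;
}
```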
 
Adding randomness into a benchmark defeats repeatability. What a joy it would be to see people arguing that their card lost because it drew the 'hard' straws.
 
Could you perhaps instead insert randomization that wouldn't need to be averaged out over the course of several runs? For instance, what if the benchmark counted the number of polys, pixels, etc. rendered, and gave a score based on "objects" rendered divided by the time taken to render them (as opposed to simply FPS)? That way repeatability wouldn't strictly be necessary, because you'd be measuring total work over time (which is what frames over time are supposed to measure anyway). I'm not sure how you'd weight the various factors that would have to be taken into account, however.
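One way to count the "work" part without trusting the application's own bookkeeping might be to let the GPU report it via occlusion queries. A rough sketch, assuming a loader such as GLEW and with drawScene/elapsedSeconds standing in for the app's own rendering and timer code:

```cpp
#include <GL/glew.h>   // occlusion queries need GL 1.5 or ARB_occlusion_query

// Placeholders for the app's own rendering and timing code.
extern void drawScene();
extern double elapsedSeconds();

// Wrap the scene in an occlusion query so the GPU itself reports how many
// samples actually passed the depth test, then divide by wall-clock time.
// The score becomes "shaded samples per second" rather than plain FPS.
double measureSamplesPerSecond()
{
    GLuint query = 0;
    glGenQueries(1, &query);

    double t0 = elapsedSeconds();
    glBeginQuery(GL_SAMPLES_PASSED, query);
    drawScene();
    glEndQuery(GL_SAMPLES_PASSED);

    GLuint samples = 0;
    glGetQueryObjectuiv(query, GL_QUERY_RESULT, &samples);   // waits for the GPU
    double t1 = elapsedSeconds();

    glDeleteQueries(1, &query);
    return samples / (t1 - t0);
}
```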
 
RussSchultz said:
Adding randomness into a benchmark defeats repeatability. What a joy it would be to see people arguing that their card lost because it drew the 'hard' straws.

If the variations are truly random, repeatability simply consists of running the test enough times. People have been handling random error in measurements for some time now :)

It may make the benchmark last a little longer and be more boring to watch.
 
Make it so that everything in the scene is always in view and orbit the camera around it. This eliminates the possibility of making a 3dMark-like "game test" and probably limits you to just making a synthetic test, but it certainly does make it a little more difficult to cheat. If everything that's going to be rendered in a particular test is always in view, none of the recently revealed optimizations in 3dMark03 could be used. Basically, the benchmark would be like ShaderMark, but could test additional things like vertex shaders, high polygon counts, and other things. Also, use procedural textures where any lack of precision is visible.
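On the procedural-texture idea, a fragment shader along these lines (just an illustrative pattern, stored as a C++ string ready for glShaderSource) pushes intermediate values far from zero, so reduced shader precision shows up as visible banding in a screenshot:

```cpp
// Evaluates a high-frequency procedural pattern. With full FP32 the rings
// stay reasonably smooth far from the origin; at reduced precision the
// large intermediate values lose mantissa bits and visible banding or
// blockiness appears, making precision shortcuts easy to spot by eye.
static const char* kPrecisionTestFS =
    "varying vec2 uv;\n"
    "void main()\n"
    "{\n"
    "    // push the coordinates far from zero so low-precision floats\n"
    "    // run out of mantissa bits\n"
    "    vec2 p = uv * 512.0 + vec2(4096.0);\n"
    "    float rings = sin(dot(p, p) * 0.001);\n"
    "    gl_FragColor = vec4(vec3(rings * 0.5 + 0.5), 1.0);\n"
    "}\n";
```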
 
Humus said:
We now have seen cheats from NVidia, ATi and Trident. The question it raises is, what can we do about it? Your thoughts?

I thought the definition of a cheat would be something that works incorrectly to enhance a score, and that, when fixed, has a detrimental effect on that score.

I was under the impression that the whole quake/quack thing was a bug, and when it was fixed had no effect on the scores whatsoever.
 
Clashman said:
Could you perhaps instead insert randomization that wouldn't need to be averaged out over the course of several runs? For instance, what if the benchmark counted the number of polys, pixels, etc. rendered, and gave a score based on "objects" rendered divided by the time taken to render them (as opposed to simply FPS)? That way repeatability wouldn't strictly be necessary, because you'd be measuring total work over time (which is what frames over time are supposed to measure anyway). I'm not sure how you'd weight the various factors that would have to be taken into account, however.

In a moment of anticipatory self-criticism:

Question: But then how would you rate the work being done by traditional renderers against that done by deferred renderers, which do less work for comparable visual results?

Answer: I don't have a bloody clue.
 
RussSchultz said:
Adding randomness into a benchmark defeats repeatability. What a joy it would be to see people arguing that their card lost because it drew the 'hard' straws.

Unless you add an option that lets you save all the variables and load them back again for comparisons.
 
Do you need a benchmark for site to site comparisons? All you need is a benchmark that you can configure or set up like it's a brand new benchmark each time you do a shootout. If there were N camera paths for each test in 3dmark03 then NVIDIA and the like would have to do too much work to cheat around them all. At least using this one method. Once it's set up you can run it with the same data for each card, do up the graphs and make up some foolishness to say and you're done. :)
 
Himself said:
Do you need a benchmark for site to site comparisons? All you need is a benchmark that you can configure or set up like it's a brand new benchmark each time you do a shootout. If there were N camera paths for each test in 3dmark03 then NVIDIA and the like would have to do too much work to cheat around them all. At least using this one method. Once it's set up you can run it with the same data for each card, do up the graphs and make up some foolishness to say and you're done. :)

It's just crazy enough that it might actually work! :p

You wouldn't even need to have multiple paths which could be taken. What might work even better would be to have a random path generator. Once the path to be taken in the benchmark was calculated, it could be saved and loaded up again when the new card was inserted. Thus you can have these random benchmarks that can't be calculated for ahead of time by hardware vendors, but that are perfectly repeatable.
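A minimal sketch of the save/load part, assuming the path is just a list of keyframes; the file format and names here are invented for illustration:

```cpp
#include <cstdio>
#include <random>
#include <vector>

struct Keyframe { float x, y, z; };

// Generate a fresh random camera path. Nothing a driver could have
// pre-analysed, since it didn't exist until the reviewer pressed "go".
std::vector<Keyframe> generatePath(unsigned seed, int count)
{
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> pos(-100.0f, 100.0f);
    std::vector<Keyframe> path(count);
    for (Keyframe& k : path)
        k = { pos(rng), pos(rng), pos(rng) };
    return path;
}

// Write the path to a small text file...
bool savePath(const char* file, const std::vector<Keyframe>& path)
{
    FILE* f = std::fopen(file, "w");
    if (!f) return false;
    for (const Keyframe& k : path)
        std::fprintf(f, "%f %f %f\n", k.x, k.y, k.z);
    std::fclose(f);
    return true;
}

// ...and read it back when the next card goes in, so every board
// renders exactly the same frames.
std::vector<Keyframe> loadPath(const char* file)
{
    std::vector<Keyframe> path;
    FILE* f = std::fopen(file, "r");
    if (!f) return path;
    Keyframe k;
    while (std::fscanf(f, "%f %f %f", &k.x, &k.y, &k.z) == 3)
        path.push_back(k);
    std::fclose(f);
    return path;
}
```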
 
Chalnoth said:
If you plan on anything more than simple polycount and fillrate tests, it may be even more interesting to randomly vary more significant portions of the benchmark. Some possibilities:

1. Randomly-generated shaders. Have portions of shaders made that the program can mix and match on the fly.

2. Randomly-generated polycounts. This could be used to test various polycount/fillrate ratios in performance. Easiest way would be to use some sort of software HOS (with optimized triangle-based models generated at the beginning of each run).

Of course, all this should be rooted in reality. The best way to do such things would be to investigate actual games, and attempt to have the weighting on the random generators such that the results would result in similar scenarios to a number of different games.

This may be just too much for one person to do in his/her spare time, but, hey, I like to dream :)

I was thinking of something along the lines of (1); (2) is a good idea too. I'm not sure how much time I'll spend on it, but I'm definitely doing something at least. I've created the basic workspace for the app so far.
 
RussSchultz said:
Adding randomness into a benchmark defeats repeatability. What a joy it would be to see people arguing that their card lost because it drew the 'hard' straws.

You could just feed the random generator with a seed. The seed could be chosen arbitrarily by the reviewer. It will be perfectly repeatable, yet impossible for a driver to predict.
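In practice that could be as simple as taking the seed on the command line and printing it next to the score, so anyone can rerun the exact same test; a tiny hypothetical sketch:

```cpp
#include <cstdio>
#include <cstdlib>
#include <random>

// Hypothetical entry point: the reviewer passes any number they like,
// every random decision in the benchmark is derived from it, and the
// seed is published alongside the score so the run can be reproduced.
int main(int argc, char** argv)
{
    unsigned seed = (argc > 1)
        ? static_cast<unsigned>(std::strtoul(argv[1], nullptr, 10))
        : std::random_device{}();

    std::mt19937 rng(seed);
    // ... hand 'rng' to the path generator, shader generator, etc. ...

    std::printf("seed = %u\n", seed);   // publish with the results
    return 0;
}
```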
 
gokickrocks said:
RussSchultz said:
Adding randomness into a benchmark defeats repeatability. What a joy it would be to see people arguing that their card lost because it drew the 'hard' straws.

Unless you add an option that lets you save all the variables and load them back again for comparisons.

What he said (assuming male gender here; apologies if I'm wrong).
 
Humus, I've not much knowledge about shader programming. Is it possible for your program to get the exact result of a specific shader run, so that you can check whether the result is "correct"? If this was possible you could probably check for both cheating and accuracy (e.g. 16bit vs. 32bit).
 
cellarboy said:
I was under the impression that the whole quake/quack thing was a bug, and when it was fixed had no effect on the scores whatsoever.

Well, some would like to think so, but I don't. It was a move to hide the fact that the OpenGL driver wasn't really ready for launch; things like HyperZ weren't supported yet. Once they got those working, the gains compensated for the performance lost by restoring texture quality to normal.
 
madshi said:
Humus, I've not much knowledge about shader programming. Is it possible for your program to get the exact result of a specific shader run, so that you can check whether the result is "correct"? If this was possible you could probably check for both cheating and accuracy (e.g. 16bit vs. 32bit).

That would require more work, but it could be done by implementing it in software. I'm not sure I want to go that way, though; it should be good enough to just compare the images between two cards.
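Comparing the images could be as simple as reading the frame back with glReadPixels on each card and looking at the per-pixel RMS difference; a rough sketch (the capture/compare helpers are invented names):

```cpp
#include <cmath>
#include <vector>
#include <GL/gl.h>

// Grab the current frame with glReadPixels so it can be written out and
// compared against the same frame captured on another card.
std::vector<unsigned char> captureFrame(int width, int height)
{
    std::vector<unsigned char> pixels(width * height * 3);
    glPixelStorei(GL_PACK_ALIGNMENT, 1);
    glReadPixels(0, 0, width, height, GL_RGB, GL_UNSIGNED_BYTE, pixels.data());
    return pixels;
}

// Root-mean-square difference between two captures: 0 means identical,
// small values are rounding noise, large values suggest the driver is
// not rendering what it was asked to.
double rmsDifference(const std::vector<unsigned char>& a,
                     const std::vector<unsigned char>& b)
{
    if (a.size() != b.size() || a.empty()) return -1.0;  // not comparable
    double sum = 0.0;
    for (size_t i = 0; i < a.size(); ++i)
    {
        double d = double(a[i]) - double(b[i]);
        sum += d * d;
    }
    return std::sqrt(sum / a.size());
}
```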
 
Reviews should just use games that allow you to record your own demo and then play it back. These demos should be made publicly available for verification (and so different sites can compare results).

Every 3-4 weeks these demos should be thrown out and new ones recorded.

Cheers
Gubbi
 