Cheating and its implications

gokickrocks said:
RussSchultz said:
Adding randomness into a benchmark defeats repeatability. What a joy it would be to see people arguing that their card lost because it drew the 'hard' straws.

unless you add an option that allows you to save all the variables and lets you load the values for comparisons

Exactly my thoughts. Keep it that simple and publish the specific 'dataset' that the benchmark was run with.
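
To make the "publish the dataset" idea concrete, here is a minimal C++ sketch, assuming a benchmark whose random content is derived entirely from one PRNG seed; the prompt and the scene layout are illustrative, not from any real benchmark.

```cpp
// Minimal sketch: a "random" benchmark run that is perfectly repeatable,
// because every random decision is derived from one published seed.
#include <cstdint>
#include <iostream>
#include <random>

int main() {
    std::uint32_t seed = 0;
    std::cout << "Enter benchmark seed: ";  // the reviewer publishes this number
    std::cin >> seed;

    // std::mt19937 is fully specified by the C++ standard, so the raw
    // sequence is identical on every platform for a given seed. (The
    // standard distributions are NOT guaranteed to match across standard
    // libraries, so we map the raw values to coordinates by hand.)
    std::mt19937 rng(seed);
    for (int i = 0; i < 5; ++i) {
        float x = (rng() / float(std::mt19937::max())) * 200.0f - 100.0f;
        std::cout << "object " << i << " at x = " << x << '\n';
    }
    // Every card tested with the same seed sees the exact same "random" scene.
}
```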

Gubbi said:
Reviews should just use games that allow you to record your own demos and then play them back. These demos should be made publicly available for verification (and so different sites can compare results).

Every 3-4 weeks these demos should be thrown out and new ones recorded.

Cheers
Gubbi

Yes, a commendable approach when testing a full game engine. 8)
 
gokickrocks said:
unless you add an option that allows you to save all the variables and lets you load the values for comparisons

It is just a seed number you have to save; the reviewer could choose a random seed number and use it on all cards tested.
 
Tim said:
gokickrocks said:
unless you add an option that allows you to save all the variables and lets you load the values for comparisons

It is just a seed number you have to save; the reviewer could choose a random seed number and use it on all cards tested.

even better ^

Also to add: instead of the loading part, since we only need to deal with a seed number, just add a prompt that asks for the seed. Then all a reviewer would have to do is remember the number, and others can verify by inputting the same seed.
 
gokickrocks said:
Tim said:
gokickrocks said:
unless you add an option that allows you to save all the variables and lets you load the values for comparisons

It is just a seed number you have to save; the reviewer could choose a random seed number and use it on all cards tested.

even better ^

Also to add: instead of the loading part, since we only need to deal with a seed number, just add a prompt that asks for the seed. Then all a reviewer would have to do is remember the number, and others can verify by inputting the same seed.

It's nice to see the others are just as clever as me :)
So, you have all basically come up with the same ideas. This is what I thought:

The tester builds a random seed using something like mouse wiggling, and this seed is saved to a randomly (or user) named file. Then Card A and Card B are tested with the benchmark using this seed. The seed can be used to build, for example, Bezier-curve-based (or similar!) smooth random flight paths.

When we create enough different random runs based on these flight paths, say 25, cheating really becomes A LOT harder. Min/max/avg FPS ratings from each run can be used to calculate a geometric mean or similar measure. Because both cards use the same seed and the same paths, you can easily make a meaningful comparison. This scheme can of course be applied to a variety of effects and shaders...

Also note that if we use a lot of random runs, the uncertainty in the percentage difference between the cards compared quickly drops below one percent. The only drawback is that benchmarking the cards takes a little longer, but that's a small price to pay.
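
A rough C++ sketch of how this flight-path scheme could look; runFlightPath() is a hypothetical stand-in for the real render loop, and the control-point ranges and run count are illustrative.

```cpp
// Sketch: seeded random Bezier flight paths plus a geometric-mean score.
#include <array>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <random>

struct Vec3 { float x, y, z; };

// Cubic Bezier curve: a smooth camera position for t in [0,1].
Vec3 bezier(const std::array<Vec3, 4>& p, float t) {
    float u = 1.0f - t;
    float b0 = u*u*u, b1 = 3*u*u*t, b2 = 3*u*t*t, b3 = t*t*t;
    return { b0*p[0].x + b1*p[1].x + b2*p[2].x + b3*p[3].x,
             b0*p[0].y + b1*p[1].y + b2*p[2].y + b3*p[3].y,
             b0*p[0].z + b1*p[1].z + b2*p[2].z + b3*p[3].z };
}

// Hypothetical: fly the camera along the curve, render, and return the run's
// average FPS. Stubbed out here so the sketch compiles and runs.
float runFlightPath(const std::array<Vec3, 4>& path) {
    float acc = 0.0f;
    for (int i = 0; i <= 100; ++i) {
        Vec3 p = bezier(path, i / 100.0f);
        acc += p.x + p.y + p.z;
    }
    return 60.0f + std::fmod(std::fabs(acc), 10.0f);  // fake FPS figure
}

int main() {
    std::uint32_t seed = 12345;   // built from mouse wiggling, saved to a file
    std::mt19937 rng(seed);
    auto coord = [&rng] {         // deterministic coordinate in [-50, 50]
        return (rng() / float(std::mt19937::max())) * 100.0f - 50.0f;
    };

    const int kRuns = 25;         // many distinct runs make spot-cheats harder
    double logSum = 0.0;
    for (int run = 0; run < kRuns; ++run) {
        std::array<Vec3, 4> ctrl;                 // 4 random control points
        for (auto& p : ctrl) p = { coord(), coord(), coord() };
        logSum += std::log(runFlightPath(ctrl));  // accumulate log(FPS)
    }
    // Geometric mean of the per-run FPS figures is the final score; both
    // cards get the same seed, so the comparison is apples to apples.
    std::cout << "score: " << std::exp(logSum / kRuns) << " fps\n";
}
```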
 
Hey... about those nVidia 3DMark03 shots released: That kind of blurring... it means there's a NULL render going on, right? Cos I remember DOOM does the same thing if you use a no-clipping cheat to walk off the level...
 
Tagrineth said:
Hey... about those nVidia 3DMark03 shots released: That kind of blurring... it means there's a NULL render going on, right? Cos I remember DOOM does the same thing if you use a no-clipping cheat to walk off the level...

It means the framebuffer has not been cleared for at least as many frames as you can count ghosted copies of, but usually more.
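
A sketch of the effect being described, assuming a desktop OpenGL compatibility context via GLFW (the setup and the drifting triangle are invented for illustration, not taken from 3DMark03): skip the clear, and each frame is drawn on top of stale framebuffer contents. On most drivers the old pixels survive the buffer swap, so the trails pile up.

```cpp
#include <GLFW/glfw3.h>
#include <cmath>

int main() {
    if (!glfwInit()) return 1;
    GLFWwindow* win = glfwCreateWindow(640, 480, "ghosting", nullptr, nullptr);
    if (!win) { glfwTerminate(); return 1; }
    glfwMakeContextCurrent(win);

    while (!glfwWindowShouldClose(win)) {
        // glClear(GL_COLOR_BUFFER_BIT);   // <- deliberately skipped
        float s = 0.5f * (float)std::sin(glfwGetTime());
        glBegin(GL_TRIANGLES);             // a triangle that drifts sideways...
        glVertex2f(-0.2f + s, -0.2f);
        glVertex2f( 0.2f + s, -0.2f);
        glVertex2f( 0.0f + s,  0.3f);
        glEnd();                           // ...leaving ghost copies behind
        glfwSwapBuffers(win);
        glfwPollEvents();
    }
    glfwTerminate();
}
```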
 
Evildeus said:
Little OT but
What's the use of Microsoft certification? It doesn't seem to help in any way.

WHQL is more about security and stability than about correct function.

Microsoft isn't interested in whether your product looks bad--as long as it doesn't reflect on Microsoft.
 
Clashman said:
Himself said:
Do you need a benchmark for site to site comparisons? All you need is a benchmark that you can configure or set up like it's a brand new benchmark each time you do a shootout. If there were N camera paths for each test in 3dmark03 then NVIDIA and the like would have to do too much work to cheat around them all. At least using this one method. Once it's set up you can run it with the same data for each card, do up the graphs and make up some foolishness to say and you're done. :)

It's just crazy enough that it might actually work! :p

You wouldn't even need to have multiple paths which could be taken. What might work even better would be to have a random path generator. Once the path to be taken in the benchmark was calculated, it could be saved and loaded up again when the new card was inserted. Thus you can have random benchmarks that hardware vendors can't prepare for ahead of time, but that are perfectly repeatable.

Just so long as the work done is basically equivalent; you want variation, but not variation in workload. That is why I suggested a set number of paths determined by the benchmark writer. But you could also do path segments and randomly branch them, which would have a similar effect (see the sketch below).
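
As a rough illustration of the segment-branching variant, a C++ sketch; the segment names, and the assumption that each segment was authored to cost roughly equal work, are invented for the example.

```cpp
// Sketch: pre-authored path segments branched at random per seed.
#include <cstdint>
#include <iostream>
#include <random>
#include <string>
#include <vector>

int main() {
    // Each entry holds hand-made camera segments tuned to a similar workload.
    std::vector<std::vector<std::string>> branches = {
        {"hangar_flyby", "hangar_orbit"},       // step 1: pick one of these
        {"canyon_dive",  "canyon_climb"},       // step 2
        {"city_sweep",   "city_lowpass"},       // step 3
    };

    std::uint32_t seed = 987654321u;            // saved and reused per card
    std::mt19937 rng(seed);

    // Branch choices are random per seed, but identical for every card
    // tested with that seed: variation without workload variation.
    for (const auto& options : branches) {
        std::size_t pick = rng() % options.size();
        std::cout << "run segment: " << options[pick] << '\n';
    }
}
```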
 
The problem is ATI got caught, and they actually fixed the problem; since that time there has been absolutely NO EVIDENCE that they are cheating in the driver set.


Nvidia is very dirty and underhanded in this case. Typical American company..... They think they know what is best for us, just like the Big 3 in the late 60's to early 80's. No wonder a lot of us Americans drive Japanese cars... we are tired of being taken advantage of. Same thing needs to happen to nVidia before they will learn. We have to hit them in their pocketbook.
 
These latest developments have brought back some memories from my early days of benchmarking. I probably have a different take on solving the cheating and optimization than most anybody here. When I was doing benchmark testing for Jon Peddie Associates, I liked the idea of one entity doing the testing. As a market research firm we didn't necessarily care who won or lost. Either way the losers hated us and the winners loved us. :) Unfortunately, being a market research firm, the companies in the industry were our clients. In my later years this led to those companies hiring us to objectively test their products against their competitors'. They believed that by letting us do the tests our way, on our hardware and on the software of our choice, the results from the testing would appear legit. And to a certain extent I believe we accomplished that, even though a lot of people at the time believed we got paid to make them look good.

With all that said I still believe a single entity should be doing the testing. This entity's sole purpose should be to get impartial results and make sure that every product is not cheating. No company or individual should be able to influence their results. This means that the entity needs to be a non-profit organization. They should not be getting paid by any company or individual for any service or product they render or produce. This means that if the entity creates tools to do the testing that they can't sell them to make money. And they can't make money off the data or results from the testing.

Creating such an entity sounds easier than it is. The problem is picking the people that make up such an entity and what tools they will use to do the testing. We need some examples of other organizations that do almost the same thing, even if they're in a completely different market or industry. SPEC (www.specbench.org) is the closest example I can think of right now. Anybody know of any others?

Also, I have to agree with somebody else who brought this up earlier. Sorry, I can't remember your name, but you basically said that making a benchmarking tool Open Source is not a very good idea. In fact, I don't believe any individual or company should have access to the tools that are used for testing. This way no company could optimize their drivers to make their hardware perform better in the test, but not in user applications.

I am at a loss as to whether currently available games should be used for testing or synthetic tools should be used. I have always thought both should be used, but it also depends on what the results will be used for. Most end users are only going to be concerned with games that they use. However, OEMs will be concerned with overall performance, and a synthetic tool could do the job. So maybe I was right that different tools should be used for different markets?

I'm starting to think that we as the end-user community need to take it upon ourselves to create a non-profit organization that holds these companies accountable for the claims they make with their products. If we don't then the companies will continually be free to do as they wish and I'm not willing to let them have that kind of liberty.

Who wants to help get the organization started? :)

Tommy McClain
 
AzBat said:
Also, I have to agree with somebody else who brought this up earlier. Sorry, I can't remember your name, but you basically said that making a benchmarking tool Open Source is not a very good idea. In fact, I don't believe any individual or company should have access to the tools that are used for testing. This way no company could optimize their drivers to make their hardware perform better in the test, but not in user applications.
This is a bad idea. If a company gets a bad result in a benchmark, then they'll never know what the problem is if they can't analyze the driver/hardware behavior while running the application. There are plenty of legitimate opportunities for optimization, but it helps (a lot) if you know where a problem is.
 
Evildeus said:
Little OT but
What's the use of Microsoft certification? It doesn't seem to help in any way.

It will with Longhorn: no more non-WHQL drivers... they simply will not be allowed to load.

http://www.winsupersite.com/showcase/longhorn_preview_2003.asp

Because Longhorn versions will rely much more heavily on graphics than previous Windows versions, Longhorn will not support unstable, unsigned drivers. If you attempt to install an unstable driver on Longhorn, the user experience will step back to Tier 1. "Hardware acceleration and high DPI scaling can not run on an unstable driver," Hammil said. "It must run 24/7/365."

And I can't wait. This beta-driver-leaking thing started years ago, when Nvidia PR thought it was cool to put out drivers weekly. Hardly.
 
Is there a way to see if they are doing this same cheat in Quake, Serious Sam, UT2003, etc.? I know that 3DMark03 has tools that helped to detect this issue, but how do we know if the other benchmarks have been tampered with?
 
I will eat my shoe if Quake 3 didn't have more hacks in it than Bapco Sysmark...from all companies.

As long as no image degradation is evident, then hey, 10 fps is good.. the problem (and this is a fine line) is: what is the purpose??

Well, we need to look at what the 'purpose' of posting Doom 3 or Half-Life 2 benchmarks is.

Benchmarks (link)

UT 2003 Benchmarks (link)

I keep seeing the excuse that games should be used, but IHVs will optimize those too, and in this example 'dancing pixels' are evident, so there is image degradation. So I don't consider that a good 'optimization'.

The bottom line is that in the 'graph wars' these 'benchmarks', whether synthetic or 'games', are designed to increase sales.
As for stating that 3DMark is not important: how many OEMs and even games cite a certain '3DMark' number as a requirement to play?
It is so hypocritical to even think that 3DMark benchmarks are not important; we are talking about the most popular benchmark of all time.
I ran a company, and that's all I heard from end users: "I need a card to get me XX 3DMarks"... and now, when this benchmark is finally more a graphics card benchmark vs. a platform benchmark, we have all this BS.
 
YeuEmMaiMai said:
The problem is ATI got caught, and they actually fixed the problem; since that time there has been absolutely NO EVIDENCE that they are cheating in the driver set.


Nvidia is very dirty and underhanded in this case. Typical American company..... They think they know what is best for us, just like the Big 3 in the late 60's to early 80's. No wonder a lot of us Americans drive Japanese cars... we are tired of being taken advantage of. Same thing needs to happen to nVidia before they will learn. We have to hit them in their pocketbook.
Typical American company? Well, that is completely stupid of you to say..... if you have a problem with my country or how we do things, I would suggest you take your lame arse over to the [H] forums where B.S. is free to run rampant....
 
It's one thing to legitimately optimize your drivers to run a game better on your hardware, but another thing entirely to deliberately analyze a benchmark to discover how to cheat it, essentially cutting huge pieces of it out to reduce the workload on your hardware, so that its performance appears far better than it ever could be even with legitimate optimization.

But in no way, shape, or form should we ever blame a benchmark simply because it is amenable to manipulation--because all benchmarks can be manipulated. Should we ban cars because people abuse them and kill other people--should we ban planes because hijackers crash them into buildings? I don't think so. Likewise, benchmarks aren't to blame--it's the companies who abuse them that are at fault.
 
Doomtrooper said:
It will with Longhorn: no more non-WHQL drivers... they simply will not be allowed to load.
...
And I can't wait. This beta-driver-leaking thing started years ago, when Nvidia PR thought it was cool to put out drivers weekly. Hardly.

Longhorn will still support non-WHQL drivers for the simple reason that the employees at ATI, Nvidia, Microsoft et al. will need to test these drivers for WHQL certification. There will probably be some registry setting or Windows setting to enable it, but it will be there and will be relatively simple to do.
 