HardOCP - Cheating the Cheaters

volt

Regular
What do you guys think of this?

http://www.hardocp.com/article.html?art=NjE2

Here very shortly you should see no "benchmarks" used in our video card articles at all. You are seeing the end of HardOCP "video card reviews" and the beginning of "video card gameplay evaluations."

Gamers want a solid gaming experience based on a level of game environment immersion that can only be afforded to them with solid hardware that produces the correct image on the screen. We at HardOCP decided that we needed to stop doing video card reviews and start evaluating the gameplay delivered by those video cards. The benchmarks we were using were simply not representative of gameplay that the gamer might experience.

We take the video cards in question and first and foremost play games on them, lots of the latest games. We have traditionally focused on first person shooters but will now be moving to games reaching into several categories. We are also pulling sales data from several sources that show us the top selling games every month. Once we play those games for a while and get a feel for performance levels at different Image Quality settings, we then find portions of levels in the game, which we would qualify as "intensive." We then use that portion of the gameplay for our frame rate data that you see graphed in our evaluations.

The above are some random quotes from the article.
 
Well, this opens up a whole new world of allowing the reviews to be biased, doesn't it, because we get no objective facts :/ Seems a bit like a cop-out: "it's too hard to review cards because everyone keeps on cheating".
 
Well, this is really how it should have been right from the beginning - or do you buy your TV based on the FPS numbers?
Video card manufacturers should be forced to provide products that meet a certain minimum frame rate - no more 300 fps then dropping to 20. This is an area where technology is going backwards rather than forwards.
 
While I agree with most of Kyle's points, I think it's somewhat taking the easier way out when evaluating video cards (on game experience only) -- is it not easier to run FRAPS than to explain the technology behind a GPU? For that you'd definitely want to use some synthetic benchmarks.
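
To be fair, the FRAPS side of it is the easy part; the real work is playing enough to pick representative runs. For what it's worth, here is a minimal Python sketch of the post-processing step, assuming a plain per-frame log in milliseconds, one value per line; the "frametimes.csv" name and format are just an illustration, not FRAPS' actual export.

Code:
# Summarise a per-frame time log (milliseconds, one value per line) into the
# average and minimum FPS figures a gameplay evaluation might graph.
def load_frame_times(path):
    with open(path) as f:
        return [float(line) for line in f if line.strip()]

def summarise(frame_times_ms):
    total_seconds = sum(frame_times_ms) / 1000.0
    avg_fps = len(frame_times_ms) / total_seconds      # frames per second over the run
    min_fps = 1000.0 / max(frame_times_ms)             # FPS of the single slowest frame
    return avg_fps, min_fps

times = load_frame_times("frametimes.csv")              # hypothetical capture file
avg_fps, min_fps = summarise(times)
print("average: %.1f fps, minimum: %.1f fps" % (avg_fps, min_fps))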
 
bloodbob said:
Well, this opens up a whole new world of allowing the reviews to be biased, doesn't it, because we get no objective facts :/ Seems a bit like a cop-out: "it's too hard to review cards because everyone keeps on cheating".

In terms of real-world time and resources I would guess that at first we spent close to 500% on this new format compared to the old. Doing the current evaluations is still very much more costly and time consuming than doing it "the old way." I hired Brent full-time as a result of how demanding this type of testing is. It simply was not going to be done properly in his spare time.
 
volt said:
While I agree with most of Kyle's points, I think it's somewhat taking the easier way out when evaluating video cards (on game experience only) -- is it not easier to run FRAPS than to explain the technology behind a GPU? For that you'd definitely want to use some synthetic benchmarks.

I think we have been very specific about who our evaluations are catering to. We have gone so far as to split up the tech articles and gameplay articles as our statistics have shown there to be two very distinct crowds. One that wants to know why their video card does what it does, and the other just wants it to play their games well. Again, the very limited synthetic benchmarks can be easily found elsewhere and I promise to supply links to those sites on a daily basis. :)
 
FrgMstr said:
In terms of real-world time and resources I would guess that at first we spent close to 500% on this new format compared to the old. Doing the current evaluations is still very much more costly and time consuming than doing it "the old way." I hired Brent full-time as a result of how demanding this type of testing is. It simply was not going to be done properly in his spare time.

I think you've struck upon possibly the biggest 'problem' (if you can call it that) of the real-world evaluation approach - It takes time, and a lot of it. From my own experience of doing this for a review in my spare time, it took nearly a month to complete all the testing to my satisfaction. Most sites don't feel they can afford to take that long for a new product launch, nor can they afford financially to hire full-time staff.

Aside from that though, I agree almost entirely with [H]'s current approach of real-world experience testing, mixed in with a little 'apples to apples' (or rather, as close as you can come to that) testing to round things off.

I do however think that synthetic benchmarking does and will continue to have a place in testing every facet of a graphics card - Right now it maybe isn't such a big issue due to the architectures involved, but once we roll around to a whole new DirectX revision again we'll be in a state where we have the API, but no games that use it. Surely then, isn't it worth considering using a synthetic test for the featureset of that new API (assuming one becomes available), rather than simply ignoring the new technology altogether?
 
I think it's one way of dealing with the fact that so many cheats and IQ sacrifices are giving us reams of numbers that don't have much bearing on the actual gameplaying experience. I think Kyle is still wrong for picking out synthetics in particular, as it's been shown time and time again that in-game benchmarks suffer from the same problems.

It looks like we get back to the days when people play with and use a product before giving us their opinions on where it is good and bad, instead of just running a suite of programs to produce a load of graphs that have effectively been knobbled by IHVs.

Now we just need to find the reviewers you trust to give objective opinions and useful/accurate analysis on what they experience.
 
I would also like to see far more information about how smooth gameplay is, especially with games that have a lot of sideways scrolling/panning. For instance, if the FX5900 is running at 40fps and the X800 Pro is running at 60fps, which is really smoother in-game? It wasn't that long ago that 30fps gave you perfectly good scrolling/panning smoothness (Voodoo 1), but now 30 = stuttering on my 9700/9800 Pro.
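
If reviewers logged per-frame times, "which is really smoother" could at least be put into rough numbers instead of a single FPS average. Below is a hedged Python sketch of that idea, not anyone's published method: it assumes the same kind of per-frame millisecond log, and the 33 ms threshold, the 99th percentile, and the made-up example runs are all arbitrary illustrations.

Code:
# Frame-time consistency rather than average FPS: a card averaging ~60 fps
# with periodic hitches can feel worse than a card holding a steady 40 fps.
def smoothness(frame_times_ms, stutter_ms=33.3):
    ordered = sorted(frame_times_ms)
    p99 = ordered[min(len(ordered) - 1, int(0.99 * len(ordered)))]   # 99th-percentile frame time
    stutter_pct = 100.0 * sum(1 for t in frame_times_ms if t > stutter_ms) / len(frame_times_ms)
    return p99, stutter_pct

steady_40 = [25.0] * 600                  # even 25 ms frames, ~40 fps throughout
spiky_60 = ([14.0] * 9 + [40.0]) * 60     # ~60 fps average with a 40 ms hitch every tenth frame
for name, run in (("steady 40 fps", steady_40), ("spiky ~60 fps", spiky_60)):
    p99, pct = smoothness(run)
    print("%s: 99th percentile %.1f ms, %.1f%% of frames over 33 ms" % (name, p99, pct))

On those made-up numbers the nominally faster card is the one spending 10% of its frames past a 30 Hz frame budget, which is much closer to the "30 = stuttering" feeling than a bare FPS figure.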
 
THe_KELRaTH said:
I would also like to see far more information about how smooth gameplay is, especially with games that have a lot of sideways scrolling/panning. For instance, if the FX5900 is running at 40fps and the X800 Pro is running at 60fps, which is really smoother in-game? It wasn't that long ago that 30fps gave you perfectly good scrolling/panning smoothness (Voodoo 1), but now 30 = stuttering on my 9700/9800 Pro.

That is because you adapt to the new res.

Just like when you have a bike and you're pedaling downhill, even going 5-10 mph feels fast.

You get in a car and that 5-10 mph doesn't feel as fast any more, does it?

Or it's a hot day out and you jump in a cold pool; after your body adjusts it's not that cold anymore, is it?

That is just how we are.

30fps on your 9700 Pro/9800 Pro feels jerky because you were used to 60+.

But on your GeForce 4 you were used to the 30fps, as that was what you were playing until you upgraded.
 
Honestly, I'd like to think that is the case, but I notice the difference immediately, not over months of use. Actually, I'm looking forward to REAL fluidity like that found using Atari 800s, Amigas, etc.
 
THe_KELRaTH said:
Honestly, I'd like to think that is the case, but I notice the difference immediately, not over months of use. Actually, I'm looking forward to REAL fluidity like that found using Atari 800s, Amigas, etc.

If you want that then you have to go to a console.

I don't notice any problems with my 9700 Pro. I get 30+ fps all the time. You might want to back off the res or the AA and see how it works for you. You may also need a faster CPU, as your minimum fps is mostly affected by your CPU.
 
THe_KELRaTH said:
Honestly, I'd like to think that is the case, but I notice the difference immediately, not over months of use. Actually, I'm looking forward to REAL fluidity like that found using Atari 800s, Amigas, etc.

I think those days are gone, at least until Intel buys out AMD, NVIDIA and ATI :) Those older machines were a known architecture that a developer could write to, and the platforms were stable for quite long periods of time (years). The modern PC market just doesn't seem to work that way.
 
my own re[h]ash

The question for the methodology is one of actually following through with the implications of the limited scope of the approach, and concentrating the presentation on being clearly focused on its strengths. Things seem on track for AF and AA, but look to be opening a new can of worms as far as the programmable featureset goes (we are already seeing some of this, in terms of increased fragility of patches and new game content).


What I am most concerned about is certain statements and representations in the article itself, which indicate to me that the basis on which decisions relating to this matter are going to be made is fallacious...

...

It spends a great deal of time on what I can only evaluate as revising the history of what occurred when [H] allowed, paraphrasing, "cheaters to cheat [H] and its readers". The resultant description of events bears little resemblance to what actually happened or to actual justification for some statements it makes.

According to the article, "synthetics" led them astray concerning cards in the past, yet for one example of "enlightenment" on the matter it mentions the article by Extremetech that exposed the truth...inconveniently for the conclusion drawn about synthetics, Extremetech exposed it using "synthetic" testing. The way synthetics were discredited by the event was when sites, most prominently [H] itself, attacked the Extremetech article and said what the synthetics demonstrated was false. The article somehow fails to convey that accurately to me.

According to the article, colored mip map levels can lead you astray concerning filtering issues, as in UT2k3, but in the actuality of that issue, colored mip maps were in fact used to illustrate the problem. It was [H]'s own specific decisions in selecting textures and still screenshots to erroneously state there was no difference, and to ignore any issues with colored mip maps, that led people astray, when both synthetics and other decisions on where to look for in-game issues did not.

This is a central inconsistency...the actual blame for [H]'s errors around the issues the article represents as enlightening, and as 'naturally' leading them to the conclusion to not use synthetics, was the site's own decisions (decisions to ignore synthetic evaluations, actually). At the time, other people weren't making the same decisions, and the error of those decisions was fairly obvious to others even as [H] was making and defending them. The tools that actually sparked the opportunity for enlightenment in those issues...were synthetic tools.

From this, it concludes and proposes that the problem is "synthetic" tools failing to represent actuality, and that getting rid of them removes the issue!? This isn't supported by actuality, only by the selectivity and/or revisionism of what happened. :-?

What it actually shows is that thought, investigation, and standards have to be applied to any information gathered. Correcting a lack in that helps prevent people being led astray; distorting where the error was in the past does not.

...

Here is some fairly simple logic that some of the article seems to contradict...

Comparing synthetics to other data and determining where they fit is more thought and investigation than not doing so. That is, for example, how many people knew how the NV30 and NV35 compared for PS 2.0 far before [H] informed them, and that's where [H] failed in directly stating that the people saying so were wrong/motivated by sour grapes/lying just to hurt nVidia (these actions seem to have been forgotten when analyzing how people could have been led astray :-?). This is how [H] failed on the matter of readers being "cheated by the cheaters" and their stance on 3DMark03...by simply looking away from the information. They even further assisted in leading people astray by mirroring nVidia's PR statement of "don't look behind the curtain" so closely, and I think they linked it as justification for the article...as justification for their decisions, not even as an illustration of it being a mistake to echo IHV PR nearly verbatim, point for point, at full article length. :-?

Ignoring synthetic measurements by not looking at them is less thought and investigation, not more. This isn't because of the places where such a policy of less investigation happens to overlap where synthetics fail; it is because of the places where the policy simultaneously overrides where synthetics do not fail, and ignores where the reviewer's selection and perception do fail to apply for the user. Again, the above examples for the article's revelations about where synthetic criteria fail are actually cases where [H] decided to ignore synthetics and led readers astray!

...

Moving away from the article's issues, and on to concerns about certain aspects of the outlined methodology alone:


Examining only the most popular games? That's exactly where IHVs would have a financial interest in engineering a representation that doesn't reflect the hardware's general abilities. It gives them the maximum opportunity for fragile corner cutting, and moves backward from the progress that has been made toward encouraging more general "cut corners" that can actually survive scrutiny in the general case.
A general case "cut corner" is far more likely to be an optimization than a cheat, and a fragile "cut corner" is far more likely to be a cheat than an optimization.

Also, how will it serve readers when they happen to play a game (or several) that wasn't popular enough to be represented in the selection of top sellers? Actually, that question sparks a bit of deja vu...this situation for representation is a return to the aspect of many reviews being more a review of the effort spent by the IHVs on targeting the most popular games.

Finally, it seems to actively encourage the bad aspects of the various IHV marketing campaign initiatives, by giving maximum exposure to the picture IHVs can manage to engineer with them.

However, if you're going to limit usefulness like many reviews have always done, it is better to do it well and educate your readers about what you're doing. Where the article relates to explaining that this might be the outcome, instead of trying to vindicate the site and revise the context of past failures, it is an encouraging sign.
 
Aw, crap.

The problem with this is that it is infinitely easier to cheat in games without being caught than it is to cheat in synthetic benchmarks. Synthetic benchmarks are predictable; games are rarely, if ever, predictable based on paper specs (okay, you can determine which card should come out ahead, but you don't know how much a card should lead or what a card's score should be without a previous score as reference). Synthetic benchmarks are an integral part of testing new cards; how else could we test the features not yet used in any games? How could we have even had an idea of the NV3x's lackluster PS2.0 performance without synthetic benchmarks that came out within a month or two of the launch? PS2.0 games weren't out until what, six months later, so it seems counterproductive to declare that synthetic benchmarks are too prone to cheating and rely on games that aren't necessarily an indicator of future gameplay performance (UT200x? it's GPU limited with X800 at, what, 1600x1200 with 6x or higher AA?).

Like I said, it's easier to cheat in games than synthetics (shader replacement comes to mind, and I think testing Half-Life 2 will make your life a living hell because of all the shader replacement that NVIDIA will almost undoubtedly perform). A balance is needed between synthetic benchmarks and gaming performance (however you decide to interpret that, whether it's from an actual gameplay run that you know well or a recorded benchmark), and I fear that [H] could be setting a dangerous precedent.
 
The Baron said:
Like I said, it's easier to cheat in games than synthetics (shader replacement comes to mind, and I think testing Half-Life 2 will make your life a living hell because of all the shader replacement that NVIDIA will almost undoubtedly perform). A balance is needed between synthetic benchmarks and gaming performance (however you decide to interpret that, whether it's from an actual gameplay run that you know well or a recorded benchmark), and I fear that [H] could be setting a dangerous precedent.

While I agree with what you are saying about synthetics, it's obvious that [H] are trying to get around the cheating question with a completely different approach.

There are plenty of people that will continue to use synthetics, but most people will either not interpret them correctly, or be caught out by illegal optimisations, lessening their usefulness.

In the example you give above, I would expect [H] to say that NV40 is just as fast as X800, but the shaders look like crap because they have all been replaced by low precision versions. If the replaced shaders don't look any different, then I guess that would be a valid gaming optimisation for that particular title, bearing in mind that any such performance gains could be nullified by the next developer's patch until Nvidia's shader replacement team catches up.

If, on other similar games, NV40 shader performance is significantly worse (due to the lack of hand-coded shader replacement) then this should also come up in the review and count against NV40.
 
Kyle has taken a whole lot of shit the last year or so --and certainly some of it he brought on himself either by not explaining well enough or getting too defensive about what he was trying to do. In this world when you try to turn the battleship there will be a lot of resistance. Having said that, looking back over that period, and seeing that there was a bit of flailing about, it seems to me that he has been remarkably consistent in what it is he is saying and the ultimate goal he is trying to achieve.

I may not entirely agree, but it seems those who've wanted to pigeon-hole him as the plaything of this IHV or that in the exigencies of the moment should look at the bigger picture over time and see he's been saying the same things no matter whose ox is getting gored today.
 
nutball said:
at least until Intel buys out AMD, NVIDIA and ATI :) Those older machines were a known architecture that a developer could write to, and the platforms were stable for quite long periods of time (years). The modern PC market just doesn't seem to work that way.

hehe maybe Bill will have a brainstorm one breakfast morning... "Hell, I'll buy the whole bloody industry!" :LOL:

....

With the Atari / Amigas the framerate was preset and you could only run data within that cycle - no missing frames.
Of course, there was a company that I believe did solve this issue to a large extent on the PC by sending a duplicate frame so graphics motion appeared smooth (not to be confused with double/triple buffering).
I understand the technique was dropped because it didn't fare too well in synthetic benchmarks, as the results were only based on the primary frame. Reviewers only looked at FPS numbers rather than the more important motion smoothness.
 
It's good for the review world to have different ways to test cards. The only real problem is trying to be first to publish... you can't this way. As for cheats in games, I didn't think reviews were for finding cheats... it's only been the last couple of years or less that it has become the MAIN thing in reviews (fuk u Nvidia). Maybe EB will have a superset of reviews, numbers and styles, like the Dow Jones, and give an overall score based on five really good review sites. Go for it Hanners.
 
This is a good start. It's almost as if, somewhere along the line, 0-60 times became the most important factor in deciding the best cars. Luckily, car magazines like Car & Driver, Road & Track et al. did not take the same route that video card reviewers did -- living and dying on benchmark numbers.
 