HardOCP - Cheating the Cheaters

Kyle/Brent:

One of the problems I see with your approach is that you, as test subjects, will be somewhat tainted. This is a problem we ran into often when I worked in a cogsci visualization lab. Most of the people in the lab had worked on so many different vision-related projects (3D graphics, optical illusions, etc.) that they had been trained to observe or ignore certain things in a scene. Things the vision researchers could pick out might go unnoticed by the average Joe, but in other cases things that would look unnatural to the average person may no longer look unnatural to the researcher.

You, for example, may be able to tell the difference between bilinear and trilinear filtering, but how much does it *actually* matter to the end user, and in what situations? At the same time, does it matter if one card is averaging about 60fps and another 70? What if the one getting 70fps actually looks worse than the one getting 60fps due to refresh/tearing issues? Will you test with vsync on? Will the reviewer know the actual framerates from the games? How will you control for bias between what the reviewer "knows" the framerate should be and what they "feel" when running the game?

I don't want you to get me wrong: what you are doing definitely has merit. The problem you may run into is that your data may be accidentally falsified by problems with the observer and the methodology used. You'll need to be really careful about controlling variables, and if you really want your results to be good, you should perform the tests double-blind, and do so on a number of "average" observers rather than just a single reviewer like Brent. I'm sure you could probably get university students to be subjects for $5-$10 an hour if you really want something publishable.
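To make the double-blind idea concrete, here's a minimal sketch (purely hypothetical; the card names, labels, and trial counts are made up, and Python is just for illustration). The coordinator holds the only key mapping neutral labels to real cards; the subject and the note-taker never see it:

```python
import random

def make_blind_key(cards=("card_X", "card_Y"), trials=10, seed=None):
    """Coordinator-only: per-trial mapping of neutral labels to real cards."""
    rng = random.Random(seed)
    key = []
    for _ in range(trials):
        order = list(cards)
        rng.shuffle(order)  # randomize which card is "setup 1" this trial
        key.append({"setup 1": order[0], "setup 2": order[1]})
    return key

def tally(key, preferences):
    """Unblind after the fact: count how often each real card was preferred."""
    counts = {}
    for mapping, choice in zip(key, preferences):
        card = mapping[choice]
        counts[card] = counts.get(card, 0) + 1
    return counts

# Usage: subjects only ever report "setup 1" or "setup 2" per trial.
key = make_blind_key(trials=5, seed=1)
prefs = ["setup 1", "setup 2", "setup 1", "setup 1", "setup 2"]
print(tally(key, prefs))  # per-card preference counts, e.g. {'card_X': 3, 'card_Y': 2}
```

The point isn't the code; it's that nobody interacting with the subject knows which card is running until all the answers are in.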

Anyway, I'll be interested to see where this goes. :)

Nite_Hawk
 
Nite_Hawk said:
I don't want you to get me wrong: what you are doing definitely has merit. The problem you may run into is that your data may be accidentally falsified by problems with the observer and the methodology used. You'll need to be really careful about controlling variables, and if you really want your results to be good, you should perform the tests double-blind, and do so on a number of "average" observers rather than just a single reviewer like Brent. I'm sure you could probably get university students to be subjects for $5-$10 an hour if you really want something publishable.

I don't think this approach is supposed to be objective (like benchmark data); it's supposed to be subjective. Just like with movie reviews, you have to find reviewers whose opinions you trust and generally agree with.

You don't need double-blind tests, because the idea is to do an opinion-piece review, just as many other types of reviews are basically one person's (hopefully well-informed) opinion.
 
Bouncing Zabaglione Bros. said:
I don't think this approach is supposed to be objective (like benchmark data); it's supposed to be subjective. Just like with movie reviews, you have to find reviewers whose opinions you trust and generally agree with.

You don't need double-blind tests, because the idea is to do an opinion-piece review, just as many other types of reviews are basically one person's (hopefully well-informed) opinion.

The problem with this, then, is that it's impossible to really make any kind of qualified statement about anything. Sure, Brent can say "When I used these two cards, X played better than Y, but Y looked better than X", but Brent might not be a good person to test, since he:

1) likely knows what the configuration is before making his subjective analysis. I don't think he'd consciously introduce bias, but he'd be better off testing other people.

2) may (probably does?) know the recorded framerate scores when deciding how the gameplay "feels". Again, I don't think he'd consciously let this affect his judgement, but it is another factor that can introduce bias.

3) while observant and unusually diligent (like Dave), is, like all people, going to make poor observations or judgements from time to time. Taking a larger sample of people (even 3-4) would help alleviate this.

4) is not necessarily a good representative of the readership that is less interested in the technical aspects of a game, simply because, like most of us, he is interested in those things. The average reader may never have noticed that trilinear filtering is not applied to all texture stages (I still can't tell the difference most of the time). Logic dictates that a card that applies trilinear filtering on all texture stages should look better than one that does not, and Brent knows this (along with a lot of other things). That kind of knowledge can make you see things that aren't there, or miss things that are. It's really important to gather a wide range of subjective opinions to get a better idea of what is going on.

Edit: And it's especially important that those opinions come from people who don't know anything about what is producing the video/images!

Nite_Hawk
 
Nite_Hawk said:
The problem with this, then, is that it's impossible to really make any kind of qualified statement about anything. Sure, Brent can say "When I used these two cards, X played better than Y, but Y looked better than X", but Brent might not be a good person to test, since he:

All very good points, but you're coming at this from the starting point of the current "stacks of benchmarks" reviews that we get now. Short of getting a card yourself to test, the idea of these new [H] reviews is that you rely on Brent's judgement, opinions, analysis, and his general ability to give you all the information you are talking about.

It's up to you to decide whether you trust Brent (and his abilities), in much the same way as you decide whether to trust what a car, movie, or music reviewer says about a product.
 
I agree with most of what is posted in the article. The only thing I really want is my [H] forum back! heh.
 
...crazy idea, man

There should be blind tests in reviews/articles, but this requires at least three people: one to swap video cards, launch applications, and act as the 'control', and two to test visual differences and 'gameplay' differences.

Crazy idea... unfeasible for every single review, but how about one article like this once a year? I don't think B3D's audience would find it as useful as, say, the audiences of THG, ATech, or OCP.
 
jvd said:
Just like when you have a bike and you're pedaling downhill, even going 5-10 mph feels fast.

You get in a car and that 5-10 mph doesn't feel as fast anymore, does it?

Or it's a hot day out and you jump in a cold pool; after your body adjusts, it's not that cold anymore, is it?

That's just how we are.

30fps on your 9700 Pro/9800 Pro feels jerky because you were used to 60+.

But on your GeForce 4 you were used to 30fps, as that's what you were playing at until you upgraded.

This is somewhat of a tangent, but I strongly disagree here. It has nothing to do with being used to 60fps or 100fps or whatever; it has to do with consistency. FRAPS, benchmarks, etc., all tend to give you an average framerate number. Looking at the graphs in some of the benchmarks for newer games such as Far Cry, you've got framerates ranging from 5-10fps up to 100fps on the same level, depending on what's in the frame.

The fact is, "30fps" in most cases never means one frame every 1/30th of a second. It just means 30 frames rendered within the last second -- if 20 of those frames made it out in the first tenth of that second, and the other nine tenths were spent rendering the remaining 10 frames, you still have "30fps", but visually you don't have a smooth gameplay experience.
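To put numbers on that, here's a minimal sketch (hypothetical traces; Python just for illustration) comparing an evenly paced second of frames against the bursty one described above; both come out to "30fps":

```python
def frame_stats(frame_times):
    """frame_times: per-frame render times in seconds."""
    total = sum(frame_times)
    return len(frame_times) / total, max(frame_times)

even = [1.0 / 30.0] * 30                        # 30 evenly spaced frames
bursty = [0.1 / 20.0] * 20 + [0.9 / 10.0] * 10  # 20 frames in 0.1s, then 10 in 0.9s

for name, trace in (("even", even), ("bursty", bursty)):
    avg_fps, worst = frame_stats(trace)
    print(f"{name}: {avg_fps:.0f}fps average, worst frame {worst * 1000:.0f}ms")

# Both traces average 30fps, but the bursty one has 90ms hitches while
# the even one never exceeds ~33ms -- same number, very different feel.
```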

So I think the [H] approach will be good overall, as long as they can come to some sort of consensus on what qualifies as, say, "smooth", "very smooth", "jerky", etc.
 
This is probably way below most people's level of knowledge on here (there seem to be some intelligent guys in here... mostly). But...

With all the articles, hacks, and "optimizations" recently, I personally have just stopped using 3DMark03, as I believe it's not a wholly credible video benchmark anymore. Too much controversy for me to place any trust in it. Shame... and nothing personal, btw.

I use Aquamark03; I haven't heard anything bad about it personally. Also FRAPS, and Sandra (not a benchmark for video cards, but good for a lot of other testing).

Any reason why Aquamark doesn't receive the benchmark coverage that 3DMark03 still receives?

:?
 
BMG_Cya=- said:
Any reason why Aquamark doesn't receive the benchmark coverage that 3DMark03 still receives?

:?
Yeah, I never really liked it... that, and I've always suspected it of being among the very worst of the optimization offenders.
 
Aquamark 3 also appears to be detected by Nvidia's drivers, but there hasn't really been much investigation into it.
 