I've changed my mind. I think B3D should release whatever benchmarks they use.
Why? Think of all the incorrect benchmarks that have been posted to major sites in just the last couple of weeks:
- Anand posted benches that claimed the 5900U was platform-limited (at 223fps!) running Q3 at 1600x1200 with 4xAA and 8xAF
- Lars at Tom's mislabeled the D3 Medium Quality + 4xAA benches as High Quality + 4xAA
- Kyle and Anand both ran D3 in Medium Quality with 8xAF set in the drivers, even though it appears that Nvidia's drivers interpret Medium Quality as forcing AF off, while ATI's drivers do not
- ExtremeTech's 3dMark03 build 320 vs. build 330 benchmarks show the Radeon 9800P losing performance in all four game tests with the new build; benchmarks by forum members and Wavey himself demonstrate that ET got it wrong
The last one, of course, is the most troubling in this context. Imagine if ET were the only ones with access to build 330! The only conclusion to be drawn would be that ATI was cheating on game tests 1, 2 and 3, even though they are not.
But the only reason we know they're not cheating is that everyone else has access to the same benchmark and can double-check ET's results.
Now, I trust Wavey with a review, 110%. So far as I know, he's never messed up a benchmark. But he's still human; if he makes typos in his review text or forgets to change column headings in the result tables (and he has been known to do both on occasion), then he can make little mistakes when benchmarking as well.
Of course, he wouldn't make any of the gratuitous errors I listed above. That's because Dave understands 3d performance characteristics a whole hell of a lot better than any of the other reviewers, so if he screws up and gets anomalous results, he knows to investigate further. Asking those other guys to catch their benchmarking mistakes would be like asking them to edit their review text for typos if they didn't understand English.
But even Wavey will probably miss subtle benchmarking mistakes, or ones whose results, though wrong, are what he expected going in. (The term for this sort of bias in scientific experiments is, I believe, confirmation bias: when an experiment gives you unexpected results you run it again and again, whereas when it gives expected results, you accept them without question.)
As the NV30 fiasco taught us, sometimes our "expected results" end up totally wrong.
"But Wavey was the one who figured out that NV30 was 4x2 and not 8x1!" I know. But even that discovery relied on multiple people with multiple NV30s having access to the same benchmarking tool.
I trust B3D. But I think non-repeatable benchmarks are going to backfire in the long run.