I've changed my mind. I think B3D should release whatever benchmarks they use.
Why? Think of all the incorrect benchmarks that have been posted to major sites in just the last couple of weeks:
- Anand posted benches that claimed the 5900U was platform-limited (at 223fps!) running Q3 at 1600x1200 with 4xAA and 8xAF
- Lars at Tom's mislabeled the D3 Medium Quality + 4xAA benches as High Quality + 4xAA
- Kyle and Anand both ran D3 in Medium Quality with 8xAF set in the drivers, even though it appears that Nvidia's drivers interpret Medium Quality as forcing AF off, while ATI's drivers do not
- ExtremeTech's 3dMark03 build 320 vs. build 330 benchmarks show the Radeon 9800P losing performance in all four game tests with the new build; benchmarks by forum members and Wavey himself demonstrate that ET got it wrong
The last one, of course, is the most troubling in this context. Imagine if ET were the only ones with access to build 330! The only conclusion to be drawn would be that ATI was cheating on game tests 1, 2 and 3, even though they are not.
But the only reason we know they're not cheating is that everyone else has access to the same benchmark and can double-check ET's results.
Now, I trust Wavey with a review, 110%. So far as I know, he's never messed up a benchmark. But he's still human; if he makes typos in his review text or forgets to change column headings in the result tables (and he has been known to do both on occasion), then he can make little mistakes when benchmarking as well.
Of course, he wouldn't make any of the gratuitous errors I listed above. That's because Dave understands 3d performance characteristics a whole hell of a lot better than any of the other reviewers, so if he screws up and gets anomalous results, he knows to investigate further. Asking those other guys to catch their benchmarking mistakes would be like asking them to edit their review text for typos if they didn't understand English.
But even Wavey will probably miss subtle benchmarking mistakes, or ones whose results, though wrong, are what he expected going in. (The term for this sort of bias in scientific experiments is, I believe, confirmation bias: when an experiment gives you unexpected results you run it again and again, whereas when it gives expected results, you accept them without question.)
As the NV30 fiasco taught us, sometimes our "expected results" end up totally wrong.
"But Wavey was the one who figured out that NV30 was 4x2 and not 8x1!" I know. But even that discovery relied on multiple people with multiple NV30s having access to the same benchmarking tool.
I trust B3D. But I think non-repeatable benchmarks are going to backfire in the long run.