My first choice would be what cho said: record new timedemos for each review/shootout, and release them with the review. That way the IHVs have no chance to optimize for that particular demo, but the results are still verifiable and repeatable by third parties (you know, that whole scientific method thing).
If this is too much work (and it doesn't
seem like it should be all that much work to me, but I really don't know), then yeah, the next best option is to keep your demos secret. I can't speak for everyone on this board, but I'm going to anyway: we all trust a B3D review 100%, not only to be conducted properly but also to draw insightful conclusions from the data. But even so, sometimes just the appearance of transparency brings its own rewards.
Of course the
ideal option would be what has been mentioned in some other threads: a randomizable benchmarking utility. That is, based on a seed number, the utility would randomly generate, at the very least, a new "script" for the demo (by "script" I mean the path taken through the level plus whatever occurrences take place: weapon fire, NPC actions, explosions, etc.); and possibly also randomized geometry counts or even randomized shader code. Results would still be comparable between cards, and verifiable by third parties: just use the same seed number. But this way there's no extra work: you just pick a new number for each review (and publish it).
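To make the idea concrete, here's a minimal sketch in Python rather than anything engine-specific; generate_demo_script and the event names are made up for illustration, not taken from any real utility. The property that matters is that one published seed deterministically regenerates the entire script:

```python
import random

# Hypothetical occurrence types -- illustrative names only.
EVENTS = ["weapon_fire", "npc_action", "explosion"]

def generate_demo_script(seed, num_steps=300):
    """Generate a reproducible demo 'script': a path through the level
    plus a timeline of occurrences. The same seed always yields the same
    script, so results stay comparable between cards and third parties
    can verify a review by re-running with the published seed."""
    rng = random.Random(seed)  # isolated PRNG; global state untouched
    script = []
    x, y = 0.0, 0.0
    for tick in range(num_steps):
        # Random-walk the camera/player path through the level.
        x += rng.uniform(-1.0, 1.0)
        y += rng.uniform(-1.0, 1.0)
        entry = {"tick": tick, "pos": (round(x, 2), round(y, 2))}
        # Occasionally trigger a scripted occurrence.
        if rng.random() < 0.1:
            entry["event"] = rng.choice(EVENTS)
        script.append(entry)
    return script

# Publish the seed with the review; anyone can regenerate the identical run:
assert generate_demo_script(20030523) == generate_demo_script(20030523)
```

Randomized geometry counts or shader code would work the same way: just feed more of the content-generation pipeline from the same seeded PRNG.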
Ironically, the best chance I can see for such a system anytime in the near future would be if Futuremark develops it, perhaps for 3dM04 or something. Unfortunately, the one false note on FM's part (IMO) in a day of otherwise saying exactly the right thing came when AJ wrote that
any attempts to special-case 3dMark are illegal and thus that the benchmark itself need not be designed to take possible cheating into account. His comment that, because 3dMark is closed source, he doesn't expect IHVs to try to reverse-engineer it struck me as security through obscurity, which is particularly naive in a situation like this one. And the whole attitude indicates to me that FM doesn't feel a randomized benchmark should be necessary (although perhaps he was just defending the decisions they made for 3dM03, and this incident might cause FM to realize the wisdom of designing one's benchmark with the assumption that everyone will try to cheat).
Another possibility to achieve a similar goal would be if the seed number were to seed existing AI bot code in a game (plus determine a random starting location); the bot would then just "play through" the game for a little while, and the benchmark would either record the framerate immediately (to bench with AI, physics, etc.) or record a demo and benchmark that (to bench rendering speed only). This might not be too difficult for a developer to add to an upcoming game, assuming they care enough about preventing benchmark cheating to go to the effort.
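Again purely as a sketch of what I mean (simulate and render below are stand-in stubs, not any real engine's API), the two modes might look something like this:

```python
import random
import time

def simulate(state, action):
    """Stub for AI/physics: advance the bot's position by its chosen move."""
    return (state[0] + action[0], state[1] + action[1])

def render(state):
    """Stub for drawing one frame; the sleep stands in for GPU work."""
    time.sleep(0.001)

def run_bot_benchmark(seed, mode="live", ticks=500):
    """Seed-driven bot benchmark: the seed picks the starting location and
    drives every bot 'decision', so the same seed reproduces the same
    playthrough on any card. 'live' mode times AI/physics plus rendering;
    'demo' mode replays the recorded run, timing rendering only."""
    rng = random.Random(seed)
    state = (rng.uniform(0, 100), rng.uniform(0, 100))  # seeded start location

    # Record the bot's playthrough; same seed -> identical state sequence.
    playthrough = []
    for _ in range(ticks):
        action = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        state = simulate(state, action)
        playthrough.append(state)

    if mode == "live":
        # Bench with AI/physics on the clock: redo the simulation while timing.
        rng2 = random.Random(seed)
        state = (rng2.uniform(0, 100), rng2.uniform(0, 100))
        t0 = time.perf_counter()
        for _ in range(ticks):
            state = simulate(state, (rng2.uniform(-1, 1), rng2.uniform(-1, 1)))
            render(state)
    else:  # "demo": replay the recorded run, so only rendering is timed
        t0 = time.perf_counter()
        for recorded_state in playthrough:
            render(recorded_state)

    return ticks / (time.perf_counter() - t0)  # average FPS

print(f"live: {run_bot_benchmark(42, 'live'):.1f} fps")
print(f"demo: {run_bot_benchmark(42, 'demo'):.1f} fps")
```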
But as either of those solutions is some time off...record your own demos; release them if you have the time to record new ones for each review, but keep 'em secret if you must.