My initial idea was that the person evaluating could specify directly what was tested, record it, and play it back to benchmark across several cards. It was prompted by discussion of the 3dmark03 aniso test, I think...I consider it a problem that it doesn't automatically provide regular and easily reproducible motion (EDIT: that can be objectively applied for further comparative testing) to allow easy comparison of aliasing in actual use.
My thought was originally of a scene designed to readily expose all pertinent issues at once, with the reviewer/evaluator then inputting keys and directing camera rotation around the scene. This could be saved out to a file and played back, and would be unique to each evaluation session and under the control of the person evaluating.
Further facets would be letting the person control changing parameters, like colors, light intensity, and other properties that could expose image quality shortcuts, and maybe animation properties (a "living" model with skeletal animation, etc.) as part of this "recorded demo" (see the sketch below).
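To make the "recorded demo" idea concrete, here's a minimal sketch of what I mean by capturing and replaying the session. All names and the file format are hypothetical, just to illustrate: per-frame camera input and parameter changes get captured to a file, and playback feeds the exact same sequence back to the renderer on any card.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import Dict, List

@dataclass
class DemoFrame:
    """One frame of evaluator input: camera state plus any parameter overrides."""
    frame: int                  # frame index, so playback is frame-locked rather than time-locked
    camera_pos: List[float]     # world-space position (x, y, z)
    camera_rot: List[float]     # orientation as yaw/pitch/roll in degrees
    params: Dict[str, float] = field(default_factory=dict)  # e.g. light intensity, color shifts

def save_demo(frames: List[DemoFrame], path: str) -> None:
    """Write the recorded session to disk so it can be replayed on other cards."""
    with open(path, "w") as f:
        json.dump([asdict(fr) for fr in frames], f)

def load_demo(path: str) -> List[DemoFrame]:
    """Read a previously recorded session back for playback."""
    with open(path) as f:
        return [DemoFrame(**fr) for fr in json.load(f)]

def replay(frames: List[DemoFrame], render_frame) -> None:
    """Feed the recorded inputs to the renderer in order; identical on every run."""
    for fr in frames:
        render_frame(fr.camera_pos, fr.camera_rot, fr.params)
```

The point of the file is just reproducibility: once saved, the same unique session can be run on any number of cards for apples-to-apples comparison.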
This wouldn't eliminate cheating opportunities, but it would expand the set of parameters a cheat would have to account for to a rather large amount of variance, hopefully making such optimizations impractical except where they are genuinely general-case optimizations.
Ilfririn's example is a shortcut that fits as an extension of what 3dmark is already doing, and could be applied to the criteria above as an option as well. I think user control is more important, however: it ties the testing directly to the specific concerns of the person evaluating, and accounts for possibilities the benchmark creators did not anticipate, so the range of scenarios that can be expressed isn't limited. Reproducibility would be covered by saving the file, and such customizable testing would be focused on expressing the ideas from the game tests (which are separate) in a way that lets the evaluator judge whether the game test results indicate special-case "cheating" or not.
Of course, many games do half of this already, but doing it for a synthetic test (EDIT: in this usage, a test designed not for actual gaming, but for the "synthetic" criteria of accurately reflecting what would be stressed by games) and with these types of controls opens up many new doors for testing and exposure, and any optimization that worked here would at least be more likely to be both general case and truly invisible.
Can't find my original suggestion yet, though (searching with asterisks still appears to have issues, like not being able to go to another page within one set of search results, so I gave up for now). However, some opinions I've expressed before seem somewhat related to my concerns in this regard (except that I've since been shown to be wrong about what nVidia was doing with their 3dmark 03 boosting driver set).
EDIT: To address Rev's question, and Calavaro's one-color texture example: that would be covered by allowing variance that requires the texture's full color content to render properly in all cases, and by highlighting the difference in performance and/or image quality between each game test and its associated "Cheat Check" test, the latter being where such cheats can more easily be exposed.
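As a rough sketch of what "highlighting the difference" might look like (purely illustrative; the function name and the 5% tolerance are placeholders I'm assuming, and real thresholds would need tuning):

```python
def flag_suspect_gap(game_frame_times, check_frame_times, tolerance=0.05):
    """Compare average frame times between the normal game test and the
    varied-parameter 'Cheat Check' run on the same card.

    If the card slows down noticeably once the textures/parameters it may
    have special-cased are randomized, that gap is worth investigating.
    An analogous check could be run on image-quality differences against
    a reference render.
    """
    game_avg = sum(game_frame_times) / len(game_frame_times)
    check_avg = sum(check_frame_times) / len(check_frame_times)
    slowdown = (check_avg - game_avg) / game_avg
    if slowdown > tolerance:
        return f"Suspicious: {slowdown:.1%} slower once parameters vary"
    return f"Within tolerance ({slowdown:.1%} difference)"
```

A single-color-texture shortcut, for example, would show up as a performance drop (or an image-quality change) as soon as the Cheat Check run substitutes textures that actually require their full color content to render correctly.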