If you dig deep, you will find old threads on this very topic discussing the "IQ tests" in previous releases of the benchmark.
The discussion touched on a huge number of factors and led to much debate.
In previous versions of 3DMark, the IQ tests would compare the reference rasterizer (which was basically identical to the GF3's renderer, given its DX8.0 roots) against stills taken from the benchmark and grade accordingly.
This basically made the "IQ score" a measure of how closely the underlying hardware could match the reference rasterizer, with no regard to color accuracy, truncation, or other issues.
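To make that concrete, here's a minimal sketch of that style of grading, assuming a simple per-pixel difference metric. The function name, the scoring formula, and the 0-100 scale are my own illustration, not 3DMark's actual implementation:

```python
import numpy as np

def iq_score(reference: np.ndarray, captured: np.ndarray) -> float:
    """Grade a captured frame purely by how closely its pixels match
    the reference rasterizer's output (0 = no match, 100 = identical).
    Both inputs are HxWx3 uint8 images of the same size."""
    diff = np.abs(reference.astype(np.int16) - captured.astype(np.int16))
    mean_error = diff.mean()  # average per-channel deviation, 0..255
    return 100.0 * (1.0 - mean_error / 255.0)
```

Note that nothing in a metric like this asks whether the captured image looks better than the reference; it only asks whether it is different.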
Unfortunately, this led many websites to construct arguments to "dock" IQ on 3D cards that were delivering what many would consider far superior IQ. So the debate really became: what was the developer's intention? And did the rasterization process and 3D hardware deviate from, enhance, or degrade the final quality relative to the original concept?
More on the website thing. The easiest way to explain the farce of the time is through a good illustration:
If these are the shaded renderings used in an IQ comparison test like the one 3DMark employs, you can see that IHV-B would "fail" the image quality test precisely because it employs higher color accuracy and delivers better gradients. You would have test results failing IHV-B while giving a perfect "A" score to IHV-A.
After all, in this example IHV-B deviates greatly from the reference rendering, so it fails.
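As a toy demonstration of why that outcome is baked into the methodology (again using the hypothetical per-pixel metric sketched above, not 3DMark's actual code), a smooth, higher-precision gradient scores worse against a banded reference than a bit-exact copy of the banding does:

```python
import numpy as np

def iq_score(reference: np.ndarray, captured: np.ndarray) -> float:
    diff = np.abs(reference.astype(np.int16) - captured.astype(np.int16))
    return 100.0 * (1.0 - diff.mean() / 255.0)

# The reference and IHV-A render a coarsely banded gradient;
# IHV-B renders the same ramp smoothly with higher color accuracy.
smooth = np.tile(np.arange(256, dtype=np.uint8), (64, 1))  # 64x256 smooth ramp
banded = (smooth // 32) * 32                                # 8 visible bands

ref   = np.stack([banded] * 3, axis=-1)
ihv_a = np.stack([banded] * 3, axis=-1)
ihv_b = np.stack([smooth] * 3, axis=-1)

print("IHV-A:", iq_score(ref, ihv_a))  # 100.0, matches the reference exactly
print("IHV-B:", iq_score(ref, ihv_b))  # about 93.9, penalized for the smoother gradient
```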
So it's a much bigger topic once you start to scratch the surface.