Perhaps you could quantify image accuracy, but that is not equal to image quality....
Agreed. I think we should expand on this, since it can help nail down and focus the discussion. I see two different things being discussed here, and I'll henceforth term them:
1) Image "correctness"
2) Image "pleasantness" (Please...someone come up with a better word!)
I would say that if you put BOTH of them together, you get what we call "overall image quality".
"Correctness" (accuracy) is more or less something that can be objectively quantified. We would need some reference image or scene that everyone can agree is a "100% accurate" representation of the scene at ultimate quality. Ideally, that reference is an analog (continuous) representation. In practice, it's going to be an image at a given resolution, computed with "very high" color and z accuracy, "very high" full scene antialiasing, and advanced texture filtering.
"Pleasantness" is a purely subjective measurement of how acceptable the image is. Given the same image or scene rendered by two different pieces of hardware "at the same settings", different people can reach different conclusions about which one is more "pleasing", because the hardware differs in how it implements certain features, in rendering accuracy, in the trade-offs it makes, etc.
Now, since pleasantness is subjective, overall image quality cannot be 100% objective, though there is an objective component.
I think the important thing to keep in mind is that "correctness" basically defines whether or not two images are even comparable in the first place. If two images are not "close enough" in correctness, they should not be directly compared. Once two images are of sufficiently similar correctness, "pleasantness" then pretty much determines overall image quality.
That prevents something like one person "preferring" point sampled graphics over filtered graphics because they are "sharp". We all agree that's not an apples-to-apples comparison, yet one cannot argue with someone's opinion that point sampled looks better. What we can argue is that the two images are not close enough in relative "correctness" to be directly compared in the first place.
So, what can we do with this? What's the best way to approach the comparison of two different implementations?
Two step process:
1) Come up with some quantifiable, 100% objective method to determine "correctness." Terribly difficult to do, I know, and even if it's done, I still don't see how a single number to "quantify correctness" is possible. But that aside, we then come up with some generally agreed threshold by which two implementations are agreed to be of "similar correctness." Very simplistically, say that they must be within 5% of each other in terms of the "correctness" score.
2) At that point, it is purely a subjective measure to determine which implementation is the most "pleasing", and therefore has the overall highest image quality.
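To make step 1 a bit more concrete, here is a rough Python sketch of the "correctness gate". Everything in it is an assumption for illustration only: the score is a plain RMS error against the reference (just one crude choice of metric), images are flat lists of 8-bit channel values, "within 5%" is read as an absolute difference of 0.05 on a 0-to-1 scale, and the names `correctness_score` and `comparable` are made up.

```python
import math


def correctness_score(image, reference, max_value=255):
    """Return a score in [0, 1]; 1.0 means identical to the reference.

    Hypothetical metric: 1 minus the RMS error normalized by max_value.
    Images are flat sequences of channel values of equal length.
    """
    if len(image) != len(reference) or len(reference) == 0:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(image, reference)) / len(reference)
    return 1.0 - math.sqrt(mse) / max_value


def comparable(score_a, score_b, tolerance=0.05):
    """True if two correctness scores are within `tolerance` of each other.

    The 0.05 default is the "very simplistic" 5% threshold, taken here as an
    absolute difference between scores (an assumption).
    """
    return abs(score_a - score_b) <= tolerance


# Tiny made-up "images": four channel values each.
reference = [0, 64, 128, 255]
impl_a = [0, 66, 126, 255]          # small filtering differences
impl_b = [0, 60, 130, 250]          # small differences of another kind
point_sampled = [0, 0, 255, 255]    # far from the reference

score_a = correctness_score(impl_a, reference)
score_b = correctness_score(impl_b, reference)
score_ps = correctness_score(point_sampled, reference)

print("A vs B comparable:", comparable(score_a, score_b))
print("A vs point sampled comparable:", comparable(score_a, score_ps))
```

Only pairs that pass `comparable` would go on to step 2, where people argue about which one looks more pleasing. The sketch says nothing about which image is "better"; it only decides whether the subjective comparison is fair to make at all.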
In short:
1) Fundamentally, Image Correctness is quantifiable.
2) Fundamentally, Image Pleasantness is not quantifiable.
3) Overall Image Quality has both components. The quantifiable component determines the degree to which the images can be compared. The subjective component determines the overall image quality of two images that are of comparable correctness.