Chalnoth said:
CPUs have always been about picking the right precision for the job. They support multiple precisions for a very good reason: sometimes it is better to sacrifice precision for speed, because that sacrifice will mean nothing for the final output. This is particularly the case in 3D graphics, where the final output will be, at most, 12-bit integer (currently the highest in PC 3D is 10-bit, but most still output at 8-bit).
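As a back-of-the-envelope check on that claim, here's a minimal sketch (my own illustration, not from the post, using NumPy's float16 as a stand-in for FP16 shader precision) comparing the FP16 step size near 1.0 with the step size of an 8-bit output channel:

```python
# Sketch: compare FP16 rounding granularity against the quantisation step
# of an 8-bit-per-channel framebuffer. Assumes NumPy's float16 serves as
# an IEEE half-precision stand-in for FP16 shader math.
import numpy as np

fp16_ulp = float(np.finfo(np.float16).eps)   # step just above 1.0 = 2**-10 ~ 0.000977
step_8bit = 1.0 / 255.0                      # 8-bit output step over [0, 1] ~ 0.003922

print(f"FP16 step near 1.0  : {fp16_ulp:.6f}")
print(f"8-bit output step   : {step_8bit:.6f}")
print(f"ratio (8-bit / FP16): {step_8bit / fp16_ulp:.1f}")
```

A single FP16 rounding error on a value in [0, 1] is roughly a quarter of one 8-bit output step, so in isolation it is indeed invisible in the final image; the interesting cases are the ones discussed below, where errors accumulate or get amplified before reaching the framebuffer.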
I know I've pointed out before that I believe you are looking at the wrong architectural example, so I'll do it again.
My questions here are simple:
- Why do you (and others) regularly insist on picking CPU architectures as the example of what VPUs should or should not do?
- What similarities in design do you see between VPUs and CPUs that lead you to believe that this is an appropriate or valid argument?
- Why do you not pick DSPs as the architectural precedent, or dedicated high-speed SIMD/vector processors such as Cray's? How do you see VPUs as being more similar to CPUs than to these architectures?
Personally, I think that FP16 will probably be enough accuracy for the first generation of DX9 games, as it is doubtful that these will make much use of complex shaders that might require the additional accuracy of FP32/FP24. I can therefore understand why FP16 could have been useful as part of the DX9 spec.
One also has to recognize that not all calculations will exacerbate the errors. Some calculations will tend to hide them, by their very nature. Just because a shader is complex doesn't necessarily mean that it will require much higher accuracy than the final output. It all depends on what calculations are done, and what kind of data those calculations are done on.
And just because a shader is simple doesn't necessarily mean that it automatically requires low accuracy. I can have a program that contains one arithmetic instruction and one dependent texture read (wow - two whole instructions in length), display the results at an 8-bit-per-channel screen depth, and still show serious calculation inaccuracies.
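To make that concrete, here's a rough NumPy sketch (my own construction; the lookup table, sizes and nearest filtering are assumptions, not anything from the post) of exactly such a two-instruction program: one multiply to form a texture coordinate, then one dependent read from a high-frequency lookup table. A sub-texel coordinate error from FP16 can land on a completely different texel, producing errors of many 8-bit steps:

```python
# Hypothetical two-instruction "shader": scale a coordinate, then do a
# dependent read from a 1024-entry high-frequency lookup table.
import numpy as np

lut = (np.arange(1024) % 2).astype(np.float32)     # fine black/white grating (0.0 or 1.0)
x = np.random.default_rng(0).random(100_000).astype(np.float32)

def shade(x, dtype):
    u = x.astype(dtype) * dtype(1023.0)            # instruction 1: arithmetic at chosen precision
    idx = np.clip(np.rint(u.astype(np.float32)), 0, 1023).astype(int)
    return lut[idx]                                # instruction 2: dependent read, nearest filtering

ref, half = shade(x, np.float32), shade(x, np.float16)

# Quantise both results to 8 bits per channel and compare.
diff = np.abs(np.rint(ref * 255) - np.rint(half * 255))
print(f"pixels wrong by more than one 8-bit step: {np.mean(diff > 1) * 100:.1f}%")
print(f"worst error: {diff.max():.0f} / 255")
```

Even though the FP16 coordinate is off by well under one texel, a noticeable fraction of the samples fetch the wrong texel and come out wrong by the full 8-bit range - two instructions, 8-bit output, serious inaccuracy.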
The only question that should be asked is: for most 3D graphics programs, will it be better for hardware to support integer calcs (or any given precision) explicitly? Or will more performance be obtained if those transistors are instead used to improve performance for a higher-precision format?
That would depend on whether you are trying to accelerate a specific performance case (low precision) or the most generally applicable performance case (high precision).
Do the programs you describe need the specific case, or the general one? Legacy apps, designed around low precision, will only require the specific low-precision case. Future applications, designed with higher requirements in mind, may need the general case more than a specific one, so perhaps a forward-looking design should target this case?
That is the sort of design decision that has to be made by ATI, nVidia and anyone else making 3D graphics architectures.
I'll make it simple. If INT12 were supported, nVidia wouldn't be inclined to force the use of lower precisions. The way it is now, DirectX 9 is ensuring lower-quality rendering on the NV30-34 processors, as nVidia must use auto-detection to make use of their significant integer processing power. If INT12 were supported in the API, games could both perform better and look better on these video cards.
So what you're saying is that if everyone kowtowed to nVidia and made things the way they dictate, then amazingly they wouldn't feel pressured in the market by competitors coming up with superior implementations? If they have to creatively interpret the specifications, it's because they operate in an evil market that allows free competition? How inconvenient that must be for them.
Why should everyone else in the market have to kowtow to nVidia's 'vision', whether it's superior or not?
And I'll say it one last time. Stating that INT12 or FP16 is just bad for 3D graphics is an arbitrary judgement. Whether or not they are useful depends on the algorithm. Both formats are still higher in quality than the final output, so obviously there will be a number of calculations that will not benefit from higher precision.
Yes, this is true. There will be a number of calculations that will not benefit from higher precision. It also appears that this 'vision' of freely mixing precisions does not automatically make you faster than a processor designed specifically to accelerate the most general case of high precision.
Surely if freely mixing precisions is such a great feature then an architecture that can do so should be winning on all legacy apps (where it can use whatever precision it likes) by some huge margin? Why is this not the case?
How do you go about botching such a 'superior' vision to such an extent that, even when mixing precisions freely, you still can't necessarily match the performance of another architecture designed simply to accelerate the most general high-precision case? Even worse, perhaps it turns out that even when running at higher clock rates (in some cases much higher) you still cannot make up the deficit?
Maybe it's just not necessarily a superior vision after all.