As I understand it - thanks to Beyond3D and ExtremeTech articles on 3D graphics and 3DMark03:
An application submits shader code and 3D commands via an API (OpenGL or DirectX) to the graphics driver. The driver translates and optimises that code, then dispatches GPU assembly instructions to the execution units of the GPU.
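Here is a toy model of that flow - application to API to driver to GPU execution units - just to make the stages concrete. Every name in it (Driver, optimise, dispatch) is invented for illustration; no real driver is this simple:

    class Driver:
        """Stand-in for a graphics driver sitting between the API and the GPU."""

        def submit(self, shader_asm):
            optimised = self.optimise(shader_asm)  # rewrite for this GPU
            self.dispatch(optimised)               # feed the execution units

        def optimise(self, asm):
            # peephole, re-ordering and load-balancing passes would live here
            return asm

        def dispatch(self, asm):
            for instruction in asm:
                print("execute:", instruction)

    Driver().submit(["mad r0, v0, c0, c1"])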
Optimisation is normally application independent. What happens is:
1) all code is scanned for cheaper equivalents in the GPU's instruction set (e.g. why do a MUL and then a dependent ADD if you can do a single MAD) - see the sketch after this list;
2) instructions affecting independent data items may be re-ordered by a clever driver to avoid pipeline stalls;
3) workload may be balanced across the GPU's execution unit queues to keep every unit as busy as possible and get the job done faster.
Only after this optimisation are the instructions dispatched.
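Point 1) is classic peephole optimisation. A minimal sketch, assuming a toy instruction format of (opcode, dest, src...) tuples - the format and the fuse_mul_add name are mine, not any real driver's:

    def fuse_mul_add(code):
        """Fuse a MUL whose result feeds the very next ADD into one MAD."""
        out = []
        i = 0
        while i < len(code):
            if (i + 1 < len(code)
                    and code[i][0] == "MUL" and code[i + 1][0] == "ADD"
                    and code[i + 1][2] == code[i][1]):  # ADD reads MUL's dest
                _, d, a, b = code[i]
                _, d2, _, c = code[i + 1]
                # a real compiler would also check d is not read again later
                out.append(("MAD", d2, a, b, c))  # d2 = a*b + c in one instruction
                i += 2
            else:
                out.append(code[i])
                i += 1
        return out

    # MUL r0, v0, c0 ; ADD r1, r0, c1  ->  MAD r1, v0, c0, c1
    print(fuse_mul_add([("MUL", "r0", "v0", "c0"), ("ADD", "r1", "r0", "c1")]))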
Point 2) above is where ATi specifically intervened, giving their driver better-than-normal re-ordering because of their detailed knowledge of the shader code in question. By identifying two shaders by name, they were able to trigger human-level rather than machine-level optimisation at run time.
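A hedged sketch of that kind of name-triggered substitution. The shader name and the HAND_TUNED table are invented for illustration - nothing here is taken from ATi's actual driver:

    # Hand-reordered instruction streams, keyed on shader name.
    # Both the key and the contents are hypothetical examples only.
    HAND_TUNED = {
        "mother_nature_water_ps": [("MAD", "r1", "v0", "c0", "c1")],
    }

    def compile_shader(name, code):
        if name in HAND_TUNED:            # human-level optimisation fires
            return HAND_TUNED[name]
        return machine_optimise(code)     # normal, generic driver passes

    def machine_optimise(code):
        return code  # stand-in for the driver's usual scheduling passes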
To me this is a grey area, but I would prefer (in descending order of preference):
1) the driver's optimiser is improved for all shader code, so every application benefits, or
2) users are allowed to select human-level shader optimisation, or
3) machine-level shader optimisation only, or
4) no optimisation at all (to see how slowly unoptimised code runs).
A sketch of options 2) to 4) as a user-selectable driver setting follows this list.
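All the names here (ShaderOpt, select_passes) are invented; this just shows how such a setting could gate which optimisation passes run:

    from enum import Enum

    class ShaderOpt(Enum):
        HUMAN = "human"      # allow hand-tuned, app-specific substitutions
        MACHINE = "machine"  # generic compiler passes only
        NONE = "none"        # dispatch shader code exactly as submitted

    def select_passes(mode, generic_passes, hand_tuned_passes):
        if mode is ShaderOpt.NONE:
            return []
        if mode is ShaderOpt.MACHINE:
            return generic_passes
        return generic_passes + hand_tuned_passes  # HUMAN: everything allowed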
This is just like compiler theory. ATi slipped some of 2) into what looked like 3) above. In my mind that is a distortion that comes close to a cheat, depending on what rules FM set, but FM must rule on that.
All drivers do code optimisation; that is where major performance gains normally come from.
NVidia executed shaders that weren't what FM asked them to execute. Their drivers didn't really run the eight shaders in FM's benchmark; they ran poor substitutes that gave fairly similar outputs but skimped on workload, colour precision and clean-up. NVidia have made no comment on this other than that it costs a lot to optimise drivers for their chip the way FM likes to code DX9 shaders, and that FM did it this way to sabotage them. Wow!!! FM's PS 1.4 and 2.0 shader code and fp precision choices are being argued over on several web sites, but it doesn't appear they were biased.
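One way a substitute shader can skimp colour precision is by requesting partial precision. The _pp modifier is real PS 2.0 assembly syntax; the shader text and the skimp_precision helper below are my own invented illustration, not anything from NVidia's driver:

    def skimp_precision(asm):
        """Rewrite arithmetic ops to request fp16 partial precision (_pp)."""
        ops = ("mul", "mad", "add", "dp3", "rsq")
        out = []
        for line in asm.splitlines():
            op, _, rest = line.partition(" ")
            if op in ops:
                line = op + "_pp " + rest  # fp16 instead of full precision
            out.append(line)
        return "\n".join(out)

    # dp3 r0, t0, t1 becomes dp3_pp r0, t0, t1, and so on
    print(skimp_precision("dp3 r0, t0, t1\nmad r0, r0, c0, c1"))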
I fear NVidia shot both above and below the mark for DX9 with the NV3x cards. They need better PS 1.4 and PS 2.0 capability, and more seriously, lacking fp24 support means that for DX9 high colour precision they have to default to fp32, which is seriously slow compared to their fp16 performance (it roughly halves the throughput of their pixel shaders relative to their fp16 and scalar performance). A back-of-envelope sketch of why follows.
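One commonly cited mechanism is register pressure: if each fp32 temporary occupies two fp16-sized register slots, the same register file holds half as many pixels in flight, and throughput roughly halves once latency can no longer be hidden. The numbers below are assumed for illustration, not NV3x specifications:

    REGISTER_FILE_SLOTS = 256  # hypothetical register file, in fp16-width slots

    def pixels_in_flight(temp_registers, bits):
        """How many pixels fit in the register file at a given precision."""
        slots_per_temp = 2 if bits == 32 else 1
        return REGISTER_FILE_SLOTS // (temp_registers * slots_per_temp)

    print(pixels_in_flight(4, 16))  # 64 pixels in flight at fp16
    print(pixels_in_flight(4, 32))  # 32 at fp32 - half as many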
So NVidia is exceptionally fast and well suited to DX 7.0 / 8.0 / 8.1 code, but in DX 9.0 high colour precision over any large portion of the screen causes them headaches. These two design constraints, exposed by FM's tests, may be why they were upset enough with FM to cheat in its tests. NV35 exceeds the feature requirements of DX9 - but not with speed. In some cases it can't meet the requirements of DX9 with speed.
Crazy as it sounds, NVidia could do the same code-level substitution in popular games in their drivers, to speed those games up at the cost of image quality.