All the details have been covered elsewhere. Let me try to cover them all in one place.
First, let me note Shadermark is DirectX, not OpenGL. My analysis of the Shadermark performance is based on the fact that it is LESS than half of R300 performance... to me this directly indicates an opportunity for driver efficiency optimization (i.e., relatively trivial given the gross performance deficiency, and likely fixed very soon).
The following comments relate to OpenGL specifically.
---
Carmack: GFFX:NV30 fragment path -> slightly ahead of R300:ARB2 fragment path, with R300 sometimes leading.
Note which path is the least likely to benefit from driver optimizations: the NV30 path, the one based on the extensions nVidia specified, which are likely to map closely to hardware functionality. Its performance therefore looks like a reasonable ceiling for what the "ARB2" path could reach with fully optimized handling of precision and functionality.
Carmack: GFFX:ARB2 fragment path -> half the speed of GFFX:NV30 fragment path, with Carmack specifically attributing the deficit to the higher precision used in the ARB2 path.
This indicates where "ARB2" path performance is now with regards to fragment shading.
So, why think the NV30 might be able to progress from its current ARB2 path performance towards its NV30 path performance?
Mentioned elsewhere in thread: the ARB2 path has a precision hint that lets a program request either "maximum precision" or "maximum performance" (nicest/fastest), and it is up to the drivers to take the "maximum performance" hint and effectively decide where precision can be sacrificed.
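For anyone who has not looked at the extension, here is a rough sketch of where that hint sits in an ARB fragment program. The `OPTION ARB_precision_hint_fastest;` and `OPTION ARB_precision_hint_nicest;` lines are from the ARB_fragment_program spec; the Python helper `build_fragment_program` and the trivial program body are purely my own illustration, not anything taken from Carmack's code or nVidia's drivers.

```python
# Hypothetical helper (not from any real engine): builds the source text of a
# trivial ARB fragment program, optionally carrying one of the two precision
# hints that ARB_fragment_program defines.
HINTS = {
    "fastest": "OPTION ARB_precision_hint_fastest;",
    "nicest": "OPTION ARB_precision_hint_nicest;",
}


def build_fragment_program(hint=None):
    """Return fragment program text; hint is None, 'fastest' or 'nicest'."""
    if hint is not None and hint not in HINTS:
        raise ValueError("unknown precision hint: %r" % (hint,))

    lines = ["!!ARBfp1.0"]  # program header required by the extension
    if hint is not None:
        # OPTION lines go right after the header, before any statements.
        lines.append(HINTS[hint])
    lines += [
        "TEMP texel;",
        "TEX texel, fragment.texcoord[0], texture[0], 2D;",
        "MUL result.color, texel, fragment.color;",
        "END",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_fragment_program("fastest"))
```

The point is that the program text is identical apart from that one line, so everything about where precision actually gets dropped is left to the driver.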
But what kind of optimizations might be used for the ARB2 path with the "fastest" hint?
Interview 1 (http://www.beyond3d.com/previews/nvidia/nv30launch/index.php?p=2) said:
There was talk that FP16 (64-bit floating point rendering) could run twice the speed of FP32 (128-bit floating point rendering), is that the case?
Yes it is.
...
I assume that anything available currently using the 32-bit format will be run in FP16 mode?
Actually, no. We have native support for 32-bit integer, which is how we get the performance on the older apps. If we were to run them as FP16 then they wouldn't run as fast. So we have dedicated hardware with native support for 32-bit per pixel integer, 64-bit per pixel floating and 128-bit per pixel floating.
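(My own aside, not part of the interview.) To put rough numbers on the formats named there: the per-pixel sizes are just four components (RGBA) times the per-component width, and what FP16 gives up relative to FP32 is mantissa bits. A small Python sketch follows, using the standard IEEE half and single layouts from the `struct` module as a stand-in. That is an assumption on my part; I am not claiming NV30's FP16 behaves like IEEE half in every detail, and the helper names are made up for illustration.

```python
import struct

COMPONENTS = 4  # R, G, B, A


def bits_per_pixel(bits_per_component):
    """Per-pixel size for a 4-component format."""
    return COMPONENTS * bits_per_component


def round_trip(value, fmt):
    """Store a Python float in the given struct format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def to_fp16(value):
    # "<e" is IEEE 754 half precision (assumed stand-in for the FP16 format).
    return round_trip(value, "<e")


def to_fp32(value):
    # "<f" is IEEE 754 single precision.
    return round_trip(value, "<f")


if __name__ == "__main__":
    for name, bits in (("INT8", 8), ("FP16", 16), ("FP32", 32)):
        print(name, bits_per_pixel(bits), "bits per pixel")
    x = 0.1
    print("FP16 rounding error for 0.1:", abs(to_fp16(x) - x))
    print("FP32 rounding error for 0.1:", abs(to_fp32(x) - x))
```

This reproduces the 32/64/128-bit figures from the quote and shows the FP16 rounding error being several orders of magnitude larger than the FP32 one, which is the precision a "fastest" hint would be trading away.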
Also, there is another factor in the "NV30" code path's performance that might not be reflected in the "ARB2" code path at present, and that might provide opportunities for future performance enhancement depending on how many assumptions nVidia can safely make:
R300 versus NV30 on paper (http://www.beyond3d.com/articles/nv30r300/index.php?p=9) said:
Here we can find that register combiner unit is still available, even in fragment program mode, because it is commonly used and provides a powerful blending model. For example, it allows for four operands, fast 1-x operations, separate operations on color and alpha components, and more. These operations could be performed by fragment programs, but would require multiple instructions and program parameter constants. Supporting both methods simultaneously allows a programmer to write a program to obtain texture colors and then use the combiners to obtain a final fragment color.
As such, there are two different types of fragment programs: one "normal" and one for combiners. For combiner programs, the texture colors 0 through 3 are taken from texture output registers 0 through 3, respectively. The other combiner registers are not modified in fragment program mode.
Can we tell where the performance will end up?
Nope, or at least I can't. Likely Carmack could, but he didn't choose to speculate and quoted nVidia's assurances. I can only guess that the ceiling is "NV30" fragment shading path performance.
...
Though the last quote is "paper" analysis, I think overall there is very good reason to believe the "ARB2" path has room for optimization in future drivers, based on how floating point precision is handled. It also seems safe to treat the performance of the "NV30" path as a very good indication of the ceiling for such enhancement, and the guessing game nVidia might play to get there is not likely to match it exactly in the general case (and as long as the "NV30" path is there, game-specific optimization for the "ARB2" path seems a waste of time).
I hope providing substantiation can end the comparisons of these discussions to Shakespearean analysis... I tried to pick statements that are direct, easy to understand, and informative, with little speculation left.
I also hope this is presented clearly enough so as to not cloud the issue.