Uttar said:
The NV30 is superior to the R300 if the following is true:
What is "superior"? Offering more features, or more speed? It doesn't seem to do both at the same time.
1. Both INT & FP are used in the same program
2. Few registers are used
3. There's little scalar
1. It is superior if you discard dynamic range and precision during operations? Isn't that a bit contradictory to the label "superior", and to proposing a shader-length advantage?
2. With "few registers used", you are either ignoring integer processing (which reintroduces the performance issues and eliminates the assertion that it competes well in speed), or are just repeating 1.
3. With "little scalar", you can't claim a quality advantage without accepting a drastic performance deficit, and you can't claim a performance advantage unless the above items apply as well.
It's not THAT hard, now, is it?
You usually make sense, Uttar, but I don't see it here. "not THAT hard" after listing a set of criteria that seems to contradict the premise of "the NV30 is superior to the R300"?
2 is Cg's job.
3 is true in many cases.
It is? What are these cases? Again, I see it ranging from "PS 1.3" functionality and faster, to "PS 2.0 extended" and much slower. Note that the extended features keep being presented as significant, without any discussion of the R300's own instruction and functionality advantages.
1 is the big problem, because DX9 doesn't support INT.
It is also a problem because it leaves the NV30 inferior to the R300, in both measured and theoretical results. 3DMark03, shader benchmarks, and John Carmack's discussions all seem to support this (a 500 MHz part using a custom-tailored path at reduced quality barely edging out a 325 MHz part using a generic path does not establish superiority).
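To put the clock difference in numbers, here is a rough sketch of a per-clock normalization. The 1.05 overall speedup is a hypothetical figure picked for illustration, not a benchmark result, and `per_clock_ratio` is a made-up helper name. This only normalizes for clock speed; it says nothing about why the per-clock gap exists.

```python
def per_clock_ratio(speedup: float, clock_a_mhz: float, clock_b_mhz: float) -> float:
    """Per-clock throughput of part A relative to part B.

    speedup is A's overall score divided by B's (1.05 means A is 5% faster).
    """
    return speedup * clock_b_mhz / clock_a_mhz


NV30_MHZ = 500.0
R300_MHZ = 325.0

if __name__ == "__main__":
    # Hypothetical: NV30 5% faster overall (illustrative, not a measured result).
    ratio = per_clock_ratio(1.05, NV30_MHZ, R300_MHZ)
    print(f"NV30 per-clock throughput relative to R300: {ratio:.2f}")
    # Overall speedup the NV30 would need just to match the R300 clock for clock.
    print(f"Break-even overall speedup: {NV30_MHZ / R300_MHZ:.2f}x")
```

Under that assumption the NV30 does roughly 68% of the R300's work per clock, and it would need to be about 1.54x faster overall just to break even per clock.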
Where did this jump from "NV30 is competitive when using integer" to "NV30 is superior" come from all of a sudden? The argument seems to rest on one theoretical case while ignoring the factors outside it.
Not even through extensions. I'm sure nVidia would gladly pay them $25M "under the table", or maybe even more, to get full integer support in DX9.1 extensions and the right to use FP16 registers for most operations...
Using fp16, it is still slower than the R300. If it intermixes integer ops, and thereby gives up fp16's precision for those ops, the performance gap can indeed be partly closed in a realistic workload. However, none of the NV30's options is clearly superior to PS 2.0 at fp24 on the R300: not PS 1.3 functionality with a performance lead, not "extended" functionality at integer precision (the case just mentioned) with intermittent performance parity, and not "extended" functionality at fp16 precision with significantly slower performance. And the NV30 has a significant clock speed advantage in all of these situations.
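For a sense of what the precision and dynamic range differences between these formats amount to, here is a small sketch comparing step size and range for fp16, fp24 and fp32. It assumes fp24 uses 1 sign, 7 exponent and 16 mantissa bits (the commonly cited R300 layout), and that every format reserves its top exponent the way IEEE 754 does; neither assumption is confirmed in this thread. `FloatFormat` and `quantize` are hypothetical names, and `quantize` models mantissa rounding only, ignoring exponent range and denormals.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FloatFormat:
    name: str
    exp_bits: int   # exponent field width
    man_bits: int   # stored mantissa (fraction) bits

    @property
    def epsilon(self) -> float:
        # Gap between 1.0 and the next representable value.
        return 2.0 ** -self.man_bits

    @property
    def max_finite(self) -> float:
        # Assumes an IEEE-754-style layout: the all-ones exponent is reserved
        # for inf/NaN, so the largest usable exponent equals the bias.
        bias = 2 ** (self.exp_bits - 1) - 1
        return (2.0 - 2.0 ** -self.man_bits) * 2.0 ** bias


def quantize(x: float, man_bits: int) -> float:
    """Round x to man_bits fraction bits (round-half-to-even via round()).

    Ignores exponent range, overflow and denormals: precision only.
    """
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)           # x == m * 2**e, with 0.5 <= |m| < 1
    scale = 2 ** (man_bits + 1)    # +1 accounts for the implicit leading bit
    return math.ldexp(round(m * scale) / scale, e)


FORMATS = [
    FloatFormat("fp16 (s10e5)", 5, 10),
    FloatFormat("fp24 (s16e7, assumed)", 7, 16),
    FloatFormat("fp32 (s23e8)", 8, 23),
]

if __name__ == "__main__":
    x = 1.0 + 1.0 / 3000.0
    for f in FORMATS:
        print(f"{f.name}: epsilon={f.epsilon:.2e}, max={f.max_finite:.3e}, "
              f"1+1/3000 -> {quantize(x, f.man_bits)!r}")
```

With 10 mantissa bits, fp16 rounds 1 + 1/3000 back to exactly 1.0 and tops out at 65504, while the assumed 16-bit fp24 mantissa still distinguishes that value from 1.0.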
It looks to me like a series of tradeoffs where a good case can just as easily be made for the NV30 being inferior, and these are the best-case situations for what you seem to be proposing as "superior".
Well, unless you want to disregard performance completely and discuss fp32 alone. That seems odd in a discussion about the performance offered by the integer pipeline, and it would also leave the door open for CPUs to compete.