Here are some points for discussion on Cg:
- Does not output PS 1.4
- Does output an "integer PS 2.0"
- Outputs code suited to nVidia's own low-level shader/"assembly" optimizer
The first prevents all R200 and RV2x0 cards from exposing anything beyond PS 1.3 through Cg. That forfeits the expanded dynamic range, flexibility, and shader length that PS 1.4 offers. nVidia has said they will deliver a PS 1.4 profile, but to my knowledge they still have not, and I have more than a slight distrust of their intention to actually do so.
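To make the "flexibility" part concrete, here is a minimal sketch (the sampler names are invented; this is an illustration, not anyone's shipping shader) of the kind of dependent texture read that PS 1.4 expresses naturally with its two-phase model, but that PS 1.1-1.3 can only approximate through their fixed-pattern texture addressing ops:

    // Hypothetical Cg/HLSL fragment: the second fetch's coordinates are
    // computed from the first fetch's result (a dependent read).
    sampler2D normalMap;   // illustrative names, not from any real project
    sampler2D envMap;

    float4 main(float2 uv : TEXCOORD0) : COLOR
    {
        // Fetch and unpack a normal from a texture.
        float3 n = tex2D(normalMap, uv).rgb * 2.0 - 1.0;
        // Use the computed value as coordinates for a second fetch;
        // under PS 1.4 this maps onto fetch / arithmetic / phase / fetch.
        return tex2D(envMap, n.xy);
    }

With no PS 1.4 profile, Cg can only target PS 1.3 or lower on R200/RV2x0, so this sort of single-pass technique simply isn't on the table for those cards from within Cg.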
The second allows the nv30 to compete. The only problem I have with this, beyond how it bears on the other two points, is that it can lead to development decisions made for performance on the nv3x that are not pertinent at all to the R300 and future cards...basically unilaterally forcing a non-standardized least common denominator instead of the PS 2.0/ARB_fragment_program path the industry agreed upon and we were expecting. Now, the real problem is not when this is done in addition (why screw over nVidia card owners if you don't have to?), but when it is done
instead of implementing to the PS 2.0 specification. EDIT: this also depends on the final outcome of WHQL certification regarding how Cg interacts with the drivers when executing "PS 2.0", and, under OpenGL, on treating ARB_fragment_program as the equivalent of "PS 2.0" and the nv30 extension as the "integer PS 2.0".
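As a purely illustrative example of the kind of nv3x-centric decision I mean: Cg exposes half and fixed data types that map onto the nv30's partial-precision and fixed-point paths, and sprinkling them through a shader is exactly the sort of work that buys performance on the nv3x while doing nothing for hardware like the R300, which runs PS 2.0 arithmetic at full float precision either way. A sketch, with invented names:

    // Sketch only. Cg's half (fp16) and fixed (low-precision fixed-point)
    // types cater to the nv30's reduced-precision paths; on standard
    // PS 2.0 hardware the same code is simply promoted to full float
    // precision, so the "optimization" shapes the shader around one
    // vendor's weakness.
    sampler2D diffuseMap;

    float4 main(half2 uv : TEXCOORD0, fixed3 lightColor : COLOR0) : COLOR
    {
        fixed4 base = tex2D(diffuseMap, uv);   // reduced-precision fetch/math
        return float4(base.rgb * lightColor, base.a);
    }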
The third is a pervasive problem. For instance, R300 performance is optimized for scalar/3-component vector execution with texture ops dispatched simultaneously, while that same emphasis absolutely chokes the nv30, especially if floating point is needed. Under DX 9 HLSL, the output is optimized to the spec, which the IHVs can then anticipate and optimize for; with Cg, the output is optimized to the nv30's special demands, and this can, and has, resulted in "LLSL" that is distinct. They say they aren't bypassing the API, but that is exactly what they do with regard to DX 9 HLSL.
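To illustrate the scheduling point (again a hypothetical sketch, with invented names): in a fragment like the one below, the rgb math is a pure 3-component vector op and the alpha math is an independent scalar op, so spec-targeted output leaves the driver free to co-issue them on R300-class hardware while the texture fetch proceeds alongside. Output reshaped around the nv30's preferences has no reason to preserve that structure.

    // Hypothetical Cg/HLSL fragment; sampler/uniform names are invented.
    sampler2D baseMap;
    float fadeScale;

    float4 main(float2 uv : TEXCOORD0,
                float3 lightColor : COLOR0) : COLOR
    {
        float4 base = tex2D(baseMap, uv);            // texture op
        float3 rgb  = base.rgb * lightColor;         // 3-component vector op
        float  a    = saturate(base.a * fadeScale);  // independent scalar op
        return float4(rgb, a);
    }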
It might be a simple matter for nVidia to optimize effectively for this resulting code (and they of course have a vested interest in doing so and control what the code looks like), but other IHVs would have to do new work: either optimize with the different Cg "standardized" profile output's high-level optimizations (or lack of them) in mind, or repeat the work Microsoft has already done and write their own optimized DX 9 "equivalent" HLSL compiler (if they've already optimized for DX 9 HLSL output, this makes little sense unless they want to spend time and effort specifically handing control over to a direct competitor), and in either case trust in nVidia's good intentions with regard to the future work nVidia would have to do to keep this viable. It seems a bit silly for nVidia to expect other IHVs to do that work for them when it suits nVidia's goals.
None of those options makes sense to me, nor, I suspect, to any IHV with an architecture dissimilar to the nv30's (which I'd guess is every other vendor, though it is conceivable someone made similar choices), given nVidia's level of control over the language's evolution. So it seems to me that the burden of making Cg useful as
something other than a tool to impede other IHVs while easing things for nVidia falls on nVidia themselves. That is to say, they need to settle for benefiting themselves instead of both benefiting themselves and simultaneously maintaining a level of control that disadvantages other IHVs, since it is their own hardware that requires deviating (downwards) from the standard to perform well.
What could they do? Well, they
could make Cg's standard output identical to DX 9 HLSL shader output and support all of its targets (i.e., effectively be the DX 9 HLSL compiler for anything except the nv3x cards within the Cg development environment), or they could encourage developers to code to standard DX 9 HLSL and then promote Cg as the "nVidia specific" DX 9 HLSL to code to
in addition (this still seems a possible outcome, and hopefully the market will force them to settle for this at most, without too many deviations in the interim).
Now, this is what they imply they are encouraging when Cg is challenged as undesirable, and what I hope market realities will force developers to do, but it does not fit what I see them actually doing (lip service is cheap)...their avenues of Cg promotion appear to foster it as a replacement environment for development instead of using DX 9 HLSL and its compiler, and I don't see any indication (yet) of plans to facilitate use of the DX 9 HLSL compiler on this "identical" code from within the Cg development toolset. That seems trivial to implement (if they're really committed to maintaining specification compatibility with DX 9 HLSL), so I'm concerned as to why I see only nVidia's compiler output mentioned when DX targets from Cg are discussed.
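For what it's worth, "trivial" really does seem to be the right word, at least conceptually: Cg and DX 9 HLSL source are close enough that the same file should be able to go through either toolchain. The command lines below are a sketch from memory (exact flags may differ), not a claim about how nVidia's toolset currently behaves:

    // One source file, two compilers (illustrative; flags from memory):
    //
    //   fxc /T ps_2_0 /E main shader.hlsl         <- Microsoft's DX 9 HLSL compiler
    //   cgc -profile ps_2_0 -entry main shader.cg <- nVidia's Cg compiler
    //
    // If the ps_2_0 output of the second matched the spec-targeted output
    // of the first (or the toolset simply invoked the DX 9 HLSL compiler
    // for non-nv3x targets), the "identical to DX 9 HLSL" claim would
    // actually mean something for other IHVs.
    sampler2D baseMap;

    float4 main(float2 uv : TEXCOORD0) : COLOR
    {
        return tex2D(baseMap, uv);
    }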
There are options developers can use that aren't negative for everyone besides nVidia...like the option mentioned above: coding to DX 9 HLSL (the standard), and
then supporting Cg for nVidia cards afterwards (even though it doesn't seem identical, adjustment should be trivial, though getting similar performance might sometimes be difficult for the current nv3x cards). If nVidia devrel is assisting with the time spent on that last part, I don't see any significant disadvantage to this. I do think it is incompatible with how developers appear to be "supposed" to utilize the Cg tools and interact with nVidia developer relations initiatives as actually implemented...for instance, a developer who decides on Cg based on its purported equivalence to DX 9 HLSL is strongly dissuaded from exploring PS 1.4 shaders as an option, and will continue to be until nVidia either provides an effective PS 1.4 solution (including recognizing competitors' capabilities for it) or, by refusing to do so after succeeding in gaining developer adoption of Cg, coerces other IHVs into providing one themselves, thereby facilitating the disadvantages for IHVs other than nVidia that such a situation entails.
Also, there are other options (ones that benefit nVidia,
but not exclusively them) open to nVidia from their standpoint, like designing future hardware to perform well when following API specifications (I've considered that the nv30 might simply have broken floating-point support in its register combiners and thus could be fixed quickly, but that doesn't seem as likely at present), or actually delivering the things I propose above and meaning the things they imply about Cg's "open" intent. The latter seems likely to me only if nVidia is forced into it by the marketplace (with the R300's lead time, I hope this is the only avenue of success left for Cg), or if Cg is still around once their hardware has improved enough to compete well down standard API paths, which, unless the nv35 is "the nv30 with floating-point register combiners fixed", I don't think is remotely possible any time soon.