About that HLSL code
Is the failure when I attempt a "ps_2_a" target the result of hard coding in fxc.exe, or is fxc just passing through what the 9.0a HLSL compiler library is capable of? In any case, I'm interested in MS's response, in the HLSL implementation, to the NV3x challenge (its dependence on deprecating part of the underlying spec, in this case the number of registers supported). IOW, I'm wondering whether the "ps_2_a" target increases instruction count to save register usage and, more importantly, how ps_2_a performance compares to Cg, whatever it is doing. I'm hoping someone can help provide the answer, maybe by running fxc on the HLSL code above with the ps_2_a profile and listing the beta version/date and the code output.
However, what's a question without some associated off-the-wall speculation of my own?
Another question I have is whether the beta ps_2_a profile is actually going to be released. It seems like nVidia is pushing Cg pretty exclusively to (maybe) allow circumvention of the DX 9 precision spec under their control, and a good DX9 HLSL ps_2_a profile, from the NV35's point of view at least, would tend to undercut that effort. If nVidia isn't pushing for it, I'm not sure Microsoft will be prompt in delivering it, given the small number of cards showing significant benefit. Cards below the 5900 don't fit the rest of the ps_2_a profile characteristics outlined in the GDC slides all that well because of precision issues, so I suspect that without nVidia working on their end it might not be a productive effort on MS's part. Then again, maybe something will crop up with this "real" DetFX release that is supposed to occur sometime in the future. Barring that, the NV40's overcoming of the NV3x weaknesses might be foreshadowed in the next few months by how nVidia changes their developer focus, including a possible change regarding DX 9 standards efforts.
Oh well, at least my first question is pretty straightforward.