Jawed said: The G70's two primary ALUs per fragment pipe can each do a MUL and/or an ADD:
- r0=a*b or
- r0=a+b or
- r0=a*b+c
- r1=d*e or
- r1=d+e or
- r1=d*e+f
Here ALU 1 can perform any one of three calculations for r0. Similarly, ALU 2 can perform any one of three calculations, with the result assigned to r1.
By contrast, NV40's options are more limited: its first ALU can only do a MUL, while the second can do a MUL and/or an ADD. In total:
- r0=a*b
- r1=d*e or
- r1=d+e or
- r1=d*e+f
(These examples only work if the source data is in FP16 format.)
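As a rough illustration of the difference described above, the Python sketch below models each chip as two sets of operations (one per ALU) and counts how many ordered pairs of operations could, in principle, be issued together. The set contents come straight from the lists above; the names `G70`, `NV40`, `can_co_issue`, and `count_pairs` are made up for this example, and the model ignores data dependencies, register limits, and the FP16 caveat.

```python
# Toy model of the per-pipe ALU options listed above (names are hypothetical).
OPS = ("MUL", "ADD", "MAD")

G70 = {"alu1": {"MUL", "ADD", "MAD"}, "alu2": {"MUL", "ADD", "MAD"}}
NV40 = {"alu1": {"MUL"}, "alu2": {"MUL", "ADD", "MAD"}}


def can_co_issue(chip, op_a, op_b):
    """Return True if op_a and op_b fit on the two ALUs in either assignment."""
    alu1, alu2 = chip["alu1"], chip["alu2"]
    return (op_a in alu1 and op_b in alu2) or (op_b in alu1 and op_a in alu2)


def count_pairs(chip):
    """Count the ordered (op_a, op_b) pairs the chip could issue together."""
    return sum(can_co_issue(chip, a, b) for a in OPS for b in OPS)
```

Under this toy model G70 accepts all nine ordered pairs, while NV40 accepts only the five that include a MUL, which is one way to see why a compiler would have more freedom on G70.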
It seems to me that G70's much easier for the run-time compiler to work with, and that's the reason why shader replacements aren't needed.
But, well, I expect we'll never find out for sure.
Jawed
_xxx_ said:
switch (CHIP) {
    case NV40: { <assemble this way> } break;
    case G70: { <assemble that way> } break;
    case ...
    default: { <do compatibility mode, works with every chip> }
}
Just off the top of my head, I don't see that being especially hard to implement in the drivers.
trinibwoy said: Apologies if this is a dumb question, but I was wondering: are the ALU changes on G70 just drop-ins that would work fine with NV40's compiler, or would there be significant changes to take full advantage of the new instructions?
I'm actually less than impressed with NVidia's compiler technology. I'd think that after developing Cg they'd be ahead of ATI, not behind.
Mintmaster said: I'm actually less than impressed with NVidia's compiler technology. I'd think that after developing Cg they'd be ahead of ATI, not behind.
I actually tried that flag, but unfortunately it bumped me over the instruction count several times.
So an unoptimised shader needs to fit under the instruction count even if you know that the optimised version would come in under the limit?
Humus said:
Mintmaster said: I'm actually less than impressed with NVidia's compiler technology. I'd think that after developing Cg they'd be ahead of ATI, not behind.
There's kind of an inside joke at ATI that we should start promoting the use of Cg. The reason is that Microsoft's HLSL compiler does quite a lot of optimizations, which unfortunately makes our driver's compiler's work a lot harder. Cg on the other hand returns more or less unoptimized code. This usually works better with our compiler, so it often comes out ahead in the end. In fact, in many cases using the D3DXSHADER_SKIPOPTIMIZATION flag to the MS compiler gives you better performance.
DemoCoder said:
_xxx_ said:
switch (CHIP) {
    case NV40: { <assemble this way> } break;
    case G70: { <assemble that way> } break;
    case ...
    default: { <do compatibility mode, works with every chip> }
}
Just off the top of my head, I don't see that being especially hard to implement in the drivers.
You're not considering the difficulty of the <assemble this way> part. ("Assemble" would be a misnomer; the driver performs compilation, not simple assembly.)
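To make the point concrete, here is a toy Python sketch, not actual driver code: the per-chip dispatch is a one-line table lookup, while all the real difficulty hides in the backend functions. The backends here are deliberately naive (they greedily pair adjacent operations and ignore data dependencies entirely). Every name (`BACKENDS`, `compile_for`, `pair_greedily`, `compile_compat`) is an assumption for illustration, and the ALU capability sets simply follow Jawed's description earlier in the thread.

```python
# Toy per-chip "compiler" dispatch (all names hypothetical, not driver code).
ALL_OPS = {"MUL", "ADD", "MAD"}


def pair_greedily(ops, alu1, alu2):
    """Pack adjacent ops into issue slots of up to two ops (naive scheduler)."""
    slots, i = [], 0
    while i < len(ops):
        if i + 1 < len(ops):
            a, b = ops[i], ops[i + 1]
            # Pair the two ops if they fit the two ALUs in either assignment.
            if (a in alu1 and b in alu2) or (b in alu1 and a in alu2):
                slots.append((a, b))
                i += 2
                continue
        slots.append((ops[i],))
        i += 1
    return slots


def compile_compat(ops):
    """Compatibility path: one op per slot, valid on any chip."""
    return [(op,) for op in ops]


BACKENDS = {
    "NV40": lambda ops: pair_greedily(ops, {"MUL"}, ALL_OPS),
    "G70": lambda ops: pair_greedily(ops, ALL_OPS, ALL_OPS),
}


def compile_for(chip, ops):
    """The 'switch' itself: trivial next to what a real backend must do."""
    return BACKENDS.get(chip, compile_compat)(ops)
```

For the sequence ADD, MUL, MAD, ADD this yields two slots on G70, three on NV40, and four on the compatibility path. A real backend would also have to handle dependencies, register pressure, and precision, which is where the work actually lies.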
ondaedg said: Perhaps, once developers start working with Cg for the PS3, you may see more developers using it as opposed to HLSL.
One thing is sure: Sony is doing research on a shader compiler for an unreleased GPU (well, it's released now).
Razor1 said: This is very interesting. I've noticed this too and didn't know the cause of it till now, but unfortunately not many developers use Cg enough, which really sucks since it has so many advantages.
What are the many advantages of Cg (besides OpenGL portability)? MS put a lot of effort into shader debugging, fragment linking, compiling, etc.