FX Composer is a very informative tool for learning about the CineFX architecture.
http://developer.nvidia.com/object/fx_composer_home.html
For example, it seems that NV35/NV36/NV38's combiners only support fp16 multiply-add.
No fp32 support.
Sample source code:
PixelShader = asm
{
ps_2_x
dcl t0
dcl t1
dcl_2d s0
dcl_2d s1
def c0, 0.1, 0.2, 0.3, 0.4
texld r0, t0, s0 // tex unit
texld r1, t1, s1 // tex unit
mad_pp r0, r0, r1, c0 // combiner stage1(fp16) or shader core(fp32)
mad_pp r0, r0, r1, c0 // combiner stage2(fp16) or shader core(fp32)
mov oC0, r0 // will be removed by the unified compiler
};
Shader Perf Panel results:
****************************************
Target: GeForceFX 5900 (NV35) :: Unified Compiler: v56.58
Cycles: 1 :: # R Registers: 2
GPU Utilization: 100.00%
****************************************
Target: GeForceFX 5700 (NV36) :: Unified Compiler: v56.58
Cycles: 1 :: # R Registers: 2
GPU Utilization: 100.00%
****************************************
If I change "mad_pp" (fp16) to "mad" (fp32):
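For clarity, this is the variant I mean, assuming only the two multiply-add lines change and everything else stays the same (the comments about where each instruction runs are my guess from the results, not something the tool reports):
PixelShader = asm
{
ps_2_x
dcl t0
dcl t1
dcl_2d s0
dcl_2d s1
def c0, 0.1, 0.2, 0.3, 0.4
texld r0, t0, s0 // tex unit
texld r1, t1, s1 // tex unit
mad r0, r0, r1, c0 // full precision (fp32), presumably shader core only
mad r0, r0, r1, c0 // full precision (fp32), presumably shader core only
mov oC0, r0
};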
Shader Perf Panel results:
****************************************
Target: GeForceFX 5900 (NV35) :: Unified Compiler: v56.58
Cycles: 3 :: # R Registers: 2
GPU Utilization: 50.00%
****************************************
Target: GeForceFX 5700 (NV36) :: Unified Compiler: v56.58
Cycles: 3 :: # R Registers: 2
GPU Utilization: 50.00%
****************************************
It seems that the mad instructions are allocated to the shader core, so the shader needs 3 cycles.
(GPU utilization is 50.00%, perhaps because the combiners are not used.)