Is NV40's free fp16 normalization enabled?

991060

After hearing about Humus's tweak, I was thinking there might be some way to improve NV40's performance too. I checked the Doom3 shader file and found many dp3/rsq/mul sequences doing normalization. So I used the "NV_fragment_program2" option to rewrite some of the shaders, replacing every such sequence with a single nrm instruction. To my surprise, there's ZERO improvement in demo1 or in a custom demo I recorded myself. So I have to ask: is that because the free fp16 normalization isn't enabled, or is the current driver already aware of the dp3/rsq/mul sequence and doing the replacement behind the scenes? :?:

edit: I was using the H (half) suffix on the instructions and declared all temp registers as short, so it's fp16 for sure.
 
When I looked at it in May/June, it was enabled in D3D. The nrm_pp instruction was independent but had a 2-cycle latency.
 
Thanks, looks like I need to write an OpenGL fillrate tester. :D

Since I'm a newbie to OpenGL, can anyone tell me how to get rid of v-sync in OpenGL? I know the samples from NVSDK 8.0 don't have this problem, but I'm too lazy to check the code. :p
 
I'm not sure how shader-limited Doom3 is...

I think Humus's tweak gets its major improvement by alleviating pressure on ATI's texture cache rather than anything else. Removing that one texture fetch could have dramatic speed implications for the shader if it's the difference between thrashing the cache and not thrashing it. Fixing up some ALU ops would be a minor change by comparison.

A pretty good test, if you wanted to find out, is to null out the shader (draw everything red or something), then put in some non-trivial code with a similar instruction count but no textures, then add the textures back in one at a time.

It's also possible that Nvidia already spots these instruction sequences in the driver (it's an obvious optimisation).
 
991060 said:
Thanks, looks like I need to write an OpenGL fillrate tester. :D

Since I'm a newbie to OpenGL, can anyone tell me how to get rid of v-sync in OpenGL? I know the samples from NVSDK 8.0 don't have this problem, but I'm too lazy to check the code. :p

If you are using Windows, use this.
 