Is there any evidence to show NV40 can handle PS 3.0 well?

I perceive the three major capabilities of SM 3.0 hardware over 2.0 as:

i) elegant and effective displacement mapping in Vertex Shader 3.0 (and this NV40 appears to do very well)
ii) branching in Pixel Shader 3.0 (possibly reducing memory overheads - loading shaders and constants) and
iii) the speed and resources to run longer shaders (which appears to be there to some degree).

I expect a few synthetic tests will show us how well dynamic branching in PS 3.0 turns out. Personally I think it will be acceptable in at least a few situations, given NV40 has the actual grunt to run more and/or longer shaders. If NV40 has the grunt to do 2x - 3x the shader load of NV35, you have some room to manoeuvre.

I see PS 3.0 allows the programmer the significant convenience of writing more general shaders, such as handling multiple lights in one shader with branching for each light source.
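As a concrete illustration of that convenience, here is a minimal sketch of such a general multi-light shader in HLSL compiled for the ps_3_0 target. All the names (gLightPos, gNumLights, MultiLightPS) are made up for illustration, and the lighting model is deliberately trivial:

```hlsl
float4 gLightPos[8];     // light positions, world space (hypothetical names)
float4 gLightColor[8];   // light colours
int    gNumLights;       // number of lights actually active this draw

float4 MultiLightPS(float3 worldPos : TEXCOORD0,
                    float3 normal   : TEXCOORD1) : COLOR
{
    float3 n = normalize(normal);
    float3 result = 0;

    // ps_3_0 supports real loops and dynamic flow control, so one general
    // shader covers any light count instead of a separately compiled
    // shader variant per light configuration.
    for (int i = 0; i < gNumLights; i++)
    {
        float3 toLight = normalize(gLightPos[i].xyz - worldPos);
        result += saturate(dot(n, toLight)) * gLightColor[i].rgb;
    }
    return float4(result, 1);
}
```

Under PS 2.0 the same effect would need either a compiled variant per light count or one rendering pass per light.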

I don't expect it to be a panacea for the world's problems - but I do expect that the combination of the speed and resources to run longer shaders and the generality that can be derived from branching and conditionals will be a very positive benefit in the future.

I would like to see speed tests of current vs next-generation cards on a full load of short vs long shaders. Then I'd like to see tests running one reasonable-length, complex shader with 8 branches/conditionals for 8 lights compared and contrasted with running the nearest equivalent set of 8 shorter, individual shaders, one for each light source. This could be a PS 3.0 vs PS 3.0 or PS 3.0 vs PS 2.0/PS 2.0+ test.

I would hope to see an almost neutral performance difference between running a shader of, say, 20 instructions looped eight times and running eight shaders of 20 instructions each. Either way you have executed around 160 instructions. So then you need to examine how, if at all, this scenario affects the efficiency of how you move data around your hardware, and whether you can achieve high and beneficial utilisation levels on it. If the branching model means less work loading both shader code and constants, you might see a very nice performance gain!
 
g__day said:
I perceive the three major capabilities of SM 3.0 over 2.0 as i) elegant and effective displacement mapping in Vertex Shader 3.0 (and this NV40 appears to do very well) ii) branching in Pixel Shader 3.0 and iii) the speed and resources to run longer shaders (which appears to be there to some degree).

Well, PS 3.0 itself does nothing for the speed of longer shaders, but the extra resources allow for the extra speed.

I would like to see speed tests of current vs next-generation cards on a full load of short vs long shaders. Then I'd like to see tests running a long, complex shader with 8 branches and conditionals for 8 lights compared and contrasted with running the nearest equivalent 8 shorter individual shaders for each light source. This could be a PS 3.0 vs PS 3.0 or PS 3.0 vs PS 2.0/PS 2.0+ test.

If we are using FP16/FP32 storage textures, the PS 2.0 code will be FAR, FAR slower because of the extra memory bandwidth overhead; this is where PS 3.0 really helps with speed. (Well, the effect will be lessened on NV40 if you use an FP16 framebuffer.)
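For contrast, a rough sketch of the PS 2.0-class fallback being discussed, again with hypothetical names: one light per pass, accumulated with additive blending into the render target. With an FP16/FP32 target, every extra light costs another framebuffer read-modify-write, which is the bandwidth overhead in question:

```hlsl
// ps_2_0-class single-light pass; the application sets these constants
// per pass and draws the geometry once per light with ADD blending on.
float4 gLightPos;      // this pass's light position (hypothetical names)
float4 gLightColor;    // this pass's light colour

float4 SingleLightPS(float3 worldPos : TEXCOORD0,
                     float3 normal   : TEXCOORD1) : COLOR
{
    float3 toLight = normalize(gLightPos.xyz - worldPos);
    float3 c = saturate(dot(normalize(normal), toLight)) * gLightColor.rgb;
    return float4(c, 1);   // blended additively into the accumulation buffer
}
```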
 
g__day said:
I perceive the three major capabilities of SM 3.0 hardware over 2.0 as:

i) elegant and effective displacement mapping in Vertex Shader 3.0
Only as long as you're happy with point sampling!
 
Actually, in a lot of cases, you want point sampling.

Also, because you can't do tessellation in HW, you will most likely pre-tessellate your mesh (usually a quad) and align it on texel boundaries with the displacement map.

This yields effective ways to do water, cloth, and similar simulations.

Interpolation in the vertex textures would make a lot more sense if you had tessellation as well.
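To make the pre-tessellated grid idea above concrete, here is a minimal vs_3_0 sketch of vertex-texture displacement, with hypothetical names throughout. The grid's UVs are assumed to land on texel centres of the displacement map, which is what makes point sampling acceptable:

```hlsl
// Minimal vs_3_0 vertex-texture displacement over a pre-tessellated grid
// (hypothetical names). NV40 vertex texture fetch is point-sampled, hence
// the texel-aligned grid; an R32F height map is a typical choice.
texture gHeightMap;
sampler2D HeightSampler = sampler_state
{
    Texture   = <gHeightMap>;
    MinFilter = POINT;   // vertex textures: point sampling only
    MagFilter = POINT;
};
float4x4 gWorldViewProj;
float    gHeightScale;

void DisplaceVS(float4 pos : POSITION,
                float2 uv  : TEXCOORD0,
                out float4 oPos : POSITION,
                out float2 oUV  : TEXCOORD0)
{
    // tex2Dlod is required in a vertex shader: there are no derivatives,
    // so the mip level must be supplied explicitly (0 = base level).
    float height = tex2Dlod(HeightSampler, float4(uv, 0, 0)).r;
    pos.y += height * gHeightScale;   // displace the flat grid vertically
    oPos = mul(pos, gWorldViewProj);
    oUV  = uv;
}
```

Re-rendering gHeightMap each frame (say, a wave or cloth solver writing into it) is what turns this into the water/cloth simulation mentioned above.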
 