Cg Profile Performance - a brief article.

pocketmoon_

Hi,

I've written up some Cg testing I've been playing around with for a week or so. It's a comparison of how the various Cg profiles perform (OpenGL and DX) with a small set of shaders. There is also a comparison with Microsoft's HLSL.

Please let me know if you spot anything dumb.

http://www.pocketmoon.com/Cg/Cg.html

Cheers,

Rob
 
Can you just clarify what the NV30, NV30-PP and NV30-FIXED modes are referring to exactly? me=thick
 
Looking at your "help the compiler" section, it looks like the Cg compiler back ends have a long way to go.
 
Neeyik said:
Can you just clarify what the NV30, NV30-PP and NV30-FIXED modes are referring to exactly? me=thick

Sorry - I'll make it clearer in the article.

The FP30 profile supports float, half, and fixed-point data types. I simply wrote three versions of each shader to make the best use of each data type. The Cg code shown for each shader is the 'float' version; the other versions simply use the other data types. Compiling these other versions using, say, the ARBFP profile simply results in the lower-precision data types being up-cast to floats, so there is no performance gain to be had there.
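For anyone wondering what the three data types cost in precision, here is a rough Python sketch (not from the article). It assumes the NV30 'half' behaves like a 16-bit IEEE-style half float and that 'fixed' is a signed 12-bit fixed-point value with 10 fractional bits clamped to [-2, 2); neither format is spelled out in this thread, and the helper names to_half and to_fixed12 are made up for illustration.

```python
import struct

def to_half(x: float) -> float:
    """Round x to a 16-bit half float and convert it back (assumed 'half' behaviour)."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

def to_fixed12(x: float) -> float:
    """Quantise x to steps of 1/1024 and clamp to [-2, 2) (assumed 'fixed' behaviour)."""
    steps = round(x * 1024)                 # 10 fractional bits
    steps = max(-2048, min(2047, steps))    # 12-bit signed range
    return steps / 1024

if __name__ == "__main__":
    for value in (0.1, 1.0, 3.0):
        print(value, to_half(value), to_fixed12(value))
```

Under these assumptions, colour-style values in [0, 1] survive 'fixed' with a step of about 0.001, while anything outside [-2, 2) is clamped, which is why a shader may need a separate version per data type rather than a blanket search and replace.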
 
Interesting test, and at least Cg shines at what it is supposed to do, namely writing to the metal based on a profile (FP30). ;)

And it just loves the fixed precision (register combiners) and dislikes the ARB FP. This isn't really a surprise of course (think Carmack), but I wonder whether this mainly has to do with the Quadro FX 2000 or with Cg?
 
RussSchultz said:
Looking at your "help the compiler" section, it looks like the Cg compiler back ends have a long way to go.

To be fair, I'll do a quick test later and see what the HLSL compiler makes of it. The HLSL comparison was bolted on after I'd done the initial shader tuning.
 
Illustrating the actual instruction output differences between the Cg and DX 9 HLSL shaders would be interesting, via a link, as it could make the results a lot more meaningful. I'm also interested in what the low instruction count differences consist of.

Also, for the performance difference examples after the "Intrinsics" heading, I'm curious if there is a difference in the relative performance for the approaches between full precision and for half precision, and between Cg and DX 9 HLSL. I'm wondering if it is a universal missed optimization opportunity or not, and how it interacts with the various targets, and the suggestion above for showing the various outputs could clarify why the behavior occurs for the nv30.

Highlighting the applicable output instructions in contrast for your specific performance difference examples would be helpful as well.

Might also be handy to see how CPU scaling affected some benchmarks, since the issue for performance for some might be driver scheduling workload.
 
Sorry for stating the obvious, but would it be possible to get 9700 numbers up there just for comparison? I'd like to see what the 9700 is capable of in ARBFP and PS_2_0.

We'd also have an idea of how much work is left to be done in the shader portion of the NVidia drivers, and if their integer pipeline is any faster than ATI's float pipe.

I'm sure someone here can help you out if you don't have the hardware.

BTW, good job! I'm sure there were a number of us itching to see these results.
 
Mintmaster said:
I'm sure someone here can help you out if you don't have the hardware.

Sure, I'd be happy to send the shaders and the VC++ projects to anyone with a 9700 who'd be happy to benchmark for me. Or someone can send me a 9700 :)
 
pocketmoon_ said:
Mintmaster said:
I'm sure someone here can help you out if you don't have the hardware.

Sure, I'd be happy to send the shaders and the VC++ projects to anyone with a 9700 who'd be happy to benchmark for me. Or someone can send me a 9700 :)

I could bench it for you (as long as I don't have to go through editing your C++ project :D)
 
Well, the preliminary results make it look like Cg just isn't there yet, performance-wise. nVidia has a lot of work to do.
 
Chalnoth said:
Well, the preliminary results make it look like Cg just isn't there yet, performance-wise. nVidia has a lot of work to do.

Eh? I thought it performed OK. It's close to matching Microsoft's HLSL in producing shaders for Microsoft's DX9 PS2x. There's one case that looks like a lack of 'conditional' optimisation.

Nvidia have probably spent time working to their strengths; the NV30 profile results show how flexible the architecture is. It's a shame that the Microsoft shader standards lack fixed data types and that you have to use OpenGL to get the best out of NV30.
 
pocketmoon_ said:
It's a shame that the Microsoft shader standards lack fixed data types and that you have to use OpenGL to get the best out of NV30.

Alternatively, it is a shame that the NV30's shader execution model is so different from that underlying the DX9 and ARB standards.
 
Chalnoth said:
Well, the preliminary results make it look like Cg just isn't there yet, performance-wise. nVidia has a lot of work to do.

Sorry to quote you again ;)

I think I will have to greatly expand the range of shaders on test. I came up with a 'new improved' median filter that derives the median by swap-sorting the 5 samples and Cg *loves* it (PS2x 30fps) and HLSL *hates* it (8fps).
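The shader itself isn't posted in this thread, so the following is only a guess at what "swap-sorting" five samples could look like, written in Python rather than Cg. It uses a fixed sequence of min/max compare-swaps (three bubble passes, nine swaps in total), which is the branch-free shape a pixel shader would need; the names compare_swap and median5 are hypothetical.

```python
def compare_swap(v, i, j):
    """Put the smaller of v[i], v[j] at i and the larger at j, using only min/max."""
    lo, hi = min(v[i], v[j]), max(v[i], v[j])
    v[i], v[j] = lo, hi

def median5(samples):
    """Return the median of exactly five values via a fixed compare-swap sequence."""
    v = list(samples)
    if len(v) != 5:
        raise ValueError("median5 expects exactly five samples")
    # Each pass bubbles the largest remaining value to the end of the range.
    # After three passes v[4], v[3] and v[2] hold the three largest values
    # in order, so v[2] is the median and the last pass can be skipped.
    for end in range(4, 1, -1):
        for i in range(end):
            compare_swap(v, i, i + 1)
    return v[2]

if __name__ == "__main__":
    print(median5([0.2, 0.9, 0.1, 0.5, 0.7]))
```

Because the swap sequence never depends on the data, a compiler can in principle map each compare-swap to a MIN/MAX pair with no conditionals, which may be part of why the two compilers treat it so differently.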

The FP30 profile using fixed-point data types runs at 90fps. You can see where DOOM3 is going to get its speed from!
 
pocketmoon_ said:
Chalnoth said:
Well, the preliminary results make it look like Cg just isn't there yet, performance-wise. nVidia has a lot of work to do.

Eh? I thought it performed OK. It's close to matching Microsoft's HLSL in producing shaders for Microsoft's DX9 PS2x. There's one case that looks like a lack of 'conditional' optimisation.

Nvidia have probably spent time working to their strengths; the NV30 profile results show how flexible the architecture is. It's a shame that the Microsoft shader standards lack fixed data types and that you have to use OpenGL to get the best out of NV30.

I thought Cg doesn't perform any optimizations on the output.
 
pocketmoon_ said:
Eh? I thought it performed OK. It's close to matching Microsoft's HLSL in producing shaders for Microsoft's DX9 PS2x. There's one case that looks like a lack of 'conditional' optimisation.

Nvidia have probably spent time working to their strengths; the NV30 profile results show how flexible the architecture is. It's a shame that the Microsoft shader standards lack fixed data types and that you have to use OpenGL to get the best out of NV30.

It would be interesting to see some more varied results, but from those benches, it really is looking like the Cg PS2x profile is performing a fair bit below HLSL.

Personally, I don't care that much about how Cg performs with the NV30 profiles. The really important thing is that it performs well with the standard profiles. That's the only way it can truly be a useful design tool. The performance on hardware-specific profiles is just a side bonus.
 
Reverend said:
I thought Cg doesn't perform any optimizations on the output.

The original beta release didn't. At least some amount of optimization has since been implemented in the compiler.
 
Chalnoth said:
It would be interesting to see some more varied results, but from those benches, it really is looking like the Cg PS2x profile is performing a fair bit below HLSL.

Well, I'll be updating the list with some new shaders, and as I posted earlier there's at least one where HLSL gets hammered by Cg in PS2_X.

Personally, I don't care that much about how Cg performs with the NV30 profiles. The really important thing is that it performs well with the standard profiles.

I half agree :) (agree_pp!) The benefit of Cg is that if you write a shader you can compile it to whatever profile your hardware supports. If the Cg compiler improves (and it will), then why use HLSL, which won't get the best out of some hardware and isn't cross-platform?
 