Explain To Me The Benefits of SM 3.0 For Nvidia

andypski said:
The DXT5 version will consume an extra slot in SM2.0 unless you have the appropriate swizzle in hardware. Unless you have the 'arbitrary swizzle' cap bit set then you will have to add an extra instruction to your shader that may or may not be optimised out depending on the underlying hardware.

The swizzle required is .wzyx which does not require arbitrary swizzle, it's the base component reversal swizzle which is mandatory in PS2.0. I just don't find this "we don't need a swizzle" argument very compelling from a development perspective. Now, if 3Dc had automatic computation of the Z or "free" normalization on texture lookup, you might have a compelling argument. But the swizzle is beyond trivial and sounds like grasping at straws.

3Dc has an advantage in IQ. As far as any development advantages for coding productivity, it is vanishingly small.
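For reference, here is the arithmetic both formats need once their two stored channels have been fetched, sketched in Python as a stand-in for shader code. It is a minimal sketch under assumptions: channels are stored in [0, 1] and unpacked to [-1, 1], and Z takes the positive root, as is usual for tangent-space normal maps. The function names are hypothetical.

```python
import math


def unpack_snorm(v):
    """Map a channel stored in [0, 1] back to a signed value in [-1, 1]."""
    return v * 2.0 - 1.0


def reconstruct_normal(x_stored, y_stored):
    """Rebuild a unit normal from two stored channels (hypothetical helper).

    Only X and Y come from the texture; Z follows from x*x + y*y + z*z = 1.
    The positive root is used because tangent-space normals face outward.
    """
    x = unpack_snorm(x_stored)
    y = unpack_snorm(y_stored)
    # Clamp at zero: compression error can push x*x + y*y slightly above 1.
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))
    return (x, y, z)


print(reconstruct_normal(0.5, 0.5))   # (0.0, 0.0, 1.0), a flat normal
print(reconstruct_normal(0.75, 0.6))  # tilted, still unit length
```

These instructions are the same whichever format is used. The disagreement in this thread is only about the swizzle needed to get X and Y into place before this step.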
 
DemoCoder said:
andypski said:
The DXT5 version will consume an extra slot in SM2.0 unless you have the appropriate swizzle in hardware. Unless you have the 'arbitrary swizzle' cap bit set then you will have to add an extra instruction to your shader that may or may not be optimised out depending on the underlying hardware.
The swizzle required is .wzyx which does not require arbitrary swizzle, it's the base component reversal swizzle which is mandatory in PS2.0. I just don't find this "we don't need a swizzle" argument very compelling from a development perspective. Now, if 3Dc had automatic computation of the Z or "free" normalization on texture lookup, you might have a compelling argument. But the swizzle is beyond trivial and sounds like grasping at straws.
If you want to use the G component for extra precision, then you want a .wyXX swizzle (I say XX because the other two components are irrelevant). As far as I know, none of the .wyXX swizzles fall under the standard PS 2.0 swizzles.
3Dc has an advantage in IQ. As far as any development advantages for coding productivity, it is vanishingly small.
You're right... when you use the wrong swizzle ;)
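OpenGL guy's point about the G component comes from the DXT colour block layout. Its two endpoint colours are stored as RGB565, so green gets 6 bits while red and blue get 5. The sketch below compares only the endpoint rounding error per channel. It does not model how a particular chip interpolates the in-between palette entries, and the helper names are made up for illustration.

```python
def quantize(value, bits):
    """Round a value in [0, 1] to the nearest level representable with `bits` bits."""
    levels = (1 << bits) - 1
    return round(value * levels) / levels


def max_endpoint_error(bits, samples=10001):
    """Worst rounding error seen over an even sweep of [0, 1]."""
    worst = 0.0
    for i in range(samples):
        v = i / (samples - 1)
        worst = max(worst, abs(quantize(v, bits) - v))
    return worst


red_or_blue = max_endpoint_error(5)  # R and B endpoint precision in RGB565
green = max_endpoint_error(6)        # G endpoint gets one extra bit

print(f"5-bit worst error: {red_or_blue:.4f}")
print(f"6-bit worst error: {green:.4f}")
```

Storing the second component in green rather than blue roughly halves the worst-case endpoint rounding error for that channel.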
 
Evildeus said:
Humus said:
I tried just rerunning it through my 3DcGenerator, and now I got a much better texture. Still visibly better for 3Dc, but not as dramatic.
Could you show us a pic comparing both? :)

[Attachment: 3DcDXT5compare2.jpg]
 
DemoCoder said:
The swizzle required is .wzyx which does not require arbitrary swizzle, it's the base component reversal swizzle which is mandatory in PS2.0. I just don't find this "we don't need a swizzle" argument very compelling from a development perspective. Now, if 3Dc had automatic computation of the Z or "free" normalization on texture lookup, you might have a compelling argument. But the swizzle is beyond trivial and sounds like grasping at straws.
3Dc has an advantage in IQ. As far as any development advantages for coding productivity, it is vanishingly small.
As OpenGL guy pointed out, for the two-component case you want to use the G component of the colour block, since it has higher precision than the R and B components. I guess it would certainly be possible to pre-swizzle your source data in the toolchain so that normal maps are in the appropriate component order, with Y as the derived axis (along with altering code so that all vectors are in the same coordinate system), but I expect this is a bit of a pain. In addition, I think the swizzles on the dp2add might then get screwed up, so you might need to derive it a different way - I'll check.
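Here is a sketch of the dp2add trick. ps_2_0's dp2add computes src0.x*src1.x + src0.y*src1.y + src2, so 1 - x² - y² fits in one instruction if the second operand is negated. The Python below only emulates that arithmetic. In shader assembly the negation would be a source modifier, and the 1.0 would come from a constant register. It also shows why component order matters: the instruction reads the .x and .y slots, so the stored components have to be swizzled there first.

```python
import math


def dp2add(src0, src1, src2):
    """Emulate ps_2_0 dp2add: a 2-component dot product plus a scalar."""
    return src0[0] * src1[0] + src0[1] * src1[1] + src2


def derive_z(texel_xy):
    """Derive Z from a normal whose X and Y already sit in the .x and .y slots."""
    x, y = texel_xy[0], texel_xy[1]
    # 1 - x*x - y*y in a single dp2add by negating the second operand.
    t = dp2add((x, y), (-x, -y), 1.0)
    # A real shader would follow with rsq/rcp or similar; the clamp guards rounding error.
    return math.sqrt(max(0.0, t))


print(derive_z((0.6, 0.0)))  # about 0.8
```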

You can use DXT5 in a three-component mode and store all components explicitly, without derivation, to avoid this problem. However, the three-component mode may produce lower quality, because the compressor typically has more trouble doing a good job with a two-component colour block than with a single-component one.
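Here is a rough way to see why a two-component colour block is harder to compress. A DXT colour block stores two endpoints and interpolates the other two palette entries at 1/3 and 2/3 along the line between them, so every channel in the block has to fit that one line. The sketch assumes a naive compressor that simply uses the per-channel min and max as endpoints. Real compressors search for better endpoints, and endpoint quantization is ignored here.

```python
import math


def palette(e0, e1):
    """The four palette entries of a DXT colour block, for any number of channels.

    Both endpoints plus the points 1/3 and 2/3 of the way from e0 to e1.
    """
    def lerp(t):
        return tuple(a + (b - a) * t for a, b in zip(e0, e1))
    return [lerp(0.0), lerp(1.0), lerp(1 / 3), lerp(2 / 3)]


def block_error(texels, pal):
    """Largest distance from any texel to its nearest palette entry."""
    return max(min(math.dist(t, p) for p in pal) for t in texels)


# One channel: values that happen to lie on the interpolation points are exact.
one_channel = [(0.0,), (1 / 3,), (2 / 3,), (1.0,)]
err_one = block_error(one_channel, palette((0.0,), (1.0,)))

# Two uncorrelated channels: the palette still lies on a single line,
# so the off-diagonal texels (1, 0) and (0, 1) cannot be represented well.
two_channels = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
err_two = block_error(two_channels, palette((0.0, 0.0), (1.0, 1.0)))

print(err_one, err_two)
```

With only one component in the colour block (the other in alpha), the line constraint costs nothing in this toy case. With two independent components, some texels end up far from every palette entry.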

Note that I also don't think that the extra swizzle cost is particularly significant in development time or effort to support. I was just pointing out that I thought you were wrong in both your initial statement:

Humus, isn't that a bit disingenuous? 3Dc requires adding additional shader instructions to compute the Z coordinate, so it cannot "fit right into the same shader" any more easily than the DXT5 method.
... since the 3Dc version certainly does not require any extra instructions, whereas the DXT5 method might.

And also in your other statement here:
ATI seems to be trying to sell the idea that games should only support non-3Dc low-res maps and hi-res 3Dc maps, in an attempt to kill off one of their own ideas
Which quite frankly is so far away from the truth of anything that we have been trying to do with our research and assistance to developers regarding normal map compression that I have to work hard not to find it personally insulting, although I'm sure you didn't intend that to be the case.
 
OpenGL guy said:
You're right... when you use the wrong swizzle ;)

Talk to Humus, his demo uses Z. :)

In any case, it's still not a big selling point. As I said, I'll buy the IQ argument, but the swizzles are not imposing any real work. One #define, or user function, and it's taken care of. 3Dc is a minor step forward; it is not a dramatic or significant step forward IMHO.

I understand the need to evangelize it to get developer adoption and I support developers using both 3Dc and DXT5 (I do not advocate against it), but too much hype sets off my PR detector.
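Here is a sketch of the "one #define, or user function" approach. The format-specific swizzle lives in a single helper, so the rest of the shader always sees X and Y in the same slots. Python again stands in for shader code. The format names and the table are hypothetical, and the 3Dc entry assumes the fetch returns the two channels in .x and .y.

```python
SLOT = {"x": 0, "y": 1, "z": 2, "w": 3}


def swizzle(texel, pattern):
    """Apply a shader-style swizzle such as 'wzyx' to an RGBA tuple."""
    return tuple(texel[SLOT[c]] for c in pattern)


# Hypothetical table: which source slots hold the stored X and Y for each layout.
NORMAL_XY_SWIZZLE = {
    "3Dc": "xy",               # assumed: both channels returned in .x and .y
    "DXT5_alpha_blue": "wz",   # reachable with the ps_2_0 .wzyx reversal
    "DXT5_alpha_green": "wy",  # higher-precision green; not a standard ps_2_0 swizzle
}


def fetch_normal_xy(texel, fmt):
    """The 'user function': one place that knows each format's channel layout."""
    return swizzle(texel, NORMAL_XY_SWIZZLE[fmt])


texel = (0.1, 0.2, 0.3, 0.4)  # (r, g, b, a)
for fmt in NORMAL_XY_SWIZZLE:
    print(fmt, fetch_normal_xy(texel, fmt))
```

This hides the swizzle from the developer. Whether the .wy case still costs the hardware an extra instruction is a separate question, and depends on the chip and on whether the driver folds the swizzle away.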
 
DemoCoder said:
OpenGL guy said:
You're right... when you use the wrong swizzle ;)

Talk to Humus, his demo uses Z. :)

In any case, it's still not a big selling point. As I said, I'll buy the IQ argument, but the swizzles are not imposing any real work. One #define, or user function, and it's taken care of.
If it costs you an extra instruction to do the swizzle, then that is real work for the HW. Perhaps Humus is using .wzyx to avoid the swizzle penalty.
 
If anyone can explain this to me, I'd appreciate it.
This is the screenshot captured on NV40. I already used the modified texture that is supposed to provide better quality, but the result just doesn't look correct. The only modification I made to the program was changing the depth cubemap's format from R16F to RG16F so that NV40 can use shadows, but that's irrelevant here since I disabled shadows.

 
What I'm really looking forward to seeing is whether ATI can deliver better SM3 performance than Nvidia next generation. It would be pretty funny if they did... last to the trend, and better to boot...

What's after SM3? And don't tell me SM4; I'd love to know which features were left out of SM3 but will probably be included in SM4 a few years down the line...
 
london-boy said:
What I'm really looking forward to seeing is whether ATI can deliver better SM3 performance than Nvidia next generation. It would be pretty funny if they did... last to the trend, and better to boot...

What's after SM3? And don't tell me SM4; I'd love to know which features were left out of SM3 but will probably be included in SM4 a few years down the line...
Read this site's DirectX Next preview. Pixel shaders and vertex shaders will share a single, unified shader model.
 
LeGreg said:
Where do those strange dents come from? Are you dithering your texture before compression?

Nope. I'm just compressing them the way they are. It could be that the texconv.exe tool tries to hide errors with dithering. But I'm not sure if it would look better or worse without it.
 
london-boy said:
What I'm really looking forward to seeing is whether ATI can deliver better SM3 performance than Nvidia next generation. It would be pretty funny if they did... last to the trend, and better to boot...

What's after SM3? And don't tell me SM4; I'd love to know which features were left out of SM3 but will probably be included in SM4 a few years down the line...


virtual video memory? :D
 
But are you still not using the slightly higher-precision green, Humus? This is what you'd want to use for NV3x/NV4x hardware.
 
Evildeus said:
Could you make the same pic with your own compression? Thanks.

They don't look as bad as his. But I don't think we need to spend five more minutes on the topic, thanks.
 
Basic idea: use branching to combine many shaders into one. An uber shader can improve performance by reducing state changes, depending on a number of other factors.
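Here is a toy illustration of the state-change side of that trade-off. It counts how often the bound shader changes over a draw list in two cases: each feature combination gets its own specialised shader, or one uber shader branches on per-draw flags. The draw list and feature names are made up. The sketch also ignores the cost of the branches themselves, which is one of the "other factors".

```python
def count_shader_switches(draws, shader_for):
    """Count shader binds (including the first) across a draw list."""
    switches = 0
    bound = None
    for draw in draws:
        shader = shader_for(draw)
        if shader != bound:
            switches += 1
            bound = shader
    return switches


# Hypothetical draw list: each draw names the features its material needs.
draws = [
    {"normal_map": True, "shadows": False},
    {"normal_map": False, "shadows": True},
    {"normal_map": True, "shadows": True},
    {"normal_map": True, "shadows": False},
]


def specialised(draw):
    """One precompiled shader per feature combination."""
    return ("specialised", draw["normal_map"], draw["shadows"])


def uber(draw):
    """A single shader; the flags become constants it branches on."""
    return "uber"


print("specialised:", count_shader_switches(draws, specialised))
print("uber:", count_shader_switches(draws, uber))
```

Sorting draws by shader would also cut the specialised count. The uber shader removes the switches but may pay for its branches at run time instead.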
 