Explain To Me The Benefits of SM 3.0 For Nvidia

DemoCoder · Jul 8, 2004

andypski said:
The DXT5 version will consume an extra slot in SM2.0 unless you have the appropriate swizzle in hardware. Unless you have the 'arbitrary swizzle' cap bit set then you will have to add an extra instruction to your shader that may or may not be optimised out depending on the underlying hardware.

The swizzle required is .wzyx which does not require arbitrary swizzle, it's the base component reversal swizzle which is mandatory in PS2.0. I just don't find this "we don't need a swizzle" argument very compelling from a development perspective. Now, if 3Dc had automatic computation of the Z or "free" normalization on texture lookup, you might have a compelling argument. But the swizzle is beyond trivial and sounds like grasping for straws.

3Dc has an advantage in IQ. As far as any development advantages for coding productivity, it is vanishingly small.

OpenGL guy · Jul 9, 2004

DemoCoder said:
andypski said:

The DXT5 version will consume an extra slot in SM2.0 unless you have the appropriate swizzle in hardware. Unless you have the 'arbitrary swizzle' cap bit set then you will have to add an extra instruction to your shader that may or may not be optimised out depending on the underlying hardware.

The swizzle required is .wzyx which does not require arbitrary swizzle, it's the base component reversal swizzle which is mandatory in PS2.0. I just don't find this "we don't need a swizzle" argument very compelling from a development perspective. Now, if 3Dc had automatic computation of the Z or "free" normalization on texture lookup, you might have a compelling argument. But the swizzle is beyond trivial and sounds like grasping for straws.

If you want to use the G component for extra precision, then you want a .wyXX swizzle (I say XX because the other two components are irrelevant). As far as I know, none of the .wyXX swizzles fall under the standard PS 2.0 swizzles.
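To make the two swizzles under discussion concrete, here is a rough Python sketch (not shader code) that treats a texel as an (x, y, z, w) tuple, i.e. (R, G, B, A). The `swizzle` helper and the sample values are made up for illustration. Whether a given pattern is free in hardware is exactly what the posts above disagree on, and it is not modelled here.

```python
# Hypothetical helper: emulate a shader source swizzle on a 4-component texel.
COMPONENTS = {"x": 0, "y": 1, "z": 2, "w": 3}

def swizzle(texel, pattern):
    """Return the components of `texel` selected by `pattern`, e.g. "wzyx"."""
    return tuple(texel[COMPONENTS[c]] for c in pattern)

# Made-up DXT5 texel in (R, G, B, A) order.
texel = (0.10, 0.20, 0.30, 0.40)

reversed_texel = swizzle(texel, "wzyx")  # full component reversal
alpha_green = swizzle(texel, "wy")       # alpha and green only

print(reversed_texel)  # (0.4, 0.3, 0.2, 0.1)
print(alpha_green)     # (0.4, 0.2)
```

With .wzyx the first two results are alpha and blue, so a normal map laid out for that swizzle keeps its second component in B. With .wy the second component comes from G instead.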

3Dc has an advantage in IQ. As far as any development advantages for coding productivity, it is vanishingly small.

You're right... when you use the wrong swizzle

Humus · Jul 9, 2004

Evildeus said:
Humus said:

I tried just rerunning it through my 3DcGenerator, and now I got a much better texture. Still visibly better for 3Dc, but not as dramatic.

Could you show us a pic comparing both?

andypski · Jul 9, 2004

DemoCoder said:
The swizzle required is .wzyx which does not require arbitrary swizzle, it's the base component reversal swizzle which is mandatory in PS2.0. I just don't find this "we don't need a swizzle" argument very compelling from a development perspective. Now, if 3Dc had automatic computation of the Z or "free" normalization on texture lookup, you might have a compelling argument. But the swizzle is beyond trivial and sounds like grasping for straws.
3Dc has an advantage in IQ. As far as any development advantages for coding productivity, it is vanishingly small.

As OpenGL guy pointed out, for the two-component case you want to use the G component of the colour block, since it has higher precision than the R and B components. I guess it would certainly be possible to preswizzle your source information in the toolchain so that normal maps are in the appropriate component order, with Y as the derived axis (along with altering code so that all vectors are in the same coordinate system), but I expect this is a bit of a pain. In addition, I think the swizzles on the dp2add might then become screwed up, so you might need to derive in a different way - I'll check.
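For readers following along, the derivation being discussed is the usual reconstruction of the third component of a unit normal from the two stored ones, z = sqrt(1 - x*x - y*y). A minimal Python sketch, assuming the stored components have already been expanded to [-1, 1]. `derive_third` is a hypothetical name, and the clamp is an added safeguard against compression error, not something taken from the posts above.

```python
import math

def derive_third(x, y):
    """Rebuild the third component of a unit-length normal from the two stored ones."""
    # 1 - (x*x + y*y) is the quantity a 2D dot product plus a scalar add
    # (the dp2add mentioned above) can evaluate in a single instruction.
    remainder = 1.0 - (x * x + y * y)
    # Clamp: compression error can push x*x + y*y slightly above 1.
    return math.sqrt(max(remainder, 0.0))

print(derive_third(0.0, 0.0))
print(derive_third(0.6, 0.0))
```

Which two components are stored, and which axis is derived, only changes the argument order here. In a shader it changes the swizzles on the inputs, which is the concern raised above.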

You can use DXT5 in a three-component mode and store all components explicitly, without derivation, to avoid this problem. However, the three-component mode may produce lower quality, because the compressor typically has more trouble doing a good job with a two-component colour block than with a single-component one.

Note that I also don't think that the extra swizzle cost is particularly significant in development time or effort to support. I was just pointing out that I thought you were wrong in both your initial statement:

Humus, isn't that a bit disingenuous? 3Dc requires adding additional shader instructions to compute the Z coordinate so it cannot "fit right into the same shader" any easier than the DXT5 method.

... since the 3Dc version certainly does not require any extra instructions, whereas the DXT5 method might.

And also in your other statement here:

ATI seems to be trying to sell the idea that games should only support non-3Dc low-res maps and hi-res 3Dc maps, in an attempt to kill off one of their own ideas

Which quite frankly is so far away from the truth of anything that we have been trying to do with our research and assistance to developers regarding normal map compression that I have to work hard not to find it personally insulting, although I'm sure you didn't intend that to be the case.

DemoCoder · Jul 9, 2004

OpenGL guy said:
You're right... when you use the wrong swizzle

Talk to Humus, his demo uses Z.

In any case, it's still not a big selling point. As I said, I'll buy the IQ argument, but the swizzles are not imposing any real work. One #define, or user function, and it's taken care of. 3Dc is a minor step forward, it is not a dramatic or significant step forward IMHO.

I understand the need to evangelize it to get developer adoption and I support developers using both 3Dc and DXT5 (I do not advocate against it), but too much hype sets off my PR detector.

OpenGL guy · Jul 9, 2004

DemoCoder said:
OpenGL guy said:

You're right... when you use the wrong swizzle

Talk to Humus, his demo uses Z.

In any case, it's still not a big selling point. As I said, I'll buy the IQ argument, but the swizzles are not imposing any real work. One #define, or user function, and it's taken care of.

If it costs you an extra instruction to do the swizzle, then that is real work for the HW. Perhaps Humus is using .wzyx to avoid the swizzle penalty.

991060 · Jul 9, 2004

If anyone can explain this to me, that would be appreciated.
This is a screenshot captured on NV40. I already used the modified texture, which is supposed to provide better quality, but the result just doesn't look correct. The only modification I made to the program is changing the depth cubemap's format from R16F to RG16F so that NV40 can use shadows, but that's irrelevant here since I disabled shadows.

Humus · Jul 9, 2004

OpenGL guy said:
Perhaps Humus is using .wzyx to avoid the swizzle penalty.

Yup.

Evildeus · Jul 9, 2004

Humus said:
Evildeus said:

Humus said:

I tried just rerunning it through my 3DcGenerator, and now I got a much better texture. Still visibly better for 3Dc, but not as dramatic.

Could you show us a pic comparing both?

http://esprit.campus.luth.se/~humus/temp/3DcDXT5compare2.jpg

Thanks, 3Dc is better indeed (but I can't really compare to the first pic).

London Geezer · Jul 9, 2004

What I'm really looking forward to seeing is whether ATI can give better SM3 performance than Nvidia come next generation. It would be pretty funny if they did... Last on the trend, and better to boot...

What's after SM3? And don't tell me SM4; I'd love to know what features are left out of SM3 but will probably be included in SM4 a few years down the line...

LeGreg · Jul 9, 2004

Humus said:
[image]

Where do those strange dents come from? Are you dithering your texture before compression?

pat777 · Jul 10, 2004

london-boy said:
What I'm really looking forward to seeing is whether ATI can give better SM3 performance than Nvidia come next generation. It would be pretty funny if they did... Last on the trend, and better to boot...

What's after SM3? And don't tell me SM4; I'd love to know what features are left out of SM3 but will probably be included in SM4 a few years down the line...

Read this site's DirectXNext preview. Pixel shaders and Vertex shaders will be the same.

Humus · Jul 10, 2004

LeGreg said:
Where do those strange dents come from? Are you dithering your texture before compression?

Nope. I'm just compressing them the way they are. It could be that the texconv.exe tool tries to hide errors with dithering. But I'm not sure if it would look better or worse without it.

AlNom · Jul 10, 2004

london-boy said:
What I'm really looking forward to seeing is whether ATI can give better SM3 performance than Nvidia come next generation. It would be pretty funny if they did... Last on the trend, and better to boot...

What's after SM3? And don't tell me SM4; I'd love to know what features are left out of SM3 but will probably be included in SM4 a few years down the line...

virtual video memory?

pat777 · Jul 10, 2004

Humus said:
Evildeus said:

Humus said:

I tried just rerunning it through my 3DcGenerator, and now I got a much better texture. Still visibly better for 3Dc, but not as dramatic.

Could you show us a pic comparing both?

Wow, now you have to look closer to see the difference.

KimB · Jul 10, 2004

But are you still not using the slightly higher-precision green, Humus? This is what you'd want to use for NV3x/NV4x hardware.
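The "slightly higher-precision green" comes from the DXT colour block storing its two endpoint colours as 5:6:5 RGB, so green gets 6 bits where red and blue get 5. A rough Python sketch of that endpoint rounding only: it ignores the per-texel interpolation between endpoints that DXT also performs, and `quantize` and `worst_error` are hypothetical helpers written for this illustration.

```python
def quantize(value, bits):
    """Round a value in [0, 1] to the nearest level representable in `bits` bits."""
    levels = (1 << bits) - 1
    return round(value * levels) / levels

def worst_error(bits, samples=1001):
    """Largest rounding error over evenly spaced inputs in [0, 1]."""
    inputs = [i / (samples - 1) for i in range(samples)]
    return max(abs(quantize(v, bits) - v) for v in inputs)

# Green (6 bits) rounds more finely than red or blue (5 bits).
print(worst_error(5) > worst_error(6))  # True
```

The worst-case error roughly halves with the extra bit, which is why a normal-map component stored in G loses a little less than one stored in R or B.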

Evildeus · Jul 10, 2004

LeGreg said:
Humus said:

[image]

Where do those strange dents come from? Are you dithering your texture before compression?

Could you make the same pic with your own compression? Thanks.

LeGreg · Jul 11, 2004

Evildeus said:
Could you make the same pic with your own compression? Thanks.

They don't look as bad as his. But I don't think we need to spend five more minutes on the topic, thanks.

pat777 · Jul 11, 2004

Can someone explain to me what nVIDIA's "uber shader" is?

KimB · Jul 12, 2004

Basic idea: use branching to combine many shaders into one. An uber shader can improve performance by reducing state changes, though how much depends upon a number of other factors.
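As a loose illustration of that idea, here is a Python sketch (not shader code) that compares two specialised "shaders" against one branching version and counts how often the bound shader has to change while drawing a scene. All names and the scene data are invented for the example, and the cost of the branch itself, one of the "other factors", is not modelled.

```python
def shade_diffuse(obj):
    return obj["diffuse"]

def shade_diffuse_specular(obj):
    return obj["diffuse"] + obj["specular"]

def shade_uber(obj):
    """One 'shader' covering both cases via a branch."""
    color = obj["diffuse"]
    if obj["use_specular"]:
        color += obj["specular"]
    return color

def count_shader_switches(objects, pick_shader):
    """Count how often the bound shader changes while drawing `objects` in order."""
    switches, bound = 0, None
    for obj in objects:
        shader = pick_shader(obj)
        if shader is not bound:
            switches += 1
            bound = shader
        shader(obj)
    return switches

scene = [
    {"diffuse": 0.5, "specular": 0.2, "use_specular": False},
    {"diffuse": 0.4, "specular": 0.3, "use_specular": True},
    {"diffuse": 0.6, "specular": 0.1, "use_specular": False},
]

separate = count_shader_switches(
    scene,
    lambda o: shade_diffuse_specular if o["use_specular"] else shade_diffuse,
)
uber = count_shader_switches(scene, lambda o: shade_uber)

print(separate, uber)  # 3 1
```

Sorting the scene by material would also cut the switches for the separate shaders, so this only shows the mechanism, not a guaranteed win.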
