Explain To Me The Benefits of SM 3.0 For Nvidia

Humus said:
In two ways. It offers better quality at the same storage space, and it can fit right into the same shader as if a regular texture had been used (no swizzles needed).

Humus, this screenshot is bogus:
Here is your original DXT5 texture:
textureOriginal.png


But if we take care to recompress it correctly:
textureRecompressed.png


Here is the original rendering (on my Radeon 9700):
dxt5original.png


and with the recompressed DXT5:
dxt5recompressed.png


Of course, I agree about the extra swizzling.
 
DemoCoder said:
Humus, isn't that a bit disingenuous? 3Dc requires adding additional shader instructions to compute the Z coordinate, so it cannot "fit right into the same shader" any more easily than the DXT5 method. Thus, to support either DXT5 or 3Dc, you need to add extra instructions, and writing ".xyzw" vs ".wzyx" on one of the instructions is equally laborious.

It is true that there is a slight IQ improvement, but the comparison you should be making is low-res uncompressed normal maps vs DXT5/3DC, and in that comparison (hi-res DXT5 vs uncompressed low-res), DXT5 is clearly a large improvement.

ATI seems to be trying to sell the idea that games should only support non-3Dc low-res maps and hi-res 3Dc maps, in an attempt to kill off one of their own ideas: DXT5 for normal map compression. But it doesn't wash, because the labor to support DXT5 is minimal, while the improvement is large. Both 3Dc and DXT5 should be supported.

Having just looked at Humus's test pictures, it just occurred to me that one possible reason that the DXT5 example looks so much worse is that the errors/approximations from block to block are all correlated.
Look at the "bands" in the edges of the tiles...
3DcDXT5compare.jpg


If you were to change the compressor (or even tweak the values post-compression) so that it added a bit of noise to the rounding when calculating the two representative colours, some of the bands/flat spots should be hidden.
 
Simon F said:
Having just looked at Humus's test pictures, it just occurred to me that one possible reason that the DXT5 example looks so much worse is that the errors/approximations from block to block are all correlated.
Look at the "bands" in the edges of the tiles...

To me it looks more like his compressor is just plain broken: if you look closely at the texture image I posted, you have a single color replicated all over the 4x4 texel block.
You shouldn't have to tweak your compressor to fix that :/
 
Ahh, thanks a lot guys, you REALLY know your stuff here. I was told to come to you and learn and I am not disappointed. There is a lot of information to take in here :oops: :oops: I am definitely learning though. I'm gonna try and sum it up; tell me where I'm wrong please:

1. The FP16/32 combo allows more flexibility in coding. You could code, say, one texture at FP16 and another at FP32 as and when required. This would help to keep the speed up, with relatively no loss in IQ? Or can you only render in one or the other for the whole screen?

2. Dynamic branching allows for more instructions to be sent without being held up, so they can be done in one pass. Also, it can throw out unneeded instructions. But if this is the case, and it's processing fewer instructions, how is it slower? I assume single-pass instructions are less of a hit on bandwidth than multiple passes? (God, I think I got this bit really wrong :oops: )

3. The ability to have more instructions means that a shader can be more complicated and longer, but achieve the same effect as multiple ones in SM 2.0? So this should save developers time, as they would only have to write one complicated shader instead of lots of small ones? This is where the increased shader length helps, as more instructions means larger shaders, and the length is somewhat restricted in SM 2.0?


4. 3Dc allows higher compression than DXT5 at lower-res normal maps, which means more textures can be loaded into a single shader. Fewer shaders mean less work for the GPU, so you get a speed gain as well as an IQ increase. Also, fewer shaders are gonna mean effectively fewer passes.

Like I said, correct me where I am wrong, but be gentle please :oops:
 
LeGreg said:
To me it looks more like his compressor is just plain broken: if you look closely at the texture image I posted, you have a single color replicated all over the 4x4 texel block.
You shouldn't have to tweak your compressor to fix that :/

Well, blame Microsoft. It's their texconv.exe. I took a closer look at the texture now, and it seems you're right. And for some reason the red and green channels that are supposed to be zero are filled with 206. :? It seems it's just replicated for some 4x4 blocks, but not always either. I tried just rerunning it through my 3DcGenerator, and now I got a much better texture. Still visibly better for 3Dc, but not as dramatic.
I could just have had a bug in my 3DcGenerator, but I don't think that's it, because the results have been the same. I've recompressed the files many times over during the project, and I did it just before releasing the demo too, and I can't recall having done anything to 3DcGenerator since then. Nor have I changed the texconv.exe util, though. But I just installed a new DirectX runtime; maybe that's why I'm getting better results now? I know it's using the DX runtime for something at least, since some people have reported errors about not being able to create a Direct3D device with this tool. Anyway, I'll try with an older runtime and see if there's any difference.
 
DemoCoder said:
Humus, isn't that a bit disingenuous? 3Dc requires adding additional shader instructions to compute the Z coordinate, so it cannot "fit right into the same shader" any more easily than the DXT5 method. Thus, to support either DXT5 or 3Dc, you need to add extra instructions, and writing ".xyzw" vs ".wzyx" on one of the instructions is equally laborious.

It is true that there is a slight IQ improvement, but the comparison you should be making is low-res uncompressed normal maps vs DXT5/3DC, and in that comparison (hi-res DXT5 vs uncompressed low-res), DXT5 is clearly a large improvement.

ATI seems to be trying to sell the idea that games should only support non-3Dc low-res maps and hi-res 3Dc maps, in an attempt to kill off one of their own ideas: DXT5 for normal map compression. But it doesn't wash, because the labor to support DXT5 is minimal, while the improvement is large. Both 3Dc and DXT5 should be supported.
The only disingenuous thing would seem to be any implication that ATI are trying to influence developers not to use DXT5 compression for normal maps, when nothing could be further from the truth.

DXT5 is a usable method of compressing normal maps, and we took great pains to first experiment with and then develop this as a technique, and then to further develop and provide the appropriate tools to developers. This doesn't alter the fact that 3DC is a significantly better solution with higher overall quality, but since introducing 3DC we have actively advocated the use of DXT5 to developers as an effective fallback on older hardware. We are always trying to help developers to get the best out of existing hardware, rather than simply concentrating on our latest hardware and features to the exclusion of all else.

Humus has already covered the reasons why no additional instructions are needed to support 3DC when compared to an RGB implementation, so I won't go into that any further here.
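
For concreteness, a minimal HLSL sketch of the fetch paths under discussion (the sampler names, and the exact channels each compressed format returns its data in, are assumptions for illustration rather than code from Humus' demo):

Code:
// Uncompressed RGBA8 normal map: all three components are stored, and the fetch
// is typically followed by a renormalisation (dp3/rsq/mul on ps_2_0-class hardware).
float3 FetchNormalRGB(sampler2D normalMap, float2 uv)
{
    float3 n = tex2D(normalMap, uv).xyz * 2.0 - 1.0;
    return normalize(n);
}

// DXT5-packed normal map: X was moved into alpha at compression time, Y stays in
// green, and Z is rebuilt in the shader -- note the .ag swizzle on the fetch.
float3 FetchNormalDXT5(sampler2D normalMap, float2 uv)
{
    float2 xy = tex2D(normalMap, uv).ag * 2.0 - 1.0;
    float  z  = sqrt(saturate(1.0 - dot(xy, xy)));
    return float3(xy, z);
}

// 3Dc (two-channel) normal map: X and Y come straight out of the two stored
// channels (assumed .rg here), and Z is rebuilt exactly as above -- roughly a
// dp2add/rsq/mul, so the arithmetic cost matches the renormalise in the RGB path.
float3 FetchNormal3Dc(sampler2D normalMap, float2 uv)
{
    float2 xy = tex2D(normalMap, uv).rg * 2.0 - 1.0;
    float  z  = sqrt(saturate(1.0 - dot(xy, xy)));
    return float3(xy, z);
}

Which compressed path ends up effectively "free" then depends on whether the swizzle and the Z reconstruction fold into existing instructions on the target hardware, which is what the rest of this exchange argues about.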
 
Recall said:
1. The FP16/32 combo allows more flexibility in coding. You could code, say, one texture at FP16 and another at FP32 as and when required. This would help to keep the speed up, with relatively no loss in IQ? Or can you only render in one or the other for the whole screen?
The FP16/FP32 architecture of the NV3x/NV4x has nothing to do with texture formats. It has to do with processing. The Radeons can read and write with FP16/FP32 just as well (except that they can't filter/blend at FP16 like the NV4x can).

FP16/FP32 on the NV3x/NV4x is about how the math is done, not how the data is stored. It's more like: since the final output is only going to be 8-bit integer, why should all of the processing I do be 32-bit all the way through the pipeline? Surely there are places where it would be okay to drop the processing to 16-bit FP and still get the same output (the NV3x can also drop to 12-bit integer, but that's another story).
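
As a rough sketch of that idea in HLSL (assuming a profile where 'half' acts as a partial-precision hint; the sampler names are made up):

Code:
float4 main(float2 uv        : TEXCOORD0,
            float3 lightDir  : TEXCOORD1,
            uniform sampler2D diffuseMap,   // placeholder names
            uniform sampler2D normalMap) : COLOR
{
    // Geometry-related maths tends to want full FP32 precision...
    float3 n = normalize(tex2D(normalMap, uv).xyz * 2.0 - 1.0);

    // ...while the colour maths only has to survive an 8-bit integer framebuffer,
    // so FP16 ('half') is usually good enough and can run faster on NV3x/NV4x.
    half3 albedo = (half3)tex2D(diffuseMap, uv).rgb;
    half  ndotl  = (half)saturate(dot(n, normalize(lightDir)));

    return float4(albedo * ndotl, 1.0);
}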

2. Dynamic branching allows for more instructions to be sent without being held up, so they can be done in one pass. Also, it can throw out unneeded instructions. But if this is the case, and it's processing fewer instructions, how is it slower? I assume single-pass instructions are less of a hit on bandwidth than multiple passes? (God, I think I got this bit really wrong :oops: )
Latency, basically. Think of it like this: each time an "if" statement is encountered, it takes a few cycles to shuffle things around properly so the GPU can keep working. If it knows beforehand which branch it's going to take, this job is made much easier, and the GPU doesn't have to wait to make sure it does things right. This is the case, for example, with Humus' "Dynamic Branching" demo, where the first pass finds out which branch to take, and the next pass actually takes that branch.
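
A minimal HLSL sketch of that trade-off (assuming an SM 3.0 target where the 'if' compiles to a real branch rather than being flattened; the sampler names are made up):

Code:
float4 main(float2 uv       : TEXCOORD0,
            float3 lightDir : TEXCOORD1,
            uniform sampler2D normalMap,    // placeholder names
            uniform sampler2D shadowMask) : COLOR
{
    float4 colour = float4(0.0, 0.0, 0.0, 1.0);

    // If this pixel is fully in shadow we can skip the lighting maths entirely.
    // The skipped instructions genuinely don't execute on SM 3.0 hardware, but the
    // branch itself costs some latency while the GPU sorts out which pixels take
    // which path -- that overhead is what can make it slower in practice.
    if (tex2D(shadowMask, uv).r > 0.0)
    {
        float3 n = normalize(tex2D(normalMap, uv).xyz * 2.0 - 1.0);
        colour.rgb = saturate(dot(n, normalize(lightDir)));
    }

    return colour;
}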

4. 3Dc allows higher compression than DXT5 at lower-res normal maps, which means more textures can be loaded into a single shader. Fewer shaders mean less work for the GPU, so you get a speed gain as well as an IQ increase. Also, fewer shaders are gonna mean effectively fewer passes.
No, 3Dc allows for higher-quality compression. The compression ratio is exactly the same, but 3Dc will have fewer errors. The performance of 3Dc and DXT5 should be identical (well, except perhaps on R3xx hardware, which may require an extra swizzle instruction).
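
The footprint claim is easy to sanity-check from the block sizes (assuming the standard DXT5 and 3Dc block layouts):

Code:
// Both formats spend 16 bytes on each 4x4 texel block, i.e. 8 bits per texel, a
// 4:1 ratio against a 32-bit RGBA8 normal map -- so 3Dc's advantage is in how those
// 128 bits are allocated, not in how many of them there are.
static const int kBlockBytesDXT5 = 8 /* alpha block */ + 8 /* colour block */;
static const int kBlockBytes3Dc  = 8 /* X block */     + 8 /* Y block */;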
 
I guess the key question is how game developers will perceive SM 3.0, because SM 3.0 is meant to make coding easier or more effective/natural... BUT


But any decent game must have fallbacks (scalability) in its graphics code. So you don't do things in just one way - you have multiple code paths - like Half Life 2's Source engine, or Doom 3, or FarCry, or PainKiller, or STALKER, or Massive's Krass engine for Aquanox, etc. Sometimes you have many, many code-path fallbacks. Half Life 2 scales all the way from DX6 to DX9, for instance.

So once you work out the effects (shaders) you want, you must then ask, for several classes of hardware: how do I do this, and can I afford it (is it fast enough)? The lower you set the rock bottom, the bigger your potential customer base is - but the fewer special effects you can afford - whilst the converse is also true.

So game developers have shaders with fallbacks based on a system's graphics capabilities and overall grunt. If a game has five levels of code paths, you have to code and modify your shader algorithms for each path, reducing complexity as you go from high-end cards to low-end cards.

Basically this means you could write an SM 3.0 shader set, then an equivalent or slightly less beautiful SM 2.0 effect, then a much less beautiful but faster SM 1.x code path, etc. You don't always want high-end cards executing the latest code paths either. With, say, the Krass engine you aim for the lowest shader model you can do the effect you want in, to get speed. So for, say, Aquamark3, only 30% of the time are you executing SM 2.0 code.

* * * * *

So really NVidia has a first-entrant and a potential marketing advantage. In reality, do you want to buy a car with six gears or seven? Because this is what this argument is really about. A car is not sold with only one gear - a graphics card and any modern game support multiple shader levels and mix them, often very frequently, to get maximum utilisation/performance. One more shader model supported - especially in its first instance - doesn't give you a hell of a lot, except for an appetiser and a base to build something on. What an SM 3.0 or 4.0 card will deliver is a far more powerful card (features and raw performance) to meet the increasingly high requirements of each shader level. Of course ATi's X800 cards are probably just as powerful - if not often more powerful - than NVidia's cards, but NVidia can say, "We have a 0.5th gear where ATi has only 1st gear - for times when you have to scale a sheer cliff..." Not mentioning that most roads don't have a plethora of sheer cliffs :)
 
andypski said:
This doesn't alter the fact that 3DC is a significantly better solution with higher overall quality, but since introducing 3DC we have actively advocated the use of DXT5 to developers as an effective fallback on older hardware.
Well, it seems that the differences in quality are much smaller than the launch presentation showed. In Humus' demo the compression of DXT5 was bogus, and the quality differences are thin now.
Perhaps it's easier to code, but as everyone will be using DXT5 (because only the X800s benefit from this technique), it means extra code for slightly better quality. I don't know if I would call it a "significantly better solution" (but perhaps I didn't get your reasoning).
 
Humus said:
I tried just rerunning it through my 3DcGenerator, and now I got a much better texture. Still visibly better for 3Dc, but not as dramatic.
Could you show us a pic comparing both? :)
 
Humus said:
I tried just rerunning it through my 3DcGenerator, and now I got a much better texture. Still visibly better for 3Dc, but not as dramatic.

What a relief for all us non-R420 people. :)
 
Evildeus said:
Perhaps it's easier to code, but as everyone will be using DXT5 (because only the X800s benefit from this technique), it means extra code for slightly better quality. I don't know if I would call it a "significantly better solution" (but perhaps I didn't get your reasoning).
Perhaps you simply haven't looked at enough samples of the compression quality achieved with the two systems? Basing your opinion on a tiny sample of data is not likely to give you the correct impression.

Fundamentally using DXT5 introduces a highly artificial distinction in terms of compression quality between the two axes - in reality there is no such distinction as to which axis is more important in terms of compression quality.

I have looked at many cases of normal maps here - in the general case I would term the quality achieved with DXT5 as acceptable, but I have certainly seen a lot of cases where it leaves a bit to be desired. Shallow gradients or complex curvatures of normals can introduce significant blocking due to the lack of resolution in one axis, and these artifacts can potentially be arbitrarily magnified in importance depending on, for example, the degree of specularity of the surface. Spherical features can be a particularly problematic case as they cause gradients that cut through blocks at arbitrary and changing angles (which always tends to be a problem for DXTC style block-based compression schemes).

3DC is significantly better for normal map compression than DXT5 - the additional precision essentially eliminates the problems mentioned above.
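
A rough back-of-envelope view of where that per-axis asymmetry comes from, assuming the standard DXT5 and 3Dc block layouts:

Code:
// Bits available per axis within one 4x4 block (illustrative figures only):
//   DXT5 route: X goes in the alpha block  -> 2 x 8-bit endpoints + 16 x 3-bit indices
//               Y goes in the green channel of the colour block
//                                          -> 2 x 6-bit endpoints + 16 x 2-bit indices
//                 (with the index table shared across the unused red/blue channels)
//   3Dc route:  X and Y each get an alpha-style block of their own, so the two axes
//               are treated symmetrically.
static const int kDxt5BitsForX   = 2 * 8 + 16 * 3;   // 64 bits dedicated to X
static const int kDxt5BitsForY   = 2 * 6 + 16 * 2;   // 44 usable bits for Y
static const int k3DcBitsPerAxis = 2 * 8 + 16 * 3;   // 64 bits each for X and Y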
 
Evildeus said:
Perhaps it's easier to code, but as everyone will be using DXT5 (because only the X800s benefit from this technique), it means extra code for slightly better quality. I don't know if I would call it a "significantly better solution" (but perhaps I didn't get your reasoning).
Let's change two words in that statement: DXT5 to SM 3.0, and X800s to 6800s.
Perhaps it's easier to code, but as everyone will be using SM 3.0 (because only the 6800s benefit from this technique), it means extra code for slightly better quality. I don't know if I would call it a "significantly better solution" (but perhaps I didn't get your reasoning).
:LOL:
 
ATI's own presentation on the matter says:

Very slight noise remains, but almost invisible.
All other areas look essentially identical to the original.
Specular blocking has almost completely disappeared.
Specular highlights are the correct brightness.

And yes, I think claiming that adding a swizzle to existing instructions constitutes "extra instructions" over and above 3Dc is disingenuous. No additional instruction slots are consumed by the DXT5 version. With a single #define macro, you could handle this situation for your entire codebase of shaders.
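
Something along these lines is presumably what is meant (a minimal sketch; the macro name and the assumption that X sits in alpha are illustrative):

Code:
// Pick the fetch swizzle once, centrally, rather than editing every shader.
#ifdef NORMAL_MAP_PACKED_DXT5
    // DXT5-packed map: X was moved into the alpha channel at compression time.
    #define FETCH_NORMAL_XY(s, uv) (tex2D((s), (uv)).ag * 2.0 - 1.0)
#else
    // Plain RGB(A) normal map.
    #define FETCH_NORMAL_XY(s, uv) (tex2D((s), (uv)).rg * 2.0 - 1.0)
#endif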

Significantly better is in the eye of the beholder. IMHO, this is an oversell. It is only significant in pathological cases. In the majority of cases, I bet it is not visible to most people unless they perform zoomed screenshot analysis.
 
DemoCoder said:
... I bet it is not visible to most people unless they perform zoomed screenshot analysis.
I'll take that bet. Guess you will need to buy an X800 to pay up. Seems they are telling devs to put in the DXT5 fallback too... what could be so bad about this?
 
DemoCoder said:
Very slight noise remains, but almost invisible
All other areas look essentially identical to the original
Specular blocking has almost completely disappeared.
'Almost invisible'
'Essentially identical'
'almost completely disappeared.'

All of these are qualified statements regarding the particular test cases being used, in the same way that I can say that a DXT1 compressed RGB texture with no magnification frequently looks 'essentially identical' to the original, but if I look closer there may be problems.
In the test cases examined, with the degree of specularity being used, the specular blocking might have been relatively minor, but take the same data with a shinier material property (a higher specular power function) and things might change.

Fundamentally speaking, compress a spherical region of a normal map with DXT5 and it will band significantly more around one axis than the other. This is hardly a good thing.

DXT5 is highly usable if you are willing to sustain some reasonable level of overall quality loss (and given the memory footprint reduction it's almost certainly a good tradeoff for most developers). That is why we researched it in the first place.
And yes, I think claiming that adding a swizzle to existing instructions constitutes "extra instructions" over and above 3Dc is disingenuous. No additional instruction slots are consumed by the DXT5 version. With a single #define macro, you could handle this situation for your entire codebase of shaders.
The DXT5 version will consume an extra slot in SM2.0 unless you have the appropriate swizzle in hardware. Unless you have the 'arbitrary swizzle' cap bit set, you will have to add an extra instruction to your shader, which may or may not be optimised out depending on the underlying hardware. 3DC does not consume an extra instruction slot, requiring the same number of instructions as standard normal maps (replacing a dp3 with a dp2add).
Significantly better is in the eye of the beholder. IMHO, this is an oversell. It is only significant in pathological cases. In the majority of cases, I bet it is not visible to most people unless they perform zoomed screenshot analysis.
And yet people still somehow find cause to advocate the advantages of FP32 over FP24 today even though the precision differences are far further down in the noise floor than the case we are talking about here, and only visible by generating far more difficult pathological cases than those required to demonstrate issues with DXT5. I expect that some of the people making extravagant claims for the benefits of FP32 are people who would also happily downplay the precision and quality advantages of 3DC over DXT5.

Given that 3DC is apparently oversold compared to DXT5 (with errors down in the 6- to 8-bit range), then FP32 must be massively oversold, since the differences with respect to FP24 are much smaller...

Anyway, I digress. Naturally some of this is in the eye of the beholder, but my suspicion is that most people with backgrounds in image compression would probably agree with me when I state that the quality advantages of 3DC are significant.
 