Will FP32 make NV40 slow?

binmaze · May 6, 2004

AlphaWolf said:
R300 should quite simply ignore _pp hints and render at FP24. If the banding/IQ problem still exists on Radeon cards running the NV30 path, then _pp isn't the problem.

Logically speaking, that only means 'there is another factor other than _pp.'
We cannot rule out _pp as the cause with this info only.

jvd · May 6, 2004

AlphaWolf said:
R300 should quite simply ignore _pp hints and render at FP24. If the banding/IQ problem still exists on Radeon cards running the NV30 path, then _pp isn't the problem.

Yes, but what if the 16-bit _pp hints are optimized and won't look good at 24-bit, or even 32-bit for that matter?

Ostsol · May 6, 2004

binmaze said:
AlphaWolf said:

R300 should quite simply ignore _pp hints and render at FP24. If the banding/IQ problem still exists on Radeon cards running the NV30 path, then _pp isn't the problem.

Logically speaking, it's like 'there is another factor other than _pp.'
We cannot rule out _pp as the cause, with this info only.

There was a thread in the FarCry forums where someone unpacked the shader packages and repacked them so that the new shader versions, which include partial precision hints, were replaced by the old versions, which don't. The result was no more banding. So it appears clear that the problem is in the shaders, and within the shaders the most likely candidate is precision.

Doomtrooper · May 6, 2004

http://ubbxforums.ubi.com/6/ubb.x?a=tpc&s=400102&f=170106891&m=181103073

Bob3D · May 6, 2004

Do you guys believe NV40 can handle FP and at least keep the performance on par with R420?

DemoCoder · May 6, 2004

Ostsol said:
binmaze said:

AlphaWolf said:

R300 should quite simply ignore _pp hints and render at FP24. If the banding/IQ problem still exists on Radeon cards running the NV30 path, then _pp isn't the problem.

Logically speaking, it's like 'there is another factor other than _pp.'
We cannot rule out _pp as the cause, with this info only.

There was a thread in the FarCry forums where someone unpacked the shader packages and repacked them so that the new shader versions, which include partial precision hints, were replaced by the old versions, which don't. The result was no more banding. So it appears clear that the problem is in the shaders, and within the shaders the most likely candidate is precision.

_PP should be completely ignored on R300. More likely, the difference between the full precision and partial precision shaders isn't just _PP added to every instruction: some of the _PP shaders are completely different in some ways, or some shaders are effectively 1.1 shaders.

Frank · May 6, 2004

Well, the difference is running PS 1.1, which has less range, instead of PS 1.4-2.0. So what you see is almost certainly rounding errors.

The lighting calculations are chopped off to low integer values. So, the shader that calculates the lighting difference on top of the default value in the texture gives a rough, rounded value with a small range for a set of pixels in a group. Banding.

That's (correct me if I'm wrong) the thing designers talk about when they say that they need high dynamic range for their lighting to look nice.

So, it's not FP16, FP24 or FP32 that is to blame, but the range allowed by the shader model used. And PS 1.1 is used on the NV3x to improve performance. This might be corrected for the NV4x when Far Cry understands the difference between an NV3x and an NV4x, if the performance penalty is acceptable (which it should be).
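The rounding effect described in this post can be sketched in a few lines. This is only an illustration, not Far Cry's shader math: it assumes 8-bit fixed point as a stand-in for low-precision PS 1.x arithmetic (real hardware precision varies), uses Python's half-float packing as a stand-in for FP16, and the function names and the 0.10 to 0.12 ramp are made up for the example.

```python
import struct

def quantize_fixed(x, bits=8):
    """Round x in [0, 1] to the nearest level of an unsigned fixed-point format."""
    levels = (1 << bits) - 1
    return round(x * levels) / levels

def quantize_fp16(x):
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def distinct_levels(quantize, lo, hi, samples=1000):
    """Count the distinct outputs produced over a smooth ramp from lo to hi."""
    step = (hi - lo) / (samples - 1)
    return len({quantize(lo + i * step) for i in range(samples)})

# A dim lighting term that varies smoothly across a surface (hypothetical range).
fixed_levels = distinct_levels(quantize_fixed, 0.10, 0.12)
half_levels = distinct_levels(quantize_fp16, 0.10, 0.12)
print("8-bit fixed levels:", fixed_levels)
print("FP16 levels:", half_levels)
```

Under these assumptions the fixed-point ramp collapses to a handful of distinct values while FP16 keeps a few hundred, which is the kind of gap that would show up on screen as visible bands.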

Bob3D · May 6, 2004

DiGuru said:
So, it's not FP16, FP24 or FP32 that is to blame, but the range allowed by the shader model used. And PS 1.1 is used on the NV3x to improve performance. This might be corrected for the NV4x when Far Cry understands the difference between an NV3x and an NV4x, if the performance penalty is acceptable (which it should be).

Thanks DiGuru.
Is it possible for a high precision mode to run faster than a "low mode"?

EDIT:
Also, if Far Cry, for example, used PS 2.0 with FP16, would the bugs we see in Tom's screenshots go away?

Frank · May 6, 2004

Bob3D said:
DiGuru said:

So, it's not FP16, FP24 or FP32 that is to blame, but the range allowed by the shader model used. And PS 1.1 is used on the NV3x to improve performance. This might be corrected for the NV4x when Far Cry understands the difference between an NV3x and an NV4x, if the performance penalty is acceptable (which it should be).

Thanks DiGuru.
Is it possible for a high precision mode to run faster than a "low mode"?

Sure. It just depends on the shader program used. Complex lighting will probably be faster on 2.0 than 1.1, as it would require multiple passes (small shader programs) with 1.1 to get the same effect as a single, larger 2.0 shader program. But the NV3x doesn't run those more complex shaders very well in a number of cases, such as when a shader uses a full set of temporary registers (for example, to calculate a result from two vectors).

So, it depends. The NV4x should have no difficulty with them.
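A rough way to picture the multi-pass cost, as a sketch only. The slot counts are the Direct3D arithmetic instruction limits for ps_1_1 (8) and ps_2_0 (64); the model ignores texture instructions, register limits and the cost of writing intermediate results out between passes, and the 40-instruction lighting effect is a made-up figure.

```python
import math

# Arithmetic instruction slots available in a single pass.
ARITHMETIC_SLOTS = {"ps_1_1": 8, "ps_2_0": 64}

def passes_needed(arithmetic_ops, model):
    """Estimate how many passes an effect needs if it only splits on instruction count."""
    return max(1, math.ceil(arithmetic_ops / ARITHMETIC_SLOTS[model]))

# Hypothetical lighting effect with 40 arithmetic instructions.
for model in ("ps_1_1", "ps_2_0"):
    print(model, passes_needed(40, model))
```

In this simplified model the same effect takes five passes on 1.1 and one on 2.0. With 1.1, every extra pass also stores its intermediate result in the framebuffer, which is another place for rounding to creep in.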

Frank · May 6, 2004

Bob3D said:
EDIT:
Also, if Far Cry, for example, used PS 2.0 with FP16, would the bugs we see in Tom's screenshots go away?

Yes. But it will probably be (a bit) slower on NV3x cards.

Ostsol · May 6, 2004

The problem with the idea of the banding being caused by PS1.x is that a GeForce 4 rendering the same scene looks entirely different. There is some banding, of course, but not of the blocky type seen on the NV3x and NV40. Here's an NVNews.net thread that Doomtrooper posted a while back:

http://www.nvnews.net/vbulletin/showthread.php?t=26855&page=1

Bob3D · May 6, 2004

Based on all the shader benchmarks released, I can't see NVIDIA's full precision mode (FP32) running faster than ATI's full precision mode (FP24).
Am I right?

Ostsol · May 6, 2004

The difference in performance due to precision on NVidia cards is entirely due to register limitations. Using many registers causes performance drops. Using FP16 doubles the number of registers one can use before performance starts to drop, though, because two FP16 registers get packed inside one FP32 register. ATI does not seem to have such limitations -- or at least they don't manifest themselves until a much greater number of registers is used. Back to NVidia, though: using a small number of registers should result in the same performance regardless of precision.
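The packing argument can be put into a small sketch. It assumes only what the post says, that two FP16 temporaries share one FP32-sized slot; the function names and the budget of 4 slots are hypothetical and not measured hardware figures.

```python
import math

def fp32_slots_used(fp32_regs, fp16_regs):
    """FP32-sized slots occupied, assuming two FP16 temporaries share one slot."""
    return fp32_regs + math.ceil(fp16_regs / 2)

def fits_budget(fp32_regs, fp16_regs, slot_budget):
    """True if the shader's temporaries stay within the assumed full-speed budget."""
    return fp32_slots_used(fp32_regs, fp16_regs) <= slot_budget

BUDGET = 4  # hypothetical number of slots before performance starts to drop

print(fits_budget(6, 0, BUDGET))  # six FP32 temporaries
print(fits_budget(0, 6, BUDGET))  # the same six temporaries at FP16
```

With this budget, six FP32 temporaries overflow it while six FP16 temporaries fit in three slots. A shader with only two or three temporaries fits either way, which matches the claim that small shaders should run the same regardless of precision.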

Frank · May 6, 2004

Ostsol said:
The problem with the idea of the banding being caused by PS1.x is that a GeForce 4 rendering the same scene looks entirely different. There is some banding, of course, but not of the blocky type seen on the NV3x and NV40. Here's an NVNews.net thread that Doomtrooper posted a while back:

http://www.nvnews.net/vbulletin/showthread.php?t=26855&page=1

True. That doesn't make it invalid, though: the R3xx shows exactly the same image when you make FarCry think it's an NV3x, even though it uses FP24. So it can only be the shader model.

My guess would be that Crytek changed the algorithm for NV3x to make it run better, thereby creating the different banding artifacts.

Xmas · May 6, 2004

Bob3D said:
Based on all the shader benchmarks released, I can't see NVIDIA's full precision mode (FP32) running faster than ATI's full precision mode (FP24).
Am I right?

There are several shaders where the 6800Ultra beats the X800XT, even with FP32. And this doesn't come as a surprise, when you compare the different architectures. Both have their strengths and weaknesses, and while the NV40 IMO seems to be able to do a bit more work on average per pipe per clock, it is still held back by register issues, while R420 enormously benefits from higher clock rates.

In an extreme case of 2-component MULs NV40 could theoretically be 4 times faster than an equally clocked R420. But you can construct similarly extreme cases in the opposite direction.
