Consensus on NV3x Floating-Point Pixel Shaders?

Which is closest to your opinion about NV3x FP pixel shaders, specifically relative to R3xx?

  • B) Apps that still show shaders slow are broken, or will be fixed by driver improvements
  • C) Slow due to architecture even on NV35, NVidia is cheating benchmarks to cover it up
  • D) Doesn't matter, since real games will be developed with NV3x shader architecture in mind
  • E) Doesn't matter, since all current cards will be obsolete when FP-shader dependent games appear

  • Total voters: 201

antlers

Regular
When the NV30 came out, there was a lot of concern about its floating-point shader performance (both DX9 and OpenGL ARB). With the first crop of NV35 reviews, it seemed like new hardware and/or new drivers were addressing the problem. After further investigation, though, a lot of contradictory information as well as some evidence of cheating has turned up.

I posted this poll to see whether a consensus on this issue is emerging among the denizens of this board. As always, the poll options may not exactly match your view, or you may want to pick more than one; just pick the one that is closest to your opinion.
 
D) Doesn't matter, since real games will be developed with NV3x shader architecture in mind

The real question is: is that actually the case? And I think that's what this whole Futuremark issue is related to as well.

Certainly, any game that goes through the TWMTBP campaign will clearly have mixed shader modes in there, such that it looks OK but maintains performance - which is all well and good. However, are all developers going through TWMTBP? For those that aren't, are they going to invest the time in writing mixed-precision shaders when an alternative architecture is pretty ambivalent to these things and is just fast enough with high-precision shaders? Do developers, on TWMTBP or not, actually want to deal with this type of thing?

We've actually recently sent out a few questions along these lines to a few devs to see if they will give us their thoughts.
 
My understanding has been that the R3xx cards have been the main DX9 development cards for all DX9-class titles, confirmed even by the developers of S.T.A.L.K.E.R.

It was also my understanding that to get WHQL certification, the driver must adhere to what M$ states as the FP24 minimum, yet obviously these 44.03 drivers allow lower precision.

So if that is the case, then developers are going to have to worry about all these different precision modes - and how did they get WHQL certification if they are not meeting spec?
 
Doomtrooper said:
It was also my understanding that to get WHQL certification, the driver must adhere to what M$ states as the FP24 minimum, yet obviously these 44.03 drivers allow lower precision.

That was my understanding as well, but the cheat 44.03 drivers are WHQL certified, aren't they? Could nvidia have pressured Microsoft to actually lower the DX9 spec for the FX core? Or does WHQL certification even require adherence to the DX9 spec?
 
There must be 'something' that stops companies from putting DX9 labels on cards; otherwise I hereby announce that my wife's Radeon 8500 is now DX9 compliant.
 
Doomtrooper said:
There must be 'something' that stops companies from putting DX9 labels on cards; otherwise I hereby announce that my wife's Radeon 8500 is now DX9 compliant.

lol, I think that PS2.0 support is the requirement (correct me if I am wrong here). Whether the driver is WHQL certified may not be imperative for DX9 compliance. But I see your point. Somehow nvidia got away with touting the GeForce4 MX as DX8 compliant, so I don't know how they do these things. http://www.nvidia.com/docs/lo/1468/SUPP/PO_GeForce4_MX_92502.pdf
 
Perhaps this will add to the thread. Carmack states:

The significant issue that clouds current ATI / Nvidia comparisons is fragment shader precision. Nvidia can work at 12 bit integer, 16 bit float, and 32 bit float. ATI works only at 24 bit float. There isn't actually a mode where they can be exactly compared. DX9 and ARB_fragment_program assume 32 bit float operation, and ATI just converts everything to 24 bit. For just about any given set of operations, the Nvidia card operating at 16 bit float will be faster than the ATI, while the Nvidia operating at 32 bit float will be slower. When DOOM runs the NV30 specific fragment shader, it is faster than the ATI, while if they both run the ARB2 shader, the ATI is faster.

http://slashdot.org/comments.pl?sid=65617&cid=6051216
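
To make those tiers concrete, here is a minimal, purely illustrative Cg-style fragment shader (the sampler and parameter names are invented) showing how the three NVIDIA precisions Carmack lists map onto Cg's float, half and fixed types; on R3xx the choice makes no difference, since everything goes through the single FP24 path.

    // Illustrative sketch only -- baseMap and tint are made-up names.
    // Cg fragment types vs. the precisions Carmack mentions:
    //   float -> FP32, half -> FP16, fixed -> FX12 (limited register-combiner range)
    // R3xx runs all of them at FP24 regardless.
    float4 main(float2 uv : TEXCOORD0,
                uniform sampler2D baseMap,
                uniform float4 tint) : COLOR
    {
        half4  texel  = tex2D(baseMap, uv); // half: NV3x may evaluate this at FP16
        fixed4 k      = tint;               // fixed: the FX12 type, clamped to a small range
        fixed4 shaded = texel * k;          // low-precision math where the profile allows it
        return shaded;                      // converted back up for output
    }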
 
Can FX12 be supported in DirectX if Cg is used to program the shaders? Or does it compile directly down to PS2.0, with the minimum precision thus defined by the PP modifier?
 
Ostsol said:
Can FX12 be supported in DirectX if Cg is used to program the shaders? Or does it compile directly down to PS2.0, with the minimum precision thus defined by the PP modifier?

Cg supports FX12 in DX9.
HLSL doesn't.
 
Ostsol said:
Can FX12 be supported in DirectX if Cg is used to program the shaders? Or does it compile directly down to PS2.0, with the minimum precision thus defined by the PP modifier?
Cg generates PS2.0 assembly code for DirectX. There is no way around the assembly shaders in DX, so there is also no way to define integer operations.

The driver, however, may detect that FX12 is enough in special circumstances, but that has nothing to do with Cg.
 
Xmas said:
Ostsol said:
Can FX12 be supported in DirectX if Cg is used to program the shaders? Or does it compile directly down to PS2.0, with the minimum precision thus defined by the PP modifier?
Cg generates PS2.0 assembly code for DirectX. There is no way around the assembly shaders in DX, so there is also no way to define integer operations.

The driver, however, may detect that FX12 is enough in special circumstances, but that has nothing to do with Cg.

In Cg specifications, NVIDIA says clearly that Cg can use FX12 in ps_2_0 and in ps_2_x but that HLSL can't.
 
DaveBaumann said:
Do developers, on TWMTBP or not, actually want to deal with this type of thing?

We've actually recently sent out a few questions along these lines to a few devs to see if they will give us their thoughts.

Like Doom said, the one dev who posted seemed to indicate that the Radeon 9700's 24-bit PS2.0 will be the standard for DX9 games. He also said the GeForce 3's PS1.1 was the standard for DX8. Being first out definitely has its advantages - not so much influencing devs directly, but influencing them through MS and DX. Judging by how the last two generations have gone, both companies have had their designs largely solidified before DX itself was solidified.

I have a feeling many devs feel this way. However, we have yet to see many DX8-class games. Because of that, I ended up voting E. If Nvidia has the same faults in their next-gen hardware, it will start to be a serious problem. They're going to have to boost 32-bit performance drastically or conform to 24-bit.
 
sure enough, it would suck for anyone with an fx if it did not.

as for the poll, i think it's a, b and c, but nvidia is pushing to make it look like it's all d and e. from what i understand, the dx9 features are much easier to work with than the dx8 stuff, so they should make their way into more games quicker. or at least they would if nvidia could pull it off better.
 
Tridam said:
In Cg specifications, NVIDIA says clearly that Cg can use FX12 in ps_2_0 and in ps_2_x but that HLSL can't.

Cg Language Specifications said:
half, fixed, and double data types are treated as float.
half data types can be used to specify partial precision hint for pixel shader instructions.
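
As a hedged illustration of what that means in practice (invented names, and the assembly is hand-written rather than real compiler output): in the DX9 path a half-typed shader ends up as ps_2_0 assembly, where the only surviving precision control is the per-instruction _pp hint and there is no FX12/integer opcode at all, so any FX12 use is the driver's decision, as Xmas notes above.

    // Sketch only: hypothetical source, followed by approximate ps_2_0 output.
    half4 main(half2 uv : TEXCOORD0,
               uniform sampler2D baseMap,
               uniform half4 tint) : COLOR
    {
        half4 texel = tex2D(baseMap, uv);   // 'half' becomes a partial-precision hint
        return texel * tint;
    }

    // Roughly what a compiler could emit for the above:
    //   ps_2_0
    //   dcl t0.xy
    //   dcl_2d s0
    //   texld_pp r0, t0, s0   // _pp = "FP16 is acceptable for this instruction"
    //   mul_pp r0, r0, c0
    //   mov oC0, r0
    // Note there is no integer/FX12 instruction in this set; whether anything
    // actually runs at FX12 is up to the driver, not the language.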
 
Qroach said:
Perhaps this will add to the thread. Carmack states:

The significant issue that clouds current ATI / Nvidia comparisons is fragment shader precision. Nvidia can work at 12 bit integer, 16 bit float, and 32 bit float. ATI works only at 24 bit float. There isn't actually a mode where they can be exactly compared. DX9 and ARB_fragment_program assume 32 bit float operation, and ATI just converts everything to 24 bit. For just about any given set of operations, the Nvidia card operating at 16 bit float will be faster than the ATI, while the Nvidia operating at 32 bit float will be slower. When DOOM runs the NV30 specific fragment shader, it is faster than the ATI, while if they both run the ARB2 shader, the ATI is faster.

http://slashdot.org/comments.pl?sid=65617&cid=6051216

Carmack is always unbiased, and he is correct about the lack of directly comparable modes, but he is simply wrong about the performance implications. It's not that NVidia's FP32 is slower than ATI's FP24, which is in turn slower than NVidia's FP16. According to the evidence I've seen, the NV3x architecture is nearly as slow at FP16 as at FP32. It only speeds up dramatically when you can use FX12 as well (and probably only in the register-combiner context). As far as I can see, this applies to the NV35 as well as the previous versions of the architecture, although the NV35 is not hurting quite so much.
 