View Full Version : Pixel Shader precision and range
I've been doing some pixel shader work recently on the ATI Radeon and GeForce FX series. Other than the 8500 series being slightly buggy with a few operations (note: I'm still researching this), ATI is generally quite stable, especially the 9500+ series, which I can't actually find a problem with. nVidia isn't too bad either, but my biggest concern with them is that the range is limited to [-1, +1], even on the GeForce FX, which is the "best" card they have. As a result, quite a few of my weirder shaders have underflow and overflow problems only on this hardware, where ATI and the Reference Rasterizer display them 100% fine.
This is with the pixel shader 1.1-1.4 series. I was basically wondering why such high-end cards, capable of such large ranges (32-bit float?!), were given such a crappy range of [-1, +1]. I can see having that on a GeForce3, but why on the FX too? Are they not using 32-bit precision on the FX in DX8? If not, what is it, 16-bit? And isn't that capable of better than [-1, +1] anyway?
arjan de lumens
There have been some rather lengthy discussions here at B3D on this subject. It appears that most of the GeForce FXes (except possibly the NV35) have separate functional units for FP32 and fixed-point operation, with the fixed-point units being much faster and more numerous than the FP32 units. The fixed-point units, of course, offer much less precision and range than the FP32 units, and are used to implement PS1.1 shaders. IIRC, the fixed-point units can do 12 bits of precision.
AFAIK, it is not really clear at this point why Nvidia made the architectural decision to build both low-precision fixed-point units and high-precision FP32 units. Possible reasons: backwards compatibility (if you program for the GeForce3, you may have come to expect, and even use for certain effects, the [-1,+1] clamping, although I don't know what kinds of effects those might be); optimization for the common case (DX7/DX8-class features, which is what most present games use); and saving transistors (many fast FP units means a large transistor count, while adding a lot of fixed-point units is very cheap in comparison); and possibly other reasons as well.
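As a rough illustration of what 12 bits of fixed point buys you (an assumption here: "FX12" is taken to mean a signed 12-bit format with 10 fraction bits, giving a range of about [-2, +2) in 1/1024 steps; NVIDIA has not documented the exact encoding), a minimal sketch:

```python
def fx12(x):
    """Quantize a float to a hypothetical FX12 format: 12-bit signed
    fixed point with 10 fraction bits, range [-2048/1024, +2047/1024]."""
    raw = round(x * 1024)             # scale onto the 1/1024 integer grid
    raw = max(-2048, min(2047, raw))  # saturate at the format limits
    return raw / 1024.0

# Values inside the range survive with ~0.001 granularity:
print(fx12(0.5))    # 0.5
# Values outside the range saturate instead of staying exact:
print(fx12(3.7))    # 1.9990234375  (i.e. 2047/1024)
print(fx12(-5.0))   # -2.0
```

The saturation at the top and bottom of the range is exactly the kind of silent overflow the original poster is describing.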
Ok let's see how different cards execute PS1.1-PS1.3 shaders:
Card     Texture ops      Arithmetic ops
GF3/4    float (FP32?)    FX9  [-1, +1]
R8500    FX16 [-8, +8]    FX16 [-8, +8]
GFFX     float (FP32?)    FX12 [-2, +2]
R9700    float (FP24)     float (FP24)
Refrast  float (FP32)     float (FP32)
For PS1.4 replace texture ops with phase1 and arithmetic ops with phase2.
So your bottlenecks are the R8500 for texture ops and the GF3/4 for arithmetic ops.
GeForce FX's fixed-point units (used for PS 1.1-1.4 and fixed-point calculations in OpenGL) have a [-2, 2) range. But I think that this range is only exposed in NVIDIA's OpenGL extensions and is limited to [-1, 1] in DX.
I always thought that this could be a problem, as PS1.4 shaders may have been designed with the ATI R200/R300 range of [-8, 8] in mind. Of course NVIDIA could choose to use the FP units for PS1.4 calculations (I think they did with some drivers; maybe those could help you), but it's not really good for performance :( Since it's not easy to give the driver the ability to detect when a range of [-1, 1] is enough, NVIDIA probably decided to use [-1, 1] by default.
Well, I sent an email to nVidia about 1-2 months ago detailing some shader code I was using, so they could just paste it into the MFCPixelShader app that comes with the DX8 SDK and compare the Reference Rasterizer to their own hardware.
I never heard back, so either they are too busy, or they've heard about this problem one too many times. From our development standpoint it really kind of sucks; it's quite an obscure issue unless you research it a bit, which we did. Luckily it doesn't show up that often, but you know... what's the purpose of a shader if it can't abstract the hardware it's on? Oh well, thanks for all the info guys.
From the DX9 SDK under 'Registers ps 1_x':
The range is the maximum and minimum register data value. The ranges vary based on the type of register. The ranges for some of the registers can be queried from the device caps using GetDeviceCaps.
Name  Type      Range                                         Versions
cn    Constant  -1 to +1                                      All versions
rn    Temp      -MaxPixelShaderValue to +MaxPixelShaderValue  All versions
tn    Texture   -MaxPixelShaderValue to +MaxPixelShaderValue  1_1 to 1_3
tn    Texture   -MaxTextureRepeat to +MaxTextureRepeat        1_4
vn    Color                                                   1_4
You can't make any assumptions about range at all - you have to do the queries to check what range you're actually going to get.
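In other words, portable PS1.x code has to branch on the queried cap rather than assume a range. A minimal sketch of that logic (the cap values below are hypothetical stand-ins for what a `GetDeviceCaps` call might report as `MaxPixelShaderValue` on each card):

```python
def usable_range(max_pixel_shader_value):
    """Temp-register range [-v, +v] a PS1.x shader may rely on, given
    the MaxPixelShaderValue cap queried from the device (sketch)."""
    return (-max_pixel_shader_value, max_pixel_shader_value)

# Hypothetical cap values for the cards discussed in this thread:
for card, cap in [("GF3/4", 1.0), ("R8500", 8.0), ("GFFX in DX", 1.0)]:
    lo, hi = usable_range(cap)
    print(f"{card}: temp registers safe in [{lo}, {hi}]")
```

The point is only that the safe range is data-driven: a shader that keeps intermediates inside the queried bounds behaves the same everywhere, which is exactly what the SDK text above is warning about.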
We saw that in the DX8 caps as well. I guess the purpose behind this thread/question is more about "why" this is the behavior of said cards. Some possible answers might be to keep things consistent across all their cards (GF3/4/FX), or maybe 12-bit is too little. Either way, I guess they won't change it.