strange nVidia *format* precision problems
Hi all,
I'm writing a long raytracing shader and have run into some serious problems with the nVidia GPU/driver. Both vendors can give completely different results compared with the MS REF device, but the nVidia problems are harder to work around. All variables are intended to be fp32.
####
edit: it's a range problem. I can't store anything larger than 65536. I don't know why ATI and REF work.
1: One shader does a screen-space interleaving operation, such as
Code:
// 0 or 1 per axis: position of this pixel inside its 2x2 block
int2 vEvenOdd = floor(fmod((scrPos.xy + 0.5), 2.0));
// linear index 0..3 inside the 2x2 block
int idx = vEvenOdd.x + vEvenOdd.y * 2;
// one-hot mask selecting one of four channels
float4 index = float4(idx == 0, idx == 1, idx == 2, idx == 3);
When I use the index to sample an fp32 texture it's OK, but when sampling fp16 textures the shader does not work correctly. This is not a dependent read.
#####
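For reference, the range limit in the edit above is consistent with half precision: the largest finite IEEE 754 fp16 value is 65504, so 65536 cannot survive a trip through an fp16 surface. Below is a minimal CPU-side sketch of that limit in Python. It assumes the hardware behaves like IEEE half precision with round-to-nearest, which may not hold exactly on NV43; `to_fp16` and `survives_fp16` are hypothetical helper names, not part of any shader API.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x):
    """Round a Python float to the nearest half-precision value.

    Assumption: IEEE 754 binary16 with round-to-nearest-even, as
    implemented by the struct module's 'e' format. Values too large
    for fp16 are mapped to +/-infinity here.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        return math.copysign(math.inf, x)


def survives_fp16(x):
    """True if x comes back unchanged after a round trip through fp16."""
    return to_fp16(x) == x


if __name__ == "__main__":
    for value in (1.0, 2048.0, 2049.0, FP16_MAX, 65536.0):
        print(value, "->", to_fp16(value))
```

The 0/1 mask values from the interleaving code survive the round trip, but integers above 2048 start to lose their low bits and anything that rounds past 65504 overflows.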
####
edit: it might be an L16 format problem... thanks to Bob
2: I encode a texcoord into a 1D value, and then expand it in the standard way:
Code:
float i = f / 256;
return float2(frac(i), i / 256);
The shader seems to drop to fp16 precision: when f is larger than 1024, the 2D texcoord samples only 1 of every 2 texels, or fewer.
#####
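To check whether the symptom in item 2 looks like fp16 arithmetic, the decode can be emulated on the CPU with every intermediate rounded to half precision. This is only a sketch under assumptions: it assumes a 256-texel-wide layout (taken from the divide by 256), IEEE round-to-nearest-even at every step, and hypothetical helper names (`to_fp16`, `decode_fp16`, `distinct_u`). The real driver may round differently.

```python
import math
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        return math.copysign(math.inf, x)


def decode_fp16(f):
    """The decode from item 2, with each intermediate rounded to fp16."""
    f = to_fp16(f)
    i = to_fp16(f / 256.0)
    u = to_fp16(i - math.floor(i))  # frac(i)
    v = to_fp16(i / 256.0)
    return u, v


def distinct_u(start, offset=0.0):
    """Count distinct u values over 256 consecutive f values from `start`.

    `offset` models a texel-centre shift such as +0.5 (an assumption;
    the post does not say whether f carries one).
    """
    return len({decode_fp16(start + k + offset)[0] for k in range(256)})


if __name__ == "__main__":
    print(distinct_u(1024))        # integer f in [1024, 1280)
    print(distinct_u(2048))        # integer f in [2048, 2304)
    print(distinct_u(1024, 0.5))   # half-texel offsets in [1024, 1280)
```

In this emulation, integer f keeps all 256 columns up to 2048 and loses every second one above that. With a +0.5 offset the halving already starts at 1024, which would match the reported threshold, but whether f actually carries such an offset is a guess.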
This code works as expected on ATI and the REF device.
Any suggestions?
-------------------------------------------------------------
edit: NV43 with driver 91.47; I tried 84.26 and it's the same.