Nvidia, Floating Point Blending and 'Errors'

I think we're playing with fire when we try to redefine infinity as any specific value, whether large or small. When a physical quantity we're modelling, whether light intensity, the number of photons or distance, has gone to infinity, it can't come back down again. The system we're using has lost all ability to measure that value and can't tell whether multiplying by 0.1, 0.01, 0.001 or any other value will make it "measurable" again.

As we divide by smaller and smaller numbers, the result of the division grows without bound; in the limit, dividing a non-zero value by zero yields infinity. Now, if someone requires clamped variables because they can't handle infinity, that's another matter.

The other complaint is what the graphics card will render when a pixel's colour value is infinity. Does the RAMDAC display maximum (saturated) white, as it should?
 
Also of interest is, say, when you're normalizing a vector and you end up dividing by zero. The vector definitely shouldn't have infinite length. In this case, you should not get INF, but NaN (indeterminate).

The other potential issue is that if you're casting an FP32 float to an FP16 float, how do you represent a value that is outside the range of FP16? Would it be +INF? If the shader were then run at full precision, would that value come back down? Obviously this should be handled on the programmer's side by simply scaling numbers down so as to prevent overflows.
 
Chalnoth said:
Also of interest is, say, when you're normalizing a vector and you end up dividing by zero. The vector definitely shouldn't have infinite length. In this case, you should not get INF, but NaN (indeterminate).
Dividing a vector by zero would make the length of the vector infinite but the vector would still have a direction. Unfortunately, in a cartesian representation, the direction and length of the vector are jointly represented, so the direction is also lost.

Dividing by very small fractions is the same as multiplying by very large numbers. If you multiply by infinity, you get infinity.

Chalnoth said:
The other potential issue is that if you're casting an FP32 float to an FP16 float, how do you represent a value that is outside the range of FP16? Would it be +INF? If the shader were then run at full precision, would that value come back down? Obviously this should be handled on the programmer's side by simply scaling numbers down so as to prevent overflows.

I'm not sure about that one. My personal opinion is that an out-of-range value that cannot be represented in FP16 should become infinite. If the coder then wants to cast back to FP32, tough, the value has been lost. They cast to FP16 in the first place, didn't they?
 
That's the problem. All of these vectors are in the cartesian representation. Anyway, I don't really think the problem is the infinities, but the indeterminates (NaNs).
 
Yeah, I totally agree. NaNs are really nasty. Dividing 0 by 0 should arguably result in 0 (for practical purposes, like Humus says).
 
Well, more accurately, it should converge to whatever value it would be as you approach that point. So, one may want it to be zero, or one, or some other number. It really depends upon the algorithm, or whether you think it's acceptable to have the occasional single pixel that is the wrong color.
 
Chalnoth said:
Well, more accurately, it should converge to whatever value it would be as you approach that point. So, one may want it to be zero, or one, or some other number. It really depends upon the algorithm, or whether you think it's acceptable to have the occasional single pixel that is the wrong color.
Your method only makes sense if you have a removable discontinuity.

-FUDie
 
Chalnoth said:
Which would be the case whenever you don't desire an infinity.
Not really. If your function goes to -infinity on one side and +infinity on the other, then you are SOL.

-FUDie
 
The only vectors of length zero are zero vectors which have no direction. Normalisation of a zero vector generates an indefinite rather than an infinity.

With actual floats there is the possibility that the vector has a direction, but the result of the V.V calculation is 0 due to underflow. This will again result in an indefinite.

If you want zero vectors to pass through, a single compare afterwards will sort it out - or you could compare at the end of the pipe and substitute some alternative (or use the hardware's default). The better solution is to ensure that your data isn't going to generate zero vectors, or at least be vanishingly unlikely to.
 
Dio said:
The only vectors of length zero are zero vectors which have no direction. Normalisation of a zero vector generates an indefinite rather than an infinity.

With actual floats there is the possibility that the vector has a direction, but the result of the V.V calculation is 0 due to underflow. This will again result in an indefinite.

If you want zero vectors to pass through, a single compare afterwards will sort it out - or you could compare at the end of the pipe and substitute some alternative (or use the hardware's default). The better solution is to ensure that your data isn't going to generate zero vectors, or at least be vanishingly unlikely to.

do you mean by 'indefinite' a vector whose direction would not be possible to infer? well, i believe that the situation is not actually that grim (although it's not trivial either), given that true zero vectors (i.e. (0, 0, 0)) do get avoided.

so what i mean:
let us have some non-zero-yet-disturbingly-small scalar A, somewhere on the verge of our precision, which exhibits the unlucky property of A^2 = underflow.

let us have an unlucky vector v comprised exclusively of A-like values, i.e. v = (B, B', B") where
0 <= |B| <= |A|,
0 <= |B'| <= |A|,
0 <= |B"| <= |A|
computed naively (i.e. via v.v), this chap would yield a length that underflows to zero.

so i believe we could still squeeze some meaningful direction info out of such a guy, at the expense of some extra logic in our normalization code. like this:

Code:
// we know for sure v is not a true zero vector, so we want its direction

vect dir;

if (dot(v, v) == 0)
{
    // underflow case: produce v1 by replacing each component A of v with 1 * sign(A)
    vect v1 = replace_As_for_ones(v);
    // find the direction of the vector obtained above
    dir = v1 / norm(v1);
}
else
{
    // bah, trivial case
    dir = v / norm(v);
}

although far from perfect, the above method would yield directions which are guaranteed to be in the same octant as the direction of the original fella v. which is still infinitely-times-more information compared to no direction information at all :D
 
Actually, it might be better to simply do a series expansion of sqrt(A1^2 + A2^2 + A3^2) instead.

The series expansion of sqrt(1 + x^2) is:
1 + x^2/2 - x^4/8 + ...

This is equal to:
sqrt(a^2 + b^2)/|a|

...with x^2 = b^2/a^2

So the series for the length sqrt(a^2 + b^2) becomes:
|a| + b^2/(2|a|) - ...

So, to leading order, the length of a very small vector can be approximated by:
|A1| + |A2| + |A3|
...which should be easy enough to calculate (it overestimates the true length by at most a factor of sqrt(3)). I believe it is exactly the dot product of your "v1" with the original vector. It's also nice because the normalised vector is then guaranteed to have a length of at most one, provided this isn't identically the zero vector.

One could go to the next order, but that would add a lot of terms of the form b*(b/a), and wouldn't be so easy to calculate. Leading order should be enough when A^2 causes an underflow.
 