Nvidia, Floating Point Blending and 'Errors'

FWIW I agree with a number of people here.

While on the surface it seems reasonable to replace x/0 with MAX_FLOAT, I really think that in general the IEEE solution is the better one.

If you're going to divide by zero, you should know that it could happen.
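The IEEE behaviour is at least predictable; a quick C illustration, assuming IEEE-754 arithmetic (C99 Annex F):

    #include <stdio.h>

    int main(void)
    {
        /* volatile stops the compiler from folding the divides away */
        volatile float one = 1.0f, zero = 0.0f;

        printf("%f\n",  one / zero);    /* inf:  finite/0 is a signed infinity */
        printf("%f\n", -one / zero);    /* -inf */
        printf("%f\n", zero / zero);    /* nan:  0/0 is a quiet NaN */
        return 0;
    }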

I suspect, though, that the NVidia issue is more about writing 32-bit FP values into 16-bit FP values (the frame buffer) and numeric overflow in that case. I'm speculating here.
This is somewhat more questionable behaviour, but IMO it's still reasonable, and it's 1 instruction to fix it.
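A minimal C sketch of that one-instruction fix; 65504 is the largest finite FP16 value, and the function name is illustrative:

    #include <math.h>

    #define FP16_MAX 65504.0f   /* largest finite IEEE-754 half-precision value */

    /* Clamp an FP32 result into the finite FP16 range before the
       frame-buffer write; for non-negative colour values the single
       fminf is the "1 instruction" in question. */
    static float clamp_to_fp16(float x)
    {
        return fminf(fmaxf(x, -FP16_MAX), FP16_MAX);
    }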

I think where the question gets interesting is with ops that have no CPU counterparts: what does INF do in a clamp operation (FWIW I think it's pretty well defined)? And so on.
 
nutball said:
One man's general case is another man's special case.

Some of the compilers I've worked with have had quite sophisticated support for dealing with FP exceptions: configurable behaviour for value replacement when FP ops generate Inf/NaN, flags to auto-detect division by zero, and so on.

I'm fine with a solution where I can configure the behavior, so that you maybe have IEEE as the default but can turn it off. But I don't like having to write extra code, wasting precious cycles, just to work around IEEE annoyances.
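For comparison, CPU runtimes already offer this kind of switch. A sketch using glibc's feenableexcept (a GNU extension, not portable C; GPUs expose nothing comparable):

    #define _GNU_SOURCE
    #include <fenv.h>

    /* Promote the IEEE divide-by-zero and invalid-operation flags to
       SIGFPE traps, the CPU-side analogue of the configurable
       behaviour described above. */
    int main(void)
    {
        feenableexcept(FE_DIVBYZERO | FE_INVALID);

        volatile float zero = 0.0f;
        volatile float r = 1.0f / zero;   /* traps here instead of producing inf */
        (void)r;
        return 0;
    }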
 
pocketmoon66 said:
In 'real' programming, can you imagine how hard it would be to debug an application if it didn't trap FP exceptions such as div0 but rather just set the result to 'something useful'?

And wouldn't it be useful if we all shipped debug builds rather than release builds to our customers? I mean, it would certainly help us in finding bugs when the customer can report that he got a messagebox with the text "ALERT(string != NULL). Main.cpp, line 647". But that's not how we do things in the real world. We ship release builds because they are smaller and faster.
 
Simon F said:
But Humus, what if HW vendor X's decision on what to do with division by zero is not one you want? AFAICS it's up to the programmer to put in the handling that they want. Personally I think the IEEE behaviour is fairly sensible in this respect. I'm not that keen on its denormalised numbers but that's a different issue.

As long as it's clearly specified that division by zero returns MAX_FLOAT, rather than each vendor choosing their preferred behavior, it's all fine.
 
Simon F said:
But in terms of graphics algorithms, where are you going to have division by zero?

Projection? That's handled by clipping against the frustum and is generally hardwired in, so it's a non-issue.

Lighting calcs where the light gets too close? Since that's computing a distance, the value will always be positive and so the epsilon trick should be ok. (In fact, it's probably necessary for realistic modelling since you wouldn't have an infinitely bright light anyway... it would have set fire to the material :) )

Vector normalisation where the vector is zero? Same sort of situation.

I think that, in general, it should be ok.

In terms of graphics, division by zero is far from uncommon. Any time you divide by something that's interpolated, chances are it will pass through zero at some point. Any form of projective transformation risks getting zero in w. This applies to shadow mapping, projective texturing and a load of other effects. You have normalization, computing the distance between lines and a whole range of other geometric tasks that risk a division by zero.
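As an illustration of the projective case, a sketch in C; the type and function names are made up for the example:

    /* Projective texturing sketch: uv = q.xy / q.w. The interpolated w
       changes sign across the plane through the projector, so w == 0
       is hit along a whole line of fragments. */
    typedef struct { float x, y, w; } proj_coord;

    static void project_uv(proj_coord q, float *u, float *v)
    {
        *u = q.x / q.w;   /* IEEE: +/-inf at w == 0, NaN if q.x is also 0 */
        *v = q.y / q.w;
    }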

And let's not get into the area of negative numbers under the square root, raising negative numbers to a power, atan2(0,0), etc. etc. etc.

To take the square root of a negative number as an example: in GL_ARB_fragment_program it is defined as sqrt(abs(x)). This is a typical "useful behavior" that's mathematically wrong, but who cares? It's the right way to do it in graphics IMO.
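In C terms, the ARB convention described above amounts to:

    #include <math.h>

    /* The GL_ARB_fragment_program convention: mathematically wrong for
       negative inputs, but the "useful behavior" described above. */
    static float arb_sqrt(float x)
    {
        return sqrtf(fabsf(x));
    }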
 
Humus said:
And wouldn't it be useful if we all shipped debug builds rather than release builds to our customers? I mean, it would certainly help us in finding bugs when the customer can report that he got a messagebox with the text "ALERT(string != NULL). Main.cpp, line 647". But that's not how we do things in the real world. We ship release builds because they are smaller and faster.
Side comment: I got errors like this when running Freespace 2. They really weren't useful, it turned out, because I only got them due to an improper install (the CD was warped slightly, and would sometimes read incorrectly upon installation).
 
Drak said:
If you don't want to deal with divisions by zero, you can condition your division by adding an epsilon to your divisor.

But that requires another instruction. When a particular task is going to be repeated for tens or hundreds of millions of fragments each second, I don't want to add unnecessary workload.
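For concreteness, Drak's conditioned divide sketched in C; EPSILON is an illustrative constant, and the trick assumes a non-negative divisor:

    #define EPSILON 1e-6f   /* illustrative value, not canonical */

    /* One extra ADD per division, which is exactly the per-fragment
       cost objected to above. Note the trick also misbehaves for a
       signed divisor near -EPSILON. */
    static float conditioned_div(float num, float den)
    {
        return num / (den + EPSILON);
    }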
 
Humus said:
And wouldn't it be useful if we all shipped debug builds rather than release builds to our customers? <snip> We ship release builds because they are smaller and faster.

Err, are you saying that

a) if Nvidia didn't have specific 'values' for -/+Inf and NaN, they would have faster hardware?

and / or

b) if ATI did have specific 'values' for -/+Inf and NaN, they would have slower hardware?


I would have thought that it's just a case of extra wires and gates, not extra code. No one is suggesting that Nvidia hardware or drivers actually raise any sort of exception (in the usual sense), but rather that their ALUs are built with the extra bits and pieces to process calculations containing +/-Inf/NaN the IEEE way.

So it doesn't come down to performance at all (although you could argue that it affects the gate budget and therefore, indirectly, performance). It comes down to preferred behaviour.

But either way, it's not really an issue for developers today, just something to be aware of.
 
Humus said:
The problem with that is that it can be quite expensive to handle division-by-zero cases. If you instead generate MAX_FLOAT, chances are you'll get a useful enough value that you don't have to do any extra math just to avoid artifacts. Say you're doing a simple projection for a spot light. Somewhere there's going to be a line of divisions by zero. If I can just mask away this area along with backlighting, the problem goes away. Cheap and easy. If on the other hand I'm getting NAN or INF, the artifacts remain and I'll have to explicitly take care of it, unnecessarily wasting computing power.
But to me it sounds like there's already some kind of handling that's been added there. In some cases this handling might be free, in some it might not. I take the view that it's pretty trivial to take the output of the calculation and do 'if X==INF (or NaN, or whatever) x=MAX_FLOAT' (which is 1 instruction).
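Dio's fixup, spelled out in C for illustration (a shader would express it as a compare-and-select):

    #include <math.h>
    #include <float.h>

    /* Replace a non-finite result (inf or NaN) with MAX_FLOAT after
       the fact; sign handling is glossed over here, as in the post. */
    static float fixup(float x)
    {
        return isfinite(x) ? x : FLT_MAX;
    }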

When we're talking about the kind of shader lengths we'll be talking about here, it isn't going to hurt even given that it isn't going to be needed often.

The biggest risk factor I see is vector normalisation of poorly conditioned input data (notably texture interpolation, especially with mipmapping). But that's a nightmare anyway, because there's no value you can use to normalise a vector of length 0 that won't generate artifacts. Having a NaN propagate to the end of the pipeline and then write 0 to the colour buffer seems perfectly reasonable to me.
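A sketch of the dilemma in C; the fallback direction below is arbitrary, which is exactly the point:

    #include <math.h>

    typedef struct { float x, y, z; } vec3;

    /* There is no "right" result for normalising a zero-length vector;
       this sketch just substitutes an arbitrary unit vector, trading a
       NaN for a different kind of artifact. */
    static vec3 normalise(vec3 v)
    {
        float len = sqrtf(v.x * v.x + v.y * v.y + v.z * v.z);
        if (len == 0.0f) {
            vec3 fallback = { 0.0f, 0.0f, 1.0f };   /* arbitrary choice */
            return fallback;
        }
        vec3 n = { v.x / len, v.y / len, v.z / len };
        return n;
    }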
 
suicuique said:
The division by zero is not a "problem" per se in maths.
That it is undefined is a requirement, not an annoyance.

If you want it to be defined, you are begging for *real* problems, as you are throwing the consistency of simple arithmetic out of the window.

If you are dividing by zero, you've already got your problem. Just setting the result to your liking won't help you there.

Say you set something like real/0 = +-MaxNum (with respect to the signs of real and 0).

How do you trace such an "error"?
What about scientific computing on the GPUs?

IMHO it would be a grave error to throw exception handling of Div/0 out of the window just because it seems convenient.

regards, alex

But this is not mathematical science. This is graphics. Visuals. Eye-candy. It's all fake anyway. If you want to analyze amino acids' interaction with red blood corpuscles, OpenGL is not the tool you're looking for.
Just because IEEE may be a convenient thing for debugging scientific applications on our CPUs doesn't mean it's the best way to do things when we want to render Lara Croft's boobs.

A shader is a small script that's typically the "inner loop" of most visual applications. We don't want extra work there just to take care of special cases. We want to crunch numbers as fast as possible to achieve a certain output. To do that we want a set of tools that does the job best. Math is a good tool. But there's nothing saying that we need the same mathematical rules that are applied at our universities. If your professor says sqrt(-4) = 2i, I say sqrt(-4) = 2. Nor do I care that 0^0 is undefined; I say 0^0 = 1. And I don't care if (-1)^3 = -1; I'm fine if (-1)^3 = 1. I don't want to waste silicon or clock cycles on such special cases. Nor do I want to waste silicon on ensuring correct rounding to the last bit.
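As it happens, even C99's libm leans "useful" on one of these: Annex F defines pow(x, 0) as 1 for every x, including 0^0. A quick check, with the graphics-style square root spelled out:

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        printf("%g\n", pow(0.0, 0.0));       /* 1: C99 Annex F defines pow(x, +-0) == 1 */
        printf("%g\n", sqrt(fabs(-4.0)));    /* 2: the graphics-style square root */
        return 0;
    }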
 
pocketmoon66 said:
Err, are you saying that

a) if Nvidia didn't have specific 'values' for -/+Inf and NaN, they would have faster hardware?

and / or

b) if ATI did have specific 'values' for -/+Inf and NaN, they would have slower hardware?


I would have thought that it's just a case of extra wires and gates, not extra code. No one is suggesting that Nvidia hardware or drivers actually raise any sort of exception (in the usual sense), but rather that their ALUs are built with the extra bits and pieces to process calculations containing +/-Inf/NaN the IEEE way.

So it doesn't come down to performance at all (although you could argue that it affects the gate budget and therefore, indirectly, performance). It comes down to preferred behaviour.

But either way, it's not really an issue for developers today, just something to be aware of.

It's a waste of silicon IMO. Whether it affects performance of the actual hardware I don't know. But that's not the big deal. What I'm saying is that adding extra instructions to take care of these cases slows things down. Instead of your regular 12-instruction shader, you now have, for instance, 13 instructions, unnecessarily slowing things down by 8%.

I should point out, in case anyone wonders, that I'm only stating MY OWN personal opinion. Just so that no one takes my words to be the official opinion of ATI, or anything of that sort.
 
Dio said:
But to me it sounds like there's already some kind of handling that's been added there. In some cases this handling might be free, in some it might not. I take the view that it's pretty trivial to take the output of the calculation and do 'if X==INF (or NaN, or whatever) x=MAX_FLOAT' (which is 1 instruction).

When we're talking about the kind of shader lengths we'll be talking about here, it isn't going to hurt even given that it isn't going to be needed often.

But that's the exact thing I don't want to have to do. I'm not happy with a poor solution just because it works.
Also, there's nothing saying that this will be cheap, even for very long shaders. Your if-statement could just as well be in the inner loop and have to be repeated many times to get proper handling.
 
Humus said:
But that's the exact thing I don't want to have to do. I'm not happy with a poor solution just because it works.
Also, there's nothing saying that this will be cheap, even for very long shaders. Your if-statement could just as well be in the inner loop and have to be repeated many times to get proper handling.
Well, I could say almost exactly the same thing:

I'm not happy with a poor solution just because it works. There's nothing saying that your solution will give acceptable results even on very short shaders if the assumptions made in design are not the ones you want.

One could argue that we're both right.
 
Dio said:
Humus said:
But that's the exact thing I don't want to have to do. I'm not happy with a poor solution just because it works.
Also, there's nothing saying that this will be cheap, even for very long shaders. Your if-statement could just as well be in the inner loop and have to be repeated many times to get proper handling.
Well, I could say almost exactly the same thing:

I'm not happy with a poor solution just because it works. There's nothing saying that your solution will give acceptable results even on very short shaders if the assumptions made in design are not the ones you want.

One could argue that we're both right.

yes. and i believe the hw should provide max info WRT the problem at hand. in reality we have NaNs, infinities, underflows and overflows; whether we'd like those treated 'scientifically' or swept under the carpet, the hw must give some account of those; what the developer does then is, well, personal choice.
 
What we really need, I think, are ways for the programmer to define the behavior of NaNs when input into a shader instruction (i.e. the programmer could decide that he/she wants NaNs in the shader to all act as zeroes). Optimally you'd want to be able to change this behavior on a per-instruction basis, for no performance hit.
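That mode is easy to express in software, since NaN is the only float that compares unequal to itself; a C sketch:

    /* "NaNs act as zero": a NaN is the only float for which x != x is
       true, so the test detects it without any special intrinsics. */
    static float nan_as_zero(float x)
    {
        return (x != x) ? 0.0f : x;
    }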
 
I'll ask again: why should MAX_FLOAT be any better than using INF? I don't think the results of using MAX_FLOAT would be any more meaningful.
 
Xmas said:
I'll ask again, why should MAX_FLOAT be any better than using INF? I don't think the results with using MAX_FLOAT would be any more meaningful.
I'll agree with this. I, too, can't see any real advantage in clamping to MAX_FLOAT instead of using INF. In practical terms you're just redefining a new INF value.
For example:
  • 1/MAX_FLOAT is a denormalised number which is going to be clamped to 0 on most graphics hardware anyway... so why not just use INF (since 1/INF = 0)? See the sketch after this list.
  • If you always clamp to MAX_FLOAT, then MAX_FLOAT - "a sensible range value" is still going to be MAX_FLOAT.
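The first bullet checks out numerically; a quick check using the FP32 limits from <float.h>:

    #include <stdio.h>
    #include <float.h>

    int main(void)
    {
        printf("FLT_MAX   = %g\n", FLT_MAX);          /* ~3.4e38 */
        printf("1/FLT_MAX = %g\n", 1.0f / FLT_MAX);   /* ~2.9e-39: below FLT_MIN, i.e. denormalised */
        printf("FLT_MIN   = %g\n", FLT_MIN);          /* ~1.2e-38: smallest normal float */
        return 0;
    }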
 
naive probably, and i'm talking about pixel shaders, not vertex...

got texture, light it, light it, light it, then darken it (or just do something with it).

imagine the lights are *very* bright; if the 3rd light goes "inf", then you can't mathematically "darken" it... if it saturates at "max-float" then at least you get a visible solution??

-dave-

obviously inf or max would be controllable via yer "fp-flags"...
 
Visible, but still wrong. If someone's writing algorithms that do things like that, they need to be aware of the consequences anyway.
 
davefb said:
naive probably, and i'm talking about pixel shaders, not vertex...

got texture, light it, light it, light it, then darken it (or just do something with it).

imagine the lights are *very* bright; if the 3rd light goes "inf", then you can't mathematically "darken" it... if it saturates at "max-float" then at least you get a visible solution??
It depends upon how much you darken it, and whether the lighting calculation in question was output to the register in FP16 or FP32. +INF and MAX_FLOAT would give the exact same value if this darkening doesn't do a whole lot.

From a numerical stability standpoint, you really don't want to be doing the above anyway. It'd be much more accurate to, say:

lighten, darken, lighten, darken, lighten, etc.

Additionally, if you do the above, you may discover that you can bake the "darkening" instructions into either the texture or the color of the light.
 