Reverend said:
Correct, but here you have a fine alternative: if you need a reproducible version of cos, you can implement it yourself as a Taylor series, using floating point add and mul, if add and mul are deterministic. But if add and mul aren't deterministic, it's impossible to implement anything deterministically at all. The basic arithmetic ops are the building blocks.
By "reproducible" and "deterministic", I assume you mean "identical in all implementations". Otherwise, yes, operations are always reproducible and are deterministic. We did not add a random number generator :)
On the other hand, even if floating point adds and muls have identical implementations, in general you would still not be able to guarantee that two implementations of the same Taylor expansion would always give the same results with the same inputs. You would need not only identical HW, but identical compilers, identical source code and identical OS/APIs. IE^3 makes no guarantees on a sequence of operations. Floating point numbers are inexact and so are operations performed on them. Programmers have learned to live with that.
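To make the Taylor-series idea concrete, here is a minimal C sketch (not from the exchange above) of a fixed-order cosine built from float multiplies and adds only. The four-term polynomial, the literal coefficients, and the FP_CONTRACT pragma are illustrative choices, and there is no range reduction, so it only behaves for small |x|.

```c
/*
 * Illustrative only: cos(x) as a fixed 4-term Taylor polynomial in Horner
 * form. At run time this uses nothing but float multiplies and adds; the
 * coefficients are the reciprocal factorials written out as literals.
 * No range reduction is done, so it is only meaningful for small |x|.
 */
#include <stdio.h>

/* Ask the compiler not to fuse a*b+c into an FMA, so the written sequence
 * of adds and muls is what actually executes (C99; some compilers ignore
 * this pragma and need a command-line flag instead). */
#pragma STDC FP_CONTRACT OFF

static float taylor_cos(float x)
{
    const float c2 = -0.5f;            /* -1/2! */
    const float c4 =  4.1666668e-02f;  /*  1/4! */
    const float c6 = -1.3888889e-03f;  /* -1/6! */
    const float c8 =  2.4801587e-05f;  /*  1/8! */
    const float x2 = x * x;
    return 1.0f + x2 * (c2 + x2 * (c4 + x2 * (c6 + x2 * c8)));
}

int main(void)
{
    printf("taylor_cos(0.5f) = %.9g\n", taylor_cos(0.5f));
    return 0;
}
```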
The 23 bits are the normalized mantissa form of a 24b number. You have 24b of precision.
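For reference, a small C sketch of that 23-plus-implicit-bit layout, assuming float is an IEEE-754 single (true on essentially all current hardware); the example value 1.5f is arbitrary.

```c
/*
 * Decode a float's bit fields to show the implicit leading 1: only 23
 * fraction bits are stored, the 24th bit of precision is implied by the
 * normalized form. Assumes float is an IEEE-754 single.
 */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    float f = 1.5f;                  /* 1.5 = (1).100... x 2^0               */
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* portable way to view the bit pattern */

    unsigned sign     = bits >> 31;
    unsigned exponent = (bits >> 23) & 0xFFu;    /* biased by 127            */
    unsigned fraction = bits & 0x7FFFFFu;        /* the 23 stored bits       */

    printf("sign=%u  exponent=%u (unbiased %d)  fraction=0x%06X\n",
           sign, exponent, (int)exponent - 127, fraction);

    /* value = (-1)^sign * (1 + fraction / 2^23) * 2^(exponent - 127);
     * the "1 +" is the hidden 24th bit.                                     */
    return 0;
}
```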
Er...no, not according to my understanding. IEEE doesn't have any such operation as "a+b+c" whose order is undefined. IEEE only has "a+b", so to add three numbers you need to either specify "(a+b)+c" or "a+(b+c)", either of which is reproducible. Or use a language like C which has well-defined precedence and associativity rules, so that "a+b+c" is defined as being exactly the same as "(a+b)+c" and not "a+(b+c)". Any violation of this in modern languages is an optional compiler optimization that defaults to "off" and is called something like "optimize floating-point operations aggressively". We don't want this even if it's already happening
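A quick C illustration of that point, with constants picked only to expose the rounding: each parenthesization is itself reproducible, but the two are not equal, and C defines a+b+c to mean (a+b)+c.

```c
/*
 * (a+b)+c and a+(b+c) are each well defined and reproducible under IEEE
 * semantics, but they are different values; C's left-to-right rule makes
 * a+b+c mean the former. The constants are chosen only to expose this.
 */
#include <stdio.h>

int main(void)
{
    float a = 1.0e8f, b = -1.0e8f, c = 1.0f;

    float left  = (a + b) + c;  /* (1e8 + -1e8) + 1 = 1                        */
    float right = a + (b + c);  /* 1e8 + (-1e8 + 1) = 0; the 1 is lost to rounding */

    printf("(a+b)+c = %g   a+(b+c) = %g\n", left, right);
    return 0;
}
```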
All I meant is that more complex operations do not have results that are guaranteed by IE^3. The implementation details influence the results. If a PowerPC implements an FMAD with higher precision than a MUL/ADD combo, that will lead to slight differences between that HW and others. Doesn't seem to offend most programmers.
Though there are some programmers that require exactly the same results. But they aren't programming pixel shaders.
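A hedged C sketch of the FMAD point: fma() rounds once where a separate multiply and add round twice, so the two can differ. The operand values are chosen purely to make the difference visible, and some compilers need a flag such as -ffp-contract=off in place of the pragma.

```c
/*
 * A fused multiply-add rounds the result once; a separate multiply and add
 * round twice. Hardware (or a library) that fuses the pair can therefore
 * return a slightly different, often more accurate, value.
 */
#include <stdio.h>
#include <math.h>

/* Keep the compiler from contracting a*b + c into an fma on its own. */
#pragma STDC FP_CONTRACT OFF

int main(void)
{
    double a = 1.0 + 0x1p-30;   /* exactly representable in double        */
    double b = 1.0 - 0x1p-30;
    double c = -1.0;            /* exact value of a*b + c is -2^-60       */

    double separate = a * b + c;     /* product rounds to 1.0 -> result 0 */
    double fused    = fma(a, b, c);  /* one rounding -> keeps -2^-60      */

    printf("mul then add: %.17g\n", separate);
    printf("fma         : %.17g\n", fused);
    return 0;
}
```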
They're not needed if and only if you don't care about reproducibility. But if you're going to do like NVidia does and sometimes run VS operations on the CPU for load-balancing, then you get different results along both paths, which is bad. This is exactly the kind of problem the IEEE spec was designed to remedy, so why not use it?
One has to judge the cost of an item and make a call. Certainly if one is offloading VPU activity to the CPU, having identical implementations is required. On the other hand, PS shaders do not have the luxury of being offloadable. They could be used to offload the CPU, but there's no API available to do that. Would you increase the cost of the product for something that could not be used?
Saying "this device is IEEE compliant, with this small set of exceptions" is like saying "I am a virgin, with this small set of exceptions". Either Tagrineth is a virgin, or she's not
.
Sure. Quite colorful.
Again it all comes down to whether you see 3D hardware as a deterministic computational device which produces well-defined output for any input, or it's just some black box that you feed polygons into to produce some sort of random approximation of your scene.
Again, 3D hardware is very deterministic. You plug in something, and the same thing comes out. Every time. However, the HW is just part of the whole. The system HW, the OS, the application, the API, the drivers, etc. are all changing. Expecting exactly the same output on all systems is not realistic.
You're implying that the effect of precision loss on FP24 vs FP32 is linear, a sort of one-time penalty -- that's not the case at all in my book. In the worst case, cascading loss-of-precision errors can increase exponentially as a function of instruction count divided by mantissa bit count.
I did not say that it was linear. I was saying that it has the same properties. Yes, the error ranges are larger, but the properties are the same. Going to FP24 does not require a "change" of philosophy. It just needs the programmer to be aware of the ranges, and to code the applications taking that into account.
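As a rough, worst-case illustration of both sides of this exchange, the sketch below runs a deliberately error-amplifying iteration (the chaotic logistic map, not typical shader code) while rounding to an assumed 17 and 24 bits of significand to stand in for FP24 and FP32. The qualitative behavior is the same at both widths, but the narrower one diverges sooner.

```c
/*
 * Worst-case sketch: a chaotic iteration amplifies the per-step rounding
 * error roughly exponentially, so a shorter mantissa diverges in fewer
 * steps. round_to_bits() simulates a narrower format by rounding once per
 * step; 17 and 24 bits are assumptions standing in for FP24 and FP32
 * precision (the real formats also differ in exponent range).
 */
#include <stdio.h>
#include <math.h>

/* Round x to 'bits' significant bits (value only; range is not restricted). */
static double round_to_bits(double x, int bits)
{
    if (x == 0.0) return 0.0;
    int e;
    double m = frexp(x, &e);             /* x = m * 2^e, 0.5 <= |m| < 1 */
    return ldexp(rint(ldexp(m, bits)), e - bits);
}

int main(void)
{
    double ref = 0.3, p24 = 0.3, p32 = 0.3;   /* same start, three precisions */

    for (int i = 1; i <= 40; ++i) {
        ref = 3.9 * ref * (1.0 - ref);                     /* double reference */
        p24 = round_to_bits(3.9 * p24 * (1.0 - p24), 17);  /* "FP24-ish"       */
        p32 = round_to_bits(3.9 * p32 * (1.0 - p32), 24);  /* "FP32-ish"       */
        if (i % 8 == 0)
            printf("step %2d   err(17b) = %.2e   err(24b) = %.2e\n",
                   i, fabs(p24 - ref), fabs(p32 - ref));
    }
    return 0;
}
```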
"Assuming 1/2 lsb of error per operation" is not realistic. The number of lsb's of error in the worst case can be equal to the difference between the two exponents in the computation, so you can easily have 2, 4, 8, or 16 lsb's of error in any given computation. For example, in a shader that says something like 1/square(magnitude(LightPosition-TexelPosition)), unless your light and texel are real close together, that subtract can easily have many bits of lsb error, and squaring that quantity then doubles the number of error bits.
No, for each operation, assuming 1/2 lsb of error is correct. Your example is a composite operation. A 1/2 lsb of error in one operation can be amplified in the next operation, regardless of the precision. But you are correct that the errors do not simply add; you can certainly construct operations that magnify errors. However, when coding PS operations, one should strive for stable code. In most shader codes I've seen, things are simple and errors are stable.
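A small C sketch of the attenuation example discussed above, with arbitrary coordinates chosen so the light and texel are far from the origin but close together: the subtraction cancels most of the leading bits and the squaring then amplifies the remaining error, which is what the lsb argument is about. A shorter mantissa (FP24) would lose proportionally more.

```c
/*
 * The attenuation example: 1 / |Light - Texel|^2 when the two positions are
 * large and nearly equal. The subtraction cancels most of the mantissa and
 * squaring doubles the relative error. The coordinates are arbitrary; a
 * double computation is used as the reference.
 */
#include <stdio.h>
#include <math.h>

int main(void)
{
    /* positions about 10000 units from the origin, about 0.01 units apart */
    float  lx  = 10000.0f,  tx  = 10000.01f;
    double dlx = 10000.0,   dtx = 10000.01;

    float  d32 = lx - tx;                 /* most of the mantissa cancels    */
    double d64 = dlx - dtx;

    float  atten32 = 1.0f / (d32 * d32);  /* squaring doubles the rel. error */
    double atten64 = 1.0  / (d64 * d64);

    printf("float  attenuation = %.9g\n",  atten32);
    printf("double attenuation = %.17g\n", atten64);
    printf("relative error     = %.2g\n",
           fabs((double)atten32 - atten64) / atten64);
    return 0;
}
```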
If Intel applied this kind of "gee, there's no real use for this combination of instructions" when designing the x86, it would be impossible to write the kind of programs people wrote.
Sure. VPUs really aren't replacements for CPUs. If ATI gets in that business, we will lose. Intel and AMD are much better at it. Given that VPU outputs eventually get truncated down to 10b for color and 11~15b for texture addresses, things are good (for now).
Sure, it's easy to come up with a 1000-instruction shader that looks perfect with FP24, and easy to come up with a 3-instruction shader that looks like crap with FP24 then looks great with FP32, and then a 3-instruction shader that looks like crap with FP32 but is great with FP64. Floating point is like that.
Sure, FP is inexact. However, I was using empirical evidence to show that our assumptions appear justified.
FP24 was a reasonable decision for the R3x0, which was available basically a year before NV30. It's a lot better than 8-bit integer and gave everyone a sneak peek at DX9's capabilities. But it should be considered a stepping stone, to be phased out as soon as FP32 is commercially viable, rather than being considered a long-term solution. And FP32 may be becoming viable now with NV35 (haven't got one, can't really say for sure). It's just like 3dfx's situation with 16-bit: it was the right solution in 1997, but when 1999 came and they were arguing that it was good enough and nobody needed 32-bit, well, that was not a realistic view.
Never said that FP24 is the end. Neither is FP32, for that matter. At SGI, on GE11 (IR, Impact), we had double precision ALUs, just to compute higher order geometries (circles, spheres). But my point is that FP24 is still brand new and there are no applications yet showing up that push it at all. I explained that FP32 (at full speed) is significantly more expensive than FP24. I also noted that other items (larger textures, FP displays, etc.) need to kick in as well to justify FP32. One has to weigh the cost and the benefits. I stand by our decision to use FP24. It's fast and it's high precision; nobody else can claim those things.
Like I said, the R3x0's is a fine part, a good start wrt FP. I just hope it doesn't become set in stone.
You're being silly. It's obvious that FP32 will come, when it's needed and cost effective. I don't really understand what you meant to do by all this; FP24 is justified and makes sense, for now; FP32 is not. Why not enjoy the benefits of what is available now? Anyway, I'm glad you think R300 is a "fine" part.