psurge said:
ShootMyMonkey - AFAIK, that's incorrect. I'm pretty sure an FPU implementation that does not round to the correct precision (32-bit, 64-bit, 80-bit or 128-bit) after every single operation is not fully IEEE compliant. That is, the registers themselves must contain correctly rounded results, and rounding cannot be deferred until a value is stored to memory.
Section 5 (Operations) of the IEEE-754 standard says:
"All conforming implementations of this standard shall provide
operations to add, subtract, ..., convert between floating-point
and integer formats, ... and compare. ... Except for binary<->decimal
conversion, each of the operations shall be performed as if it
first produced an intermediate result correct to infinite precision
and with unbounded range, and then coerced this intermediate result
to fit in the destination's format. .... Normally, a result is rounded to the
precision of its destination."
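The "as if infinitely precise, then coerced" rule is easy to check from Python, whose float is an IEEE double and, on a typical x86-64 build, is computed with SSE2 rather than the x87 stack. A minimal sketch: Fraction holds the exact intermediate sum, and float() rounds it once to the nearest double, ties to even. The helper name ieee_add is made up for illustration.

```python
from fractions import Fraction
import random

def ieee_add(a: float, b: float) -> float:
    """Hypothetical helper: IEEE 754 addition done 'by the book'."""
    exact = Fraction(a) + Fraction(b)  # intermediate result, infinite precision
    return float(exact)                # a single rounding, to nearest double (ties to even)

# The machine's own a + b should agree with the model bit for bit.
random.seed(754)
for _ in range(10_000):
    a = random.uniform(-1e6, 1e6)
    b = random.uniform(-1e-6, 1e-6)
    assert a + b == ieee_add(a, b)

print(ieee_add(0.1, 0.2))         # 0.30000000000000004
print(ieee_add(1.0, 2.0 ** -53))  # 1.0 (exact tie, goes to the even neighbour)
```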
So yes, you're right: a result is rounded to the precision of its destination, and if that destination is a register wider than your declared type, it can legitimately hold a value that has not yet been rounded to that type. This brings up a whole host of issues with optimising compilers that may promote your single to a double in hardware and later round it back, in either direction, so that it appears to match your single type (it should really be round-to-nearest-even).
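The underlying trap is double rounding: two correct roundings in a row need not equal one. A minimal sketch, assuming round-to-nearest-even and the "unbounded range" of the quote (no overflow or subnormals); round_to_bits is a hypothetical helper, and 64 and 53 bits stand in for the x87 extended and double significands.

```python
from fractions import Fraction

def round_to_bits(x: Fraction, p: int) -> Fraction:
    """Hypothetical helper: round positive x to p significant bits,
    round-to-nearest, ties-to-even, unbounded exponent range."""
    if x <= 0:
        raise ValueError("this sketch only handles positive values")
    # Find e with 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    ulp = Fraction(2) ** (e - p + 1)       # spacing of p-bit numbers near x
    scaled = x / ulp                       # lies in [2**(p-1), 2**p)
    q, r = divmod(scaled.numerator, scaled.denominator)
    cmp = 2 * r - scaled.denominator       # <0 below half, 0 tie, >0 above half
    if cmp > 0 or (cmp == 0 and q % 2 == 1):
        q += 1                             # round up; a tie goes to the even q
    return q * ulp

# 1 + 2**-53 + 2**-80 sits just above the midpoint between two doubles.
x = 1 + Fraction(1, 2**53) + Fraction(1, 2**80)

once = round_to_bits(x, 53)                       # straight to double
twice = round_to_bits(round_to_bits(x, 64), 53)   # via a 64-bit significand first

print(once == 1 + Fraction(1, 2**52))  # True: rounds up
print(twice == 1)                      # True: first rounding lands on the tie, which goes to even
```

For a single float addition, subtraction, multiplication or division carried out in double, the extra rounding happens to be harmless because 53 >= 2*24 + 2; the discrepancy shows up with extended-to-double, as above, or when a chain of operations stays in wide registers.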
Typically, though, you don't have 128-bit FPUs. The 128-bit registers in SSE (and in GPUs) are SIMD: they bundle several floating-point values together, so you don't have a 128-bit FP number, rather a collection (MMX doesn't do floating point at all). Quad precision is normally emulated in software; 80-bit, on the other hand, does exist in hardware as the x87's internal format, which is exactly where the register-versus-memory rounding above comes from.
In fact, IEEE just states that a double-extended format needs at least 79 bits (sign, 15-bit exponent and a 64-bit significand with a hidden bit; the x87 format is 80 bits because it stores that leading bit explicitly). The definition of this is quite lax in comparison to the rest of the standard, which strictly defines bit-for-bit how operations should work (though it is assumed that if these algorithms work, they should also hold for any extra bits you add).
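The bit counting above can be made concrete. A small sketch using Python's struct module: the widths table is the usual sign + exponent + stored-significand split (the x87 row stores its leading bit explicitly, which takes IEEE's 79-bit minimum to 80), and addps is a hypothetical stand-in, named after SSE's ADDPS instruction, that treats a 16-byte buffer as four independent single-precision lanes.

```python
import struct

# Bit budget per format: (sign, exponent, stored significand bits)
widths = {
    "single (binary32)": (1, 8, 23),               # leading 1 is hidden
    "double (binary64)": (1, 11, 52),              # leading 1 is hidden
    "double-extended, IEEE minimum": (1, 15, 63),  # 64-bit precision, hidden bit
    "x87 extended": (1, 15, 64),                   # leading bit stored explicitly
}
for name, bits in widths.items():
    print(f"{name}: {sum(bits)} bits")

def addps(a: bytes, b: bytes) -> bytes:
    """Hypothetical stand-in for a packed single-precision add:
    four independent 32-bit lanes, no 128-bit number anywhere."""
    xs = struct.unpack("<4f", a)
    ys = struct.unpack("<4f", b)
    # Python adds in double; packing rounds each lane back to 32 bits.
    return struct.pack("<4f", *(x + y for x, y in zip(xs, ys)))

a = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)  # 16 bytes = 128 bits
b = struct.pack("<4f", 0.5, 0.5, 0.5, 0.5)
print(len(addps(a, b)) * 8)                 # 128
print(struct.unpack("<4f", addps(a, b)))    # (1.5, 2.5, 3.5, 4.5)
```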
I believe some of them go at a rate of 2 flops/clock too
Perhaps, but it is not their speed that matters; it is loading the SSE values that takes time. And why the MMX registers overwrite the x87 FPU stack entries I will never know; many are the mysteries of x86 (and why it still exists).