IEEE precision and DP
A few notes here:
1. There are several versions of the IEEE floating point standard: 754-1985, 754-2008, etc. Different chips comply with different versions, e.g. something designed in 1999 is probably not IEEE 754-2008 compliant : )
NV claimed that they had IEEE compliant SP in GT200; now they say they had to make improvements to be IEEE compliant for SP in Fermi. So perhaps the spec they were following changed, they are supporting more features, or they were just being deceptive about GT200.
2. There are many facets of floating point arithmetic, more than just the data types. There's rounding, denorms, under/overflow, exceptions, etc.
3. Some of the above can be supported with software or microcode traps, which provides compliance, but impacts performance.
4. Fermi can issue 256 DP FMAs/cycle across the chip, but this blocks issuing any other instructions. GT200 can issue 30 DP FMAs/cycle across the whole chip, and it also blocks issuing other instructions. That's actually an 8.5x increase per cycle (256/30), although frequency obviously matters. I appreciate that NV marketing took the high road and didn't round up.
5. RV770 can do one DP FMA per VLIW unit per cycle; there are 16 VLIW units/SIMD and 10 SIMDs/chip. For those who can multiply...that's 160 DP FMAs/cycle across the chip. Cypress doubles that to 320 DP FMAs/cycle.
6. Fermi's latency on integer instructions varies by the instruction type and operand width.
7. I will write about RV770 and Cypress later, once I have more information and a complete grasp on what they do. The manuals will probably tell you exactly which features are supported, but I expect that they are IEEE 754-2008 compliant.
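
For anyone who wants to check the multiplication in notes 4 and 5, here is a rough sketch in Python. It uses only the unit counts quoted above; the variable and function names are made up for illustration, and clock frequency is deliberately left out, so these are per-cycle figures and not FLOP/s.

```python
# Peak DP FMA issue per cycle, using only the counts quoted in notes 4 and 5.
# Clock frequency is ignored on purpose, so nothing here is a FLOP/s number.

FERMI_DP_FMA_PER_CYCLE = 256   # note 4, whole chip
GT200_DP_FMA_PER_CYCLE = 30    # note 4, whole chip


def dp_fma_per_cycle(fma_per_vliw, vliw_per_simd, simds_per_chip):
    """Chip-wide DP FMAs per cycle for a VLIW/SIMD layout (hypothetical helper)."""
    return fma_per_vliw * vliw_per_simd * simds_per_chip


# Note 5: one DP FMA per VLIW unit, 16 VLIW units per SIMD, 10 SIMDs per chip.
rv770 = dp_fma_per_cycle(fma_per_vliw=1, vliw_per_simd=16, simds_per_chip=10)

# Note 5 only says Cypress doubles RV770; it does not say how, so just double it.
cypress = 2 * rv770

# Note 4: per-cycle ratio between Fermi and GT200.
fermi_vs_gt200 = FERMI_DP_FMA_PER_CYCLE / GT200_DP_FMA_PER_CYCLE

print(f"RV770:   {rv770} DP FMAs/cycle")
print(f"Cypress: {cypress} DP FMAs/cycle")
print(f"Fermi vs GT200: {fermi_vs_gt200:.2f}x per cycle")
```

To turn any of these into a throughput number you would still have to multiply by the relevant clock, which is exactly why the per-cycle ratio alone doesn't settle the comparison.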