Should I bother to continue using x87 registers?

K.I.L.E.R

I run a 64-bit OS and only develop 64-bit programs.
AMD recommends that I use only SSE registers; however, I do not see the problem with using x87 registers to do calculations on the side.

What's up with the XMM registers that make them so damn special, that exclusive use of them is recommended?
PS: I'm not talking about GPRs in this discussion, just x87 (MMX, 3DNow!) vs XMM.

Even GCC on a 64 bit OS uses SSE registers by default for FP stuff.
 
What's up with the XMM registers that make them so damn special, that exclusive use of them is recommended?

There are many good reasons to use SSE(x).

XMM registers are randomly accessible, instead of having an impractical stack-based register architecture and sharing its registers with MMX (AFAIK switching to and from MMX requires a 'reset', either using FNSAVE and FRSTOR or by issuing EMMS when finished with MMX to clear the state).

There are 16 XMM registers in x86-64 mode; I can't find anything to suggest that the FPU stack has been expanded beyond 8 registers.

SSE can handle both packed (vector) and scalar (single-value) operations, depending on what you need.
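
To illustrate the packed vs scalar point, here is a rough sketch using the standard SSE intrinsics from <xmmintrin.h>. The function name add_arrays is made up for this example; it adds two float arrays, using the packed instruction (ADDPS) for groups of four and the scalar one (ADDSS) for whatever is left over.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n). */
void add_arrays(const float *a, const float *b, float *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);

    size_t i = 0;

    /* Packed path: one ADDPS adds four floats at a time. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  /* unaligned loads work for any pointer */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }

    /* Scalar path: ADDSS only operates on the lowest lane. */
    for (; i < n; ++i) {
        __m128 va = _mm_load_ss(a + i);
        __m128 vb = _mm_load_ss(b + i);
        _mm_store_ss(out + i, _mm_add_ss(va, vb));
    }
}
```

In practice a compiler targeting x86-64 will usually generate the scalar instructions for plain float code on its own; the intrinsics are mainly useful when you want the packed form explicitly.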

AMD has said that they do not execute X87 code as efficiently as SSE AFAIK.

SSE has fast low-precision approximations for reciprocals and reciprocal square roots (RCPPS/RCPSS and RSQRTPS/RSQRTSS), from which an approximate square root can also be derived. These can be very helpful when loss of precision is a non-issue.
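
As a sketch of how the approximation instructions can be used from C, assuming the standard _mm_rsqrt_ss intrinsic (the function names below are invented for the example): RSQRTSS gives roughly 12 bits of precision, one Newton-Raphson step gets close to full single precision, and multiplying the input by the estimate gives an approximate square root.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: low-precision 1/sqrt(x) via RSQRTSS. */
float rsqrt_fast(float x)
{
    assert(x > 0.0f);  /* this sketch only handles positive inputs */
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
}

/* One Newton-Raphson step: y' = y * (1.5 - 0.5 * x * y * y). */
float rsqrt_refined(float x)
{
    float y = rsqrt_fast(x);
    return y * (1.5f - 0.5f * x * y * y);
}

/* Approximate sqrt(x) as x * (1/sqrt(x)). */
float sqrt_fast(float x)
{
    return x * rsqrt_fast(x);
}
```

Whether the refinement step is worth it depends on how much precision the calculation really needs; for things like normalising vectors for display, the raw estimate is often enough.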

It's easier to debug and code for a register structure where loading a float into a register means that float will stay in that register until you deliberately change it, instead of moving around every time you do an operation that affects the number of floats on the stack. Underflows and overflows on the stack can't happen in SSE.

SSE supports mixed integer and float operations as well as bitwise logical ops.
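
One common use of the bitwise ops, sketched here with SSE2 intrinsics from <emmintrin.h> (the name abs4 is just for illustration): clearing the sign bit with ANDPS gives the absolute value of four floats at once, with an integer constant supplying the mask.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>  /* SSE2 intrinsics (also pulls in SSE) */

/* Hypothetical helper: out[i] = |in[i]| for four floats. */
void abs4(const float *in, float *out)
{
    assert(in != NULL && out != NULL);

    /* Integer mask with every bit set except the IEEE 754 sign bit,
       reinterpreted as a float vector (no conversion takes place). */
    const __m128 mask = _mm_castsi128_ps(_mm_set1_epi32(0x7FFFFFFF));

    /* ANDPS clears the sign bit in all four lanes. */
    _mm_storeu_ps(out, _mm_and_ps(_mm_loadu_ps(in), mask));
}
```

Doing the same on the x87 side means going through FABS one value at a time, so this is one of the places where the SSE register model is noticeably more convenient.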

Support for x87 code may disappear in future OSes running in x86-64 mode.
 
Thanks. The point "AMD has said that they do not execute X87 code as efficiently as SSE AFAIK" is very interesting. I thought this was the case but couldn't back it up and saw no proof of it in their docs.
 
Really? There is an AMD document that describes different ASM optimizations. It has tables with instruction execution times; in the worst case SSE is at least 50% faster than the FPU.
 
Do you know the name or number of that document?
I've downloaded just about every document off their site.

Sorry, I'm silly. You're talking about the table at the end of the PDF.
 