For one thing, unless there are many passes, it is rather pointless to store more than 64 bits of color data in a single output, and the massive instruction count of the FX should reduce the need for large numbers of passes. If that many passes do turn out to be necessary, we are probably not talking about realtime rendering anymore.
So, with the FX, you may reasonably often want to output 64-bit color into the packed 128-bit buffer, which leaves 64 bits free for other data. I could imagine using 32 bits of this for normal information for lighting (two 16-bit floats), leaving another 32 bits for...something.
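To make the bit budget concrete, here is a rough CPU-side sketch of that layout in Python. It is only an illustration, not shader code and not how the hardware actually packs anything. The names `TEXEL_FORMAT`, `pack_texel` and `unpack_texel` are made up for this example, and rebuilding the third normal component from the two stored ones (assuming a unit-length normal with non-negative z) is my own assumption about how two 16-bit floats could carry a normal.

```python
import math
import struct

# Hypothetical 128-bit layout (little-endian, no padding):
#   4 x fp16 color (64 bits) | 2 x fp16 normal x,y (32 bits) | 32 spare bits
TEXEL_FORMAT = "<4e2eI"


def pack_texel(rgba, normal_xy, spare=0):
    """Pack color, a two-component normal and 32 spare bits into 16 bytes."""
    return struct.pack(TEXEL_FORMAT, *rgba, *normal_xy, spare)


def unpack_texel(data):
    """Unpack a 16-byte texel and rebuild the normal's z component.

    Assumption: the normal is unit length and z >= 0, so z can be
    recovered as sqrt(1 - x^2 - y^2).
    """
    r, g, b, a, nx, ny, spare = struct.unpack(TEXEL_FORMAT, data)
    nz = math.sqrt(max(0.0, 1.0 - nx * nx - ny * ny))
    return (r, g, b, a), (nx, ny, nz), spare


texel = pack_texel((1.0, 0.5, 0.25, 1.0), (0.5, 0.5), spare=0xDEADBEEF)
color, normal, spare = unpack_texel(texel)
```

The packed `texel` is exactly 16 bytes, i.e. the full 128 bits, and the last 32-bit field is still open for whatever the "something" ends up being.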