Jawed
Legend
"I am thinking that DP in Fermi is closely tied to INT and SP FP operations, which is why DP is unlikely to be deleted in gaming parts. They'll probably disable the exceptions, rounding modes, denormals etc. (or some of them) in mid-range parts, but retain some DP capability."

http://www.realworldtech.com/page.cfm?ArticleID=RWT093009110932&p=7
"Each core can execute a DP fused multiply-add (FMA) warp in two cycles by using all the FPUs across both pipelines."

This implies that all "32 FPU cores" work together to produce 32 DP-FMAs in two cycles. It seems to imply that INT is not used for DP, and that each pipeline produces 16 FMAs in two cycles. But the instruction is despatched to both pipelines, from a single warp.
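The throughput reading above can be checked with a few lines of arithmetic. This is only a sketch of the numbers quoted in the post (16 FPU lanes per pipeline, two pipelines, a 32-thread warp, two cycles per DP warp); the constant and function names are made up for illustration and say nothing about how the hardware is actually organised.

```python
# Figures taken from the post; all names here are illustrative only.
FPU_LANES_PER_PIPELINE = 16   # each pipeline produces 16 FMAs per DP warp
PIPELINES_PER_CORE = 2        # the DP instruction goes to both pipelines
WARP_SIZE = 32                # threads in one warp
CYCLES_PER_DP_WARP = 2        # "a DP FMA warp in two cycles"


def dp_fma_per_cycle():
    """DP FMAs retired per cycle by one core under the quoted figures."""
    return WARP_SIZE / CYCLES_PER_DP_WARP


def lane_cycles_per_dp_fma():
    """How many FPU lane-cycles each DP FMA occupies."""
    lanes = FPU_LANES_PER_PIPELINE * PIPELINES_PER_CORE
    return lanes * CYCLES_PER_DP_WARP / WARP_SIZE


print(dp_fma_per_cycle())        # 16.0
print(lane_cycles_per_dp_fma())  # 2.0
```

So under this reading every DP FMA ties up one FPU lane for two cycles, with no INT unit in the picture.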
"Why, you ask? Int mul is 32 bits, and SP mul is 23 bits. If you use both of them, you get 55 bits of multiplication. DP needs 52 bits. Just about right for doing DP FMA."

There's also the question of the need to implement subnormals, which in GT200 seemingly carries the cost of a 168-bit adder. Does GF100 have an adder like that?
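The bit-width argument quoted above (32 + 23 = 55 bits against the 52 that DP needs) can be sketched in plain integer arithmetic. The snippet below is not a claim about how GF100 does it: it just splits a 53-bit DP significand (52 stored bits plus the implicit leading one) at a hypothetical 32-bit boundary and rebuilds the full product from partial products. The names and the split point are assumptions for illustration.

```python
# Widths quoted in the post; the split point below is an assumption.
INT_MUL_BITS = 32
SP_MANTISSA_BITS = 23
DP_MANTISSA_BITS = 52


def widths_cover_dp():
    """The post's claim: the two widths add up to at least DP's mantissa."""
    return INT_MUL_BITS + SP_MANTISSA_BITS >= DP_MANTISSA_BITS


def split_multiply(a, b, lo_bits=INT_MUL_BITS):
    """Multiply two wide integers using only narrower partial products.

    Returns (product, number_of_partial_products).
    """
    mask = (1 << lo_bits) - 1
    a_hi, a_lo = a >> lo_bits, a & mask
    b_hi, b_lo = b >> lo_bits, b & mask
    # Schoolbook expansion: (a_hi*2^k + a_lo) * (b_hi*2^k + b_lo)
    partials = [a_hi * b_hi, a_hi * b_lo, a_lo * b_hi, a_lo * b_lo]
    product = ((partials[0] << (2 * lo_bits))
               + ((partials[1] + partials[2]) << lo_bits)
               + partials[3])
    return product, len(partials)


# Two example 53-bit significands with the implicit leading one set.
a = (1 << 52) | 0x000F123456789ABC
b = (1 << 52) | 0x0000FEDCBA987654

product, n_partials = split_multiply(a, b)
print(widths_cover_dp(), product == a * b, n_partials)
```

With this split the high halves are 21 bits wide, so they would fit an SP-sized multiplier, but the schoolbook expansion needs four partial products rather than one 32-bit and one 23-bit multiply. Adding the widths is therefore a necessary condition at best.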
"This could be a reason to have so much INT power, and how they are managing to put so much DP in gaming parts (albeit high end) without going bankrupt. This could be why you can dual-issue SP FP and INTs, but DP doesn't dual-issue with anything else."

Except SP and INT can't be dual-issued within a pipeline.
Jawed