The die shots show the I/O section is on the other side of the die, and there seems to be an area of unused silicon next to the FPU that is close to, if not matching, the area "saved" by shrinking the FPU. Could it be that, in order to get the I/O complex on die and still remain within the transistor and power budget, they had to make some cuts to the FPU and other changes they thought were small?
Unused die area negates the supposed savings in transistors, as the cost of the die doesn't change if the final dimensions are the same.
If the power constraint is that significant, it runs counter to some of the implications from Mark Cerny's presentation about making 256-bit instructions more sustainable, since the core isn't able to sustain even 128-bit instructions at the expected level.
What is odd on top of that is that, since the core does support half the throughput of 256-bit instructions, that is the same vector throughput as normal 128-bit execution. So why did 128-bit throughput get cut?
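To put rough numbers on that equivalence, here is a back-of-the-envelope sketch. The issue rates are illustrative assumptions for a single pipe, not measured figures for this core, and `bits_per_cycle` is just a hypothetical helper:

```python
# Back-of-the-envelope sketch: vector bits processed per cycle on one pipe.
# The issue rates below are illustrative assumptions, not measurements.

def bits_per_cycle(vector_width_bits: int, ops_per_cycle: float) -> float:
    """Vector data processed per cycle for a given op width and issue rate."""
    return vector_width_bits * ops_per_cycle

# Assumed: one 128-bit op issued every cycle (normal full-rate SSE).
full_rate_128 = bits_per_cycle(128, 1.0)

# Assumed: one 256-bit op issued every other cycle (half-rate AVX).
half_rate_256 = bits_per_cycle(256, 0.5)

print(full_rate_128, half_rate_256)
```

Under those assumptions both cases move the same amount of vector data per cycle, so the datapath work is the same either way.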
What is potentially saved at 256-bit is some of the scheduling and data routing of 4 128-bit SSE ports, but at this point, how much of a sliver of the power budget was saved?