There are certain elements of this change that are pointing to a different track than Fermi.
Absolutely not. GCN merely catches up with Fermi.
The CU's scheduling changes make it look like a possible basis for a shared coprocessor.
Ironically, if it were to become something like a FlexFP unit, it could very well go back to being VLIW or at least LIW-ish, given what macro-ops and dispatch groups actually are.
In AMD's case, it's easier, because the coprocessor model isolates the rename and scheduling portions of the FP unit from the OoO integer pipes. If the unit were instead a CU, its in-order nature would be none of the core's business.
There may be a third way, though it isn't fleshed out as of yet. AMD has already speculated on perhaps putting instructions into the ISA that would integrate disparate devices into the same instruction stream.
And do you think PCIe based memory coherence for discrete GPUs is going to work well? Or are we expected to buy an APU plus a discrete GPU? Developers don't like programming two devices, so they'll downright hate programming three. There's really no other choice but to make the CPU fully homogeneous. And it's well within reach, so I'm sure Intel is looking into it right now.
The CU design is already built to be shared amongst various controllers, either the graphics pipes or compute pipes. Adding a third client in the form of a CPU core would not be impossible.
If the CU keeps its control flow capability, an instruction with the proper escape sequence could make a thread migrate to the CU, which would then function as an offload engine until another escape sequence sends the thread back. If not, the CU behaves like the upcoming FlexFP unit, reliant on the integer pipe for control flow.
On a chip without a CU, the escape sequence could potentially be treated as a compiler hint or a no-op.
This drops it down to 2 devices, or perhaps 2.1 devices.
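To make the escape-sequence idea a bit more concrete, here is a toy sketch of how a single instruction stream might bounce between the core and a CU. Everything in it is made up for illustration: the `ESC_TO_CU` and `ESC_TO_CPU` markers, the `run` function, and the op names are hypothetical and don't correspond to anything AMD has actually specified.

```python
# Hypothetical escape markers; no real ISA defines these.
ESC_TO_CU = "esc_to_cu"
ESC_TO_CPU = "esc_to_cpu"


def run(stream, has_cu):
    """Return (unit, op) pairs showing where each instruction would execute."""
    unit = "cpu"
    trace = []
    for op in stream:
        if op == ESC_TO_CU:
            # With a CU present the thread migrates; without one the
            # escape is simply ignored, i.e. treated as a no-op.
            if has_cu:
                unit = "cu"
            continue
        if op == ESC_TO_CPU:
            # The return escape always lands the thread back on the core.
            unit = "cpu"
            continue
        trace.append((unit, op))
    return trace


stream = ["load", ESC_TO_CU, "vmul", "vadd", ESC_TO_CPU, "store"]
print(run(stream, has_cu=True))
print(run(stream, has_cu=False))
```

The same stream runs on both chips. The only difference is whether the ops between the two escapes land on the CU or stay on the core, which is what would let the escape degrade to a hint on hardware without a CU.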