I am wondering: suppose you have all the basics in place (distributed geometry processing, a unified memory space, high-performance gather/scatter and so on). Suppose you are also fixing some of your architecture's shortcomings, such as the narrow path between the shader core and the ROPs. I'm still inclined to view that as an artificial limitation meant to keep power in check, and you'd need double the data width for single-cycle FP64 anyway. And suppose you go single-cycle FP64 all the way, taking for granted that this is the cheapest way to do it. Wouldn't that also be one of the best possible inflection points to get rid of some of the fixed-function hardware?
My reasoning is as follows:
- Games are held back by ports from this generation of consoles, so they won't require vast amounts of gfx horsepower for at least one generation of gfx hardware.
- You have a new process technology and a fundamentally unchanged architecture (presuming the FP64 units were already present as one half of the SMs in Fermi, as a kind of trailblazer; or has there been definitive proof that two units are coupled together in FP64 mode?), so you can concentrate on porting it in the most power- and area-efficient way. That is much like what AMD seems to have done in the past, by the way.
It would seem like a smart move, unless I've forgotten something important.