I wonder how much die area a really low performance FP64 implementation takes. Is it necessarily significant, especially for a device that can otherwise do a few dozen SP operations per cycle?
Once the implementation is "really low performance", why have it at all, other than to satisfy a feature checkbox?
Put another way: would you rather the mobile GPU vendors produce a "really low performance FP64 implementation", or spend that time further improving the FP32 path?