I don't see the point in forcing them to do that.
If the circuitry were designed so that part of the GPU could handle things like interrupts, precise exceptions, software-managed permissions, and more complicated memory addressing, it could be made functional, but it wouldn't be very good.
The fact that the GPU sits behind a software driver is also something of a problem. It's not an impossible one: Transmeta ran a code-morphing layer over its chips, but that translation layer meant the best that could be hoped for was half the IPC, or worse, on most code.
Going from the CTM spec, the closest analogue to what current x86 CPUs do would be assigning a thread to the command processor (which would have to be much more robust than it currently is, since today it just reads off what the CPU feeds it). The command processor would send a bit of code to one local array scheduler, which would then use one array processor. That's one of the 16 array processors in a single array, out of several arrays on the chip.
If there are four arrays, that's 1/64 of the total capability of the chip, at a lower clock speed, and with a whole slew of problems in getting decent single-threaded performance. Because the array processor depends on the array scheduler, which in turn depends on the command processor, a large amount of hardware goes unused.
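A quick back-of-the-envelope sketch of that utilization figure, using the numbers from the description above (the constant names are just labels I've picked for those figures):

```python
# Naive mapping: one CPU-style thread occupies a single array processor,
# leaving every other array processor on the chip idle.
ARRAYS = 4                  # hypothetical array count from the discussion
PROCESSORS_PER_ARRAY = 16   # array processors per array, per the CTM description

total_processors = ARRAYS * PROCESSORS_PER_ARRAY
used_by_one_thread = 1

utilization = used_by_one_thread / total_processors
print(f"{used_by_one_thread}/{total_processors} = {utilization:.4f}")
# -> 1/64 = 0.0156
```

And that fraction only counts execution units; it ignores the clock-speed deficit and the scheduler/command-processor dependency chain mentioned above.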
Because CPU code carries so much extra context per thread, fixing this would mean making the chip closer to a Niagara-type design: something like 8-16 semi-independent cores, on a process where there could instead have been 128+ shader cores.
Obviously, this is a pretty naive implementation. With some creative design and software work, the mapping might not have to be a 1:1 correspondence to just one array processor.