Still a bad idea but doable.
Which shows there's still some distance to go before any functionality can be implemented, and getting there requires a less restrictive architecture.
But I'd like to refer to Andrew Lauritzen's answer to question 12 in the article. Just like differences in performance for branching have pretty much leveled out now, GPUs are destined to become even more tolerant of wider ranges of workloads: branchy code, code that uses lots of variables, code with irregular memory accesses, code with small data types, code with task dependencies, sequential code, etc.
Again, we already know where this will lead us: a cross between today's GPU and CPU architectures. In other words, a homogeneous architecture capable of exploiting ILP, DLP, and TLP. Programming such a thing will not be free of APIs, but looking at the thriving software ecosystem of the CPU, some of these APIs will be written by independent software developers and not the hardware vendors.
Dynamic recompilation works just as well on GPUs as on CPUs; why do you think it's not already in use? All GPU code is JITed before submission; how aggressively is simply down to the driver team.
It's still fundamentally different. JIT compilation on the GPU is essentially used to provide backward compatibility. The hardware is designed to support the latest API and all the previous ones, but nothing more. JIT compilation on the CPU, on the other hand, can provide a great level of forward compatibility as well.
CPUs are awesome at lots of code, GPUs aren't, but claiming that CPUs are somehow special is wrong.
Why would that be wrong? The GPU is actually relying on the CPU for compilation and such. It would be worthless without it. So clearly the CPU is special.
So for all I know (not being a HW guy), in 10 years' time we will be arguing about things that look nothing like what we have now.
I'd be very disappointed if 10 years from now GPUs weren't as versatile as today's CPUs.
Of course we might no longer call it a Graphics Processing Unit then, but just a Processing Unit. And if it sits in the middle of the motherboard, we'll just call it a Central Processing Unit. ;-)