There would need to be a more pervasive change to the GPU and programming model to give any single component full CPU functionality.
I noted that the GPU may not exist below a certain layer of abstraction.
What looks like a single simple command on a command queue (which lives in memory allocated for that purpose at the discretion of the host system) is really carried out by a combination of the API, machine commands, internal microcode, and a raft of simple processors.
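To make the "command queue in host-allocated memory" part concrete, here is a minimal sketch of the usual arrangement: a ring buffer of command words in ordinary memory, with the host advancing a write pointer and the GPU front end advancing a read pointer. The names, layout, and word format here are invented for illustration; the real queues use defined packet formats (PM4 on AMD hardware) and doorbell/register mechanics this sketch ignores.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of a host-allocated ring-buffer command queue.
   Names and layout are illustrative, not the real PM4 packet format. */

#define RING_WORDS 16

typedef struct {
    uint32_t words[RING_WORDS]; /* queue memory, allocated by the host */
    uint32_t wptr;              /* host advances after writing commands */
    uint32_t rptr;              /* GPU front end advances as it consumes */
} cmd_ring;

/* Host side: append one 32-bit command word. */
static void ring_write(cmd_ring *q, uint32_t w) {
    q->words[q->wptr % RING_WORDS] = w;
    q->wptr++;
}

/* "GPU" side: the front end sees pending work whenever rptr != wptr. */
static int ring_has_work(const cmd_ring *q) {
    return q->rptr != q->wptr;
}

/* "GPU" side: consume the next command word. */
static uint32_t ring_read(cmd_ring *q) {
    uint32_t w = q->words[q->rptr % RING_WORDS];
    q->rptr++;
    return w;
}
```

The point of the sketch is that nothing in it is GPU-specific: the queue is just memory, and everything interesting happens in whatever interprets the words pulled out of it.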
The GPU's cache system is not that complex, and it supports a very simplified view of memory. It can get away with this because it assumes the CPU will have handled all the complexities of a memory system (varying permissions, cacheability settings, fault handling, and interrupts) before the CU has to work with it.
Each step in the processing of a queue command hides a number of sub-steps. The command processor is not a single processor, but a custom block of two or three processors, each handling a subset of the "ISA" of queue commands it draws from the command queue. Beyond that there is dispatch hardware, at least one processor in each ACE, a processor of sorts per CU, and some amount of processing that may occur in the export process. A lot of this is tied together in non-standard ways, like the subcomponents of the command processor, and the way CUs depend on something else to initiate the contexts they will then run.
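The "custom block of processors, each handling a subset of the queue-command ISA" idea can be sketched as a simple router: the front end looks at a command's opcode and hands it to whichever subprocessor owns that slice of the command set. The opcode ranges and subprocessor names below are entirely made up for the sketch; they stand in for the real (undocumented) partitioning.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model: the command processor as a small cluster of
   subprocessors, each owning a slice of the queue-command "ISA".
   Opcode ranges and names here are invented, not real hardware values. */

typedef enum { SUB_GFX, SUB_COMPUTE, SUB_DMA, SUB_UNKNOWN } subproc;

/* Route a queue command to the subprocessor that handles its opcode range. */
static subproc route_command(uint32_t opcode) {
    if (opcode < 0x40) return SUB_GFX;      /* draw/state commands */
    if (opcode < 0x80) return SUB_COMPUTE;  /* dispatch commands (ACE-like) */
    if (opcode < 0xC0) return SUB_DMA;      /* copy/fill commands */
    return SUB_UNKNOWN;                     /* unrecognized packet */
}
```

The non-standard coupling described above is exactly what this toy model hides: in the real block the subprocessors share state and hand work to each other in ways a clean dispatch table does not capture.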
The GCN ISA document does not talk about the queue commands the front end processes, and the APIs we have do not talk about the ISA commands. GCN code cannot navigate the space of concerns involved in getting itself to run. It assumes that's someone else's problem, and a lot of that someone else is outside the GPU.