And why exactly is a more sophisticated memory model needed in a game console?
The first answer is that it's what we've already got.
There is no tractable development-target CPU architecture with an incoherent memory hierarchy that barely maintains the ordering of reads relative to writes within the same thread.
Building on the development, validation, and use of established operating systems and system architectures means getting a sophisticated memory system, whether the game knows it or not.
That's the hardware and software architecture the GPU gets to hook into.
Low-overhead communication and command submission means being able to interact with memory locations in that sophisticated, protected portion of the system, rather than having communications go through an intermediary system process that performs copies, patches addresses, and notifies the consumer. If you want the GPU to be able to handle that memory properly, it has to play by those rules.
Getting things wrong is a fast way to have the system shut everything down.
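To make "low-overhead submission" a bit more concrete, here's a minimal C sketch under assumed names (cmd_ring_t, doorbell, and submit are all hypothetical, not any console's real API): the producer writes commands straight into a ring buffer mapped into both address spaces and rings a doorbell, instead of handing the data to an intermediary process that copies and patches it.

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical shared command ring, mapped into both the CPU's and the
       GPU's address space, so no copy and no address patching is needed. */
    typedef struct {
        uint8_t  *buf;                 /* base of the shared command buffer   */
        uint32_t  size;                /* power-of-two size in bytes          */
        volatile uint32_t wr;          /* write offset, advanced by the CPU   */
        volatile uint32_t rd;          /* read offset, advanced by the GPU CP */
    } cmd_ring_t;

    /* Stand-in for the MMIO doorbell register the command processor watches. */
    static volatile uint32_t doorbell;

    static int submit(cmd_ring_t *r, const void *cmds, uint32_t len)
    {
        uint32_t space = r->size - (r->wr - r->rd);
        if (len > space)
            return -1;                       /* ring full, caller retries later */

        uint32_t off   = r->wr & (r->size - 1);
        uint32_t first = r->size - off;
        if (first > len)
            first = len;
        memcpy(r->buf + off, cmds, first);   /* write straight into shared memory */
        memcpy(r->buf, (const uint8_t *)cmds + first, len - first);  /* wrap-around */

        __sync_synchronize();                /* commands visible before the pointer */
        r->wr += len;
        doorbell = r->wr;                    /* "ring the doorbell": kick the CP */
        return 0;
    }

No system process in the path, no second copy of the commands, and the protection still comes from the fact that the ring lives in memory both sides are allowed to touch.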
To switch into the menu faster?
It keeps bad code in one of those contexts from stomping on the data of the other, and it secures the OS(es) and hypervisor from erroneous or malicious code.
It maintains protections, simplifies linking and communication between modules and libraries, and keeps code from having to be rewritten because some instruction sequence in one place bumped things around a little bit in the stream, unlike what a certain local-store architecture had a habit of doing.
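A toy illustration of why a shared, protected virtual address space simplifies things (node_t, marshal_for_local_store, and the unified-addressing behaviour are all assumed here, purely for illustration): the same pointer means the same thing to every participant, so a pointer-rich structure can be handed over as-is, while a local-store model forces a copy plus address fix-up.

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    typedef struct node {
        struct node *next;       /* an embedded pointer */
        float        value;
    } node_t;

    /* Unified virtual memory: the GPU sees the same addresses the CPU does,
       so the list can be consumed in place (access rights permitting). */
    static const node_t *share_unified(const node_t *list)
    {
        return list;             /* nothing to translate, nothing to copy */
    }

    /* Local-store style: the data must be copied into a separate address
       space and every embedded pointer rewritten to a local offset. */
    static size_t marshal_for_local_store(const node_t *list,
                                          uint8_t *ls, size_t ls_size)
    {
        size_t off = 0;
        for (const node_t *n = list; n; n = n->next) {
            if (off + sizeof(node_t) > ls_size)
                return 0;                          /* doesn't fit */
            node_t copy = *n;
            /* patch the pointer into a local-store offset (or NULL at the end) */
            copy.next = n->next
                      ? (node_t *)(uintptr_t)(off + sizeof(node_t))
                      : NULL;
            memcpy(ls + off, &copy, sizeof copy);
            off += sizeof(node_t);
        }
        return off;                                /* bytes written */
    }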
All of these are one-time setup tasks, which can be considered baggage as well.
Right now you need to feed the CP the same commands each frame, which is a bigger problem. But that just means the drivers/APIs are bad, and we know that already.
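For what that repetition looks like, a hedged sketch in C (the packet encodings and cp_submit are invented for illustration, not a real packet format): today the identical dwords get regenerated and resubmitted every frame, when a reusable command buffer could simply be replayed.

    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical packet stream: just opaque dwords the CP consumes. */
    static const uint32_t scene_cmds[] = {
        0xC0012800, 42,          /* "set pipeline 42" (made-up encoding) */
        0xC0022F00, 0, 300000,   /* "draw 300000 vertices"               */
    };

    /* Stand-in for handing a buffer to the command processor. */
    static void cp_submit(const uint32_t *cmds, size_t count)
    {
        printf("submitting %zu dwords at %p\n", count, (const void *)cmds);
    }

    /* Today: the same dwords are regenerated and resubmitted each frame. */
    static void frame_rebuild(void)
    {
        uint32_t cmds[sizeof scene_cmds / sizeof scene_cmds[0]];
        for (size_t i = 0; i < sizeof cmds / sizeof cmds[0]; i++)
            cmds[i] = scene_cmds[i];       /* identical content, rebuilt anyway */
        cp_submit(cmds, sizeof cmds / sizeof cmds[0]);
    }

    /* Preferable: record once, point the CP at the unchanged buffer. */
    static void frame_replay(void)
    {
        cp_submit(scene_cmds, sizeof scene_cmds / sizeof scene_cmds[0]);
    }

    int main(void)
    {
        frame_rebuild();   /* what the drivers force on us today              */
        frame_replay();    /* what a reusable command buffer would allow      */
        return 0;
    }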
We've just gone to the effort of making sure that the GPU only has enough to process one triangle at a time, and nothing else.
And a digest of the system's carry-over state between frames.
Right now, or in future hardware? In the future it can DMA from a specific controller/CPU.
The future hardware is the fault mask Carrizo uses to detect when there's a fault, so the GPU can ask the grownup hardware to fix the booboo. Beyond that, it's unclear.
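A conceptual sketch of that fault-and-retry flow (toy model only; page_present, cpu_service_fault, and the rest are stand-ins, not Carrizo's actual registers or handlers): the GPU access misses in its page tables, the fault is flagged, the CPU/OS maps the page, and the stalled access is retried.

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define NUM_PAGES 16
    static bool     page_present[NUM_PAGES];   /* toy GPU-visible page table */
    static uint32_t backing[NUM_PAGES];        /* the data once it is mapped */

    /* The "grownup hardware"/OS path: service the fault by mapping the page. */
    static void cpu_service_fault(unsigned page)
    {
        printf("CPU: mapping page %u for the GPU\n", page);
        backing[page]      = page * 100;       /* pretend we paged it in */
        page_present[page] = true;
    }

    /* A GPU memory access: on a miss, raise the fault, wait, then retry. */
    static uint32_t gpu_read(unsigned page)
    {
        if (!page_present[page]) {
            printf("GPU: fault on page %u, stalling the wavefront\n", page);
            cpu_service_fault(page);           /* interrupt + handler in reality */
        }
        return backing[page];                  /* retried access now succeeds */
    }

    int main(void)
    {
        printf("read -> %u\n", gpu_read(3));   /* faults, then completes */
        printf("read -> %u\n", gpu_read(3));   /* already mapped         */
        return 0;
    }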
P.S. And to continue the car analogy: the GPU is the engine and the CPU is the starter motor, something like that.
If we want to torture the metaphor: In the cars I know of, the starter doesn't kick in whenever the driver taps the clutch, turns the wheel, or it starts raining.
Why not call the CPU the water pump of the GPU gasoline engine? In both cases, the system can operate for about as long without it.
CPUs are equally not suited. But modern CPUs have a lot of tricky hardware around them to make running "branchy code" less painful.
So they're equally not suited, it's just that one does better when it encounters a branch in real life. I'm reading this right, right?
And the solution for both GPU and CPU is to stop writing branchy code.
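For what "branchy" actually means on this kind of hardware, a hedged sketch in plain C standing in for a GPU kernel (shade_branchy/shade_branchless are invented names): when neighbouring lanes take different sides of a branch, a SIMD machine runs both sides under a mask, while the select form keeps every lane doing the same work.

    #include <stddef.h>

    /* Divergent form: adjacent elements may take different paths, which on
       a SIMD/SIMT machine means both paths get executed under a mask. */
    static void shade_branchy(float *out, const float *in, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            if (in[i] > 0.5f)
                out[i] = in[i] * 2.0f;     /* "expensive" path */
            else
                out[i] = 0.0f;             /* "cheap" path     */
        }
    }

    /* Branchless form: every lane computes the same expression; the
       condition typically compiles to a compare-and-select, not a jump. */
    static void shade_branchless(float *out, const float *in, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            float hot  = in[i] * 2.0f;
            float mask = (in[i] > 0.5f) ? 1.0f : 0.0f;
            out[i] = hot * mask;
        }
    }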
I tried adding that to my process.
IF code.is.branchy then...
...then...
..
. .
.