Of course, GPUs all have this already.
Well, you have half of it. The Copper could wait on external events to initiate something or to progress in its program. That's directly equivalent to the earlier mwait/monitor.
The problem is that GPUs (apparently) cannot dispatch code to themselves. That's why we have the strange construct of compute-shader chains that pass the results of earlier stages to later stages via "passthrough" compute shaders. It's not only because a shader can't be started automatically on an event, but also because the GPU can't change the communication channel itself (the GPU can't, by itself, rewrite a shader to pass a variable in a buffer instead of a constant).
I say "apparently" because I've not yet seen or read anything that indicates a GPU can control itself, (re)write programs for itself, etc. Even though I could write a shader that writes GPU ISA out into a texture, I can't feed that to the GPU as a program from within that same shader.
A "Copper" could rewrite camera matrices (manipulating a shader's constant buffer) based on listening to a USB port, without the CPU. I suspect even a simple "Copper" would be capable enough to compile HLSL assembly to GPU ISA, or to re-optimize GPU ISA when an external variable becomes a constant. It's all relative: today 100k transistors isn't very much, and it's cheap.
It doesn't really need caches or a really complex memory controller, since we are talking about a working set of possibly 500 kB. It only needs appropriate connectivity to I/O, to the GPU and to the event producers, and maybe to the CPU's L2/L3 cache.
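To make the idea a bit more concrete, here is a toy sketch in Python of such an event-driven coprocessor. It is only an illustration under my own assumptions: the class name `TinyCopper`, the two opcodes `WAIT` and `MOVE`, the event name `"usb"` and the constant-buffer slot `"camera_matrix"` are all made up for this example and don't correspond to any real hardware or API. The point is just that the "program" parks on an event and then patches a shader constant, with no CPU code in the loop.

```python
# Hypothetical two-instruction set, invented for this sketch.
WAIT, MOVE = "WAIT", "MOVE"


class TinyCopper:
    """Toy coprocessor: WAIT parks on a named event, MOVE stores the
    payload of the event that woke it into a constant-buffer slot."""

    def __init__(self, program, constants):
        self.program = program      # list of (opcode, argument) pairs
        self.constants = constants  # stands in for a shader constant buffer
        self.pc = 0                 # program counter, wraps around at the end

    def signal(self, event, payload):
        """Deliver an event. Returns True if the program was parked on it."""
        op, arg = self.program[self.pc]
        if op != WAIT or arg != event:
            return False  # not the event we are waiting for; ignore it
        self.pc = (self.pc + 1) % len(self.program)
        # Run MOVE instructions until we park on the next WAIT.
        while self.program[self.pc][0] == MOVE:
            self.constants[self.program[self.pc][1]] = payload
            self.pc = (self.pc + 1) % len(self.program)
        return True


# Usage: wait for a (hypothetical) USB event, then patch the camera matrix.
constants = {"camera_matrix": None}
copper = TinyCopper([(WAIT, "usb"), (MOVE, "camera_matrix")], constants)
copper.signal("usb", [[1, 0], [0, 1]])
```

The program wraps around, so every later `"usb"` event patches the matrix again, while events it isn't parked on are simply ignored.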
Heh. A straight clear is something a Blitter could do.
Maybe they just mean they have a better ability to transfer buffers around asynchronously (better efficiency for clears, resolves, etc.).
Well, no. The solution to this (it was solved in hardware long ago) is not to clear at all. The z-buffer is often hierarchical, or at least has a minimum tile resolution, and each tile is represented by a bit in the GPU-internal z-buffer map (that map also holds the compressed z-buffer information). That bit indicates whether a memory region is cleared or not. The memory isn't even touched.
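As a rough software model of that "cleared" bit, here is a minimal Python sketch. It is an assumption-laden illustration, not how any particular GPU does it: the class `LazyClearDepthBuffer`, the 8x8 tile size and the `memory_writes` counter are invented for the example, and filling a tile on first write is just one simple way to model the behaviour (real hardware keeps compressed tile data instead).

```python
class LazyClearDepthBuffer:
    """Toy depth buffer with one 'cleared' flag per tile (hypothetical model)."""

    def __init__(self, width, height, tile=8, clear_value=1.0):
        self.width, self.height, self.tile = width, height, tile
        self.clear_value = clear_value
        self.tiles_x = (width + tile - 1) // tile
        self.tiles_y = (height + tile - 1) // tile
        self.depth = [0.0] * (width * height)  # backing memory, may be stale
        self.cleared = [True] * (self.tiles_x * self.tiles_y)
        self.memory_writes = 0  # counts writes to the backing memory

    def clear(self):
        # Only the per-tile flags flip; the depth memory is not touched.
        self.cleared = [True] * len(self.cleared)

    def _tile_index(self, x, y):
        return (y // self.tile) * self.tiles_x + (x // self.tile)

    def read(self, x, y):
        # A flagged tile reads as the clear value, whatever memory contains.
        if self.cleared[self._tile_index(x, y)]:
            return self.clear_value
        return self.depth[y * self.width + x]

    def write(self, x, y, z):
        t = self._tile_index(x, y)
        if self.cleared[t]:
            # First touch: fill just this tile, then drop its flag.
            x0 = (x // self.tile) * self.tile
            y0 = (y // self.tile) * self.tile
            for yy in range(y0, min(y0 + self.tile, self.height)):
                for xx in range(x0, min(x0 + self.tile, self.width)):
                    self.depth[yy * self.width + xx] = self.clear_value
                    self.memory_writes += 1
            self.cleared[t] = False
        self.depth[y * self.width + x] = z
        self.memory_writes += 1


zb = LazyClearDepthBuffer(16, 16)
zb.write(3, 3, 0.5)  # touches one 8x8 tile only
zb.clear()           # flips 4 flags, writes no depth memory
```

In this model a clear costs one flag per tile regardless of resolution, and tiles that are never drawn to are never written at all.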
The framebuffer's a 2D bitmap, not a list of pointers to objects. The only way to get a bitmap in there is to write the image data, and if that image data is based on a preloaded graphic, that constitutes a copy.
You agree that the fastest copy is not to copy, right? We also agree that in composing the Windows display surface we never actually copy (as in duplicate) anything, no fonts, no rectangles, no fills, etc., but that a source pixel (as a pointer) or an abstract description of a display element (most elements are procedural now) enters a transforming function and is then written, slightly changed, to the display surface. Correct?
That's what I wanted to point out. A data-duplicating blitter is IMHO really useless; we no longer have UIs that consist only of identical repeated elements. A non-programmable blitter is also useless, because display composition is now so complex that you can't gain anything by accelerating just a tiny fraction of the composition methods in use.
A special data-transforming programmable "blitter" is unnecessary if a GPU is present.