Would any of this asynchronous GPU computing work on the Wii U? I know VLIW5 was not considered very good at this type of work, but if it's simply making use of GPU downtime, then any additional work done is beneficial. For example, even if the Wii U GPU could only complete 25 dancers in the allotted time, if that didn't hurt graphics rendering performance at all, then that would be less work the CPU would have to do.
If the Wii U's GPU exposes only the compute capability of an unmodified VLIW5 part, then attempting this kind of asynchronous compute is quite possibly a serious performance regression.
Cayman's VLIW4 architecture was the first announced to have asynchronous dispatch, which gave the GPU the option of running more than one kernel at a time and let more than one CPU thread send commands to its own compute kernel.
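To make that concrete, here's a minimal host-side sketch of the usage pattern asynchronous dispatch enables: two CPU threads, each feeding its own command queue. Everything specific here (the scale kernel, buffer sizes, a POSIX host, no error checking) is my own assumption, not anything from AMD's docs; the point is just that on Cayman and later the two queues' kernels can genuinely overlap on the GPU, while earlier parts end up serializing them.

```c
/* Minimal sketch, assuming an OpenCL 1.1-era stack and a POSIX host.
   Build with something like: gcc sketch.c -lOpenCL -lpthread */
#include <CL/cl.h>
#include <pthread.h>
#include <stddef.h>

static const char *src =
    "__kernel void scale(__global float *d) {"
    "    size_t i = get_global_id(0);"
    "    d[i] *= 2.0f;"
    "}";

typedef struct { cl_context ctx; cl_device_id dev;
                 cl_kernel k; cl_mem buf; size_t n; } job;

static void *run(void *p)
{
    job *j = p;
    /* One queue per host thread: with asynchronous dispatch the GPU can
       service both queues at once; without it, the kernels serialize. */
    cl_command_queue q = clCreateCommandQueue(j->ctx, j->dev, 0, NULL);
    clSetKernelArg(j->k, 0, sizeof(cl_mem), &j->buf);
    clEnqueueNDRangeKernel(q, j->k, 1, NULL, &j->n, NULL, 0, NULL, NULL);
    clFinish(q);
    clReleaseCommandQueue(q);
    return NULL;
}

int main(void)
{
    cl_platform_id plat; cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);

    /* Two independent jobs, each with its own kernel object and buffer,
       so the host threads never share mutable CL state. */
    size_t n = 1 << 20;
    job a = { ctx, dev, clCreateKernel(prog, "scale", NULL),
              clCreateBuffer(ctx, CL_MEM_READ_WRITE, n * sizeof(float), NULL, NULL), n };
    job b = { ctx, dev, clCreateKernel(prog, "scale", NULL),
              clCreateBuffer(ctx, CL_MEM_READ_WRITE, n * sizeof(float), NULL, NULL), n };

    pthread_t t1, t2;
    pthread_create(&t1, NULL, run, &a);
    pthread_create(&t2, NULL, run, &b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```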
This seems like a fledgling or partially obscured implementation of what became the explicitly exposed ACEs in GCN.
Aside from the launch hype around the feature, I'm not sure how well it was actually exposed.
Running compute on an IP level older than Cayman would require a context switch of the GPU: basically wiping or writing back a good portion of the chip's context and reinitializing it in a compute mode for a while, then doing another flush and reinitialization to get back to graphics.
The latencies for that operation are brutal.
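For a sense of where those flushes sit in a frame, here's a hedged sketch of the graphics-to-compute-to-graphics round trip using OpenCL/OpenGL buffer sharing (the portable pre-cl_khr_gl_event pattern). The function and variable names are made up, and queue/kernel/VBO setup is assumed to have happened elsewhere; what's real is that each Finish call drains the whole pipeline, and on pre-Cayman hardware the driver additionally has to re-target the chip between graphics and compute state at exactly these boundaries.

```c
#include <CL/cl.h>
#include <CL/cl_gl.h>
#include <GL/gl.h>

/* Hypothetical per-frame function. vbo_mem is a cl_mem created earlier
   from a GL buffer via clCreateFromGLBuffer(); the GL context is
   assumed current on this thread. */
void simulate_one_frame(cl_command_queue q, cl_kernel k,
                        cl_mem vbo_mem, size_t n)
{
    glFinish();                       /* drain graphics work: flush #1  */
    clEnqueueAcquireGLObjects(q, 1, &vbo_mem, 0, NULL, NULL);

    /* Run the compute kernel against the shared vertex buffer. On a
       pre-Cayman part this is where the chip has been reinitialized
       into its compute mode. */
    clSetKernelArg(k, 0, sizeof(cl_mem), &vbo_mem);
    clEnqueueNDRangeKernel(q, k, 1, NULL, &n, NULL, 0, NULL, NULL);

    clEnqueueReleaseGLObjects(q, 1, &vbo_mem, 0, NULL, NULL);
    clFinish(q);                      /* drain compute work: flush #2   */
    /* ...only now can the GL side safely draw from the updated VBO,
       after the chip has been switched back to graphics. */
}
```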
It's for similar reasons that the early recommendations for Nvidia's GPU PhysX product were to have two separate boards, one for graphics and the other for physics. The hardware of the time, like VLIW5, was not able to run multiple kernels concurrently, and a single-board system would spend so much time bringing up and tearing down device contexts that it was a giant performance negative.
This is on top of other documented problems with AMD's older hardware, like the very poor cache subsystem, very bad VLIW compute code generation, and the very rigid clause-based execution model.
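As a toy illustration of the code-generation problem (a hypothetical kernel I made up, not anything from AMD's documentation): VLIW5 only pays off when the compiler can pack up to five independent operations into each instruction word, and a lot of compute code is one long serial dependency chain, which defeats that completely.

```c
/* Hypothetical OpenCL kernel. Every operation depends on the previous
   result, so the compiler can fill at most one of the five VLIW5 ALU
   slots per instruction; the other four effectively go to waste.
   Graphics shaders naturally offer independent work to pack (e.g. four
   color channels), but much compute code looks like this instead. */
__kernel void dependent_chain(__global float *data)
{
    size_t i = get_global_id(0);
    float x = data[i];      /* the load also opens a separate fetch
                               clause under the rigid clause-based
                               execution model */

    /* Serial chain: no instruction-level parallelism to pack. */
    x = x * 1.0001f + 0.5f;
    x = x * 1.0002f + 0.25f;
    x = x * 1.0003f + 0.125f;
    x = x * 1.0004f + 0.0625f;

    data[i] = x;            /* and the store opens yet another clause */
}
```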