east of eastside
Newcomer
Someone else further expanding on a similar integrated ARM GPGPU concept:
http://codingrelic.geekhold.com/2010/07/wwmd.html
I'll speculate they will tightly couple the GPU, allowing very low latency access to it as an ARM coprocessor in addition to the more straightforward memory mapped device. This is not unique: some of the on-chip XScale functional units can be accessed both as coprocessors for low latency and as memory mapped registers to get to the complete functionality of the unit. Having very low latency access to the GPU would allow efficient offloading of even small chunks of processing to GPU threads.
One possibility is to let the GPU directly access the ARM processor cache and registers. This would allow GPU offloading to work almost exactly like a function call, putting arguments into registers or onto the stack with a coprocessor instruction to dispatch the GPU. When the GPU finishes, the ARM returns from the function call. For operations where the GPU is dramatically better suited, the ARM CPU would spend less time stalled than it would take to compute the result itself.
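To make the "offload as a function call" idea concrete, here is a rough software model of it. Nothing here is real hardware or a real driver API: the worker thread standing in for the GPU, and the names `offload_call` and `gpu_kernel_dot`, are all made up for illustration. The caller hands over its arguments, stalls, and gets the result back as if it had called an ordinary function.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the GPU: a single worker thread.
_gpu = ThreadPoolExecutor(max_workers=1)

def gpu_kernel_dot(a, b):
    """Illustrative 'GPU-friendly' work: a dot product."""
    return sum(x * y for x, y in zip(a, b))

def offload_call(kernel, *args):
    """Dispatch to the simulated GPU and stall until it finishes."""
    # submit() plays the role of the coprocessor dispatch instruction.
    future = _gpu.submit(kernel, *args)
    # result() blocks, so to the caller this behaves like a plain function call.
    return future.result()

result = offload_call(gpu_kernel_dot, [1, 2, 3], [4, 5, 6])
print(result)  # 32
```

The point of the sketch is only the calling convention: the caller never sees a queue or a command buffer, just arguments in and a return value out.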
Or if it had numerous cores? Sounds very similar to the MSNerd six-core ARM concept with cores dedicated to GPU acceleration of physics and AI. If the ARM CPU supported hardware threads, it could switch to a different register file and run some other task while the GPU is crunching.
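A rough software analogy for the hardware-thread case, where the CPU runs some other task while the GPU is crunching. This is only a sketch: the worker thread is a stand-in for the GPU, and `gpu_kernel_sum_squares` and `other_cpu_task` are hypothetical names invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def gpu_kernel_sum_squares(n):
    """Illustrative work handed to the simulated GPU."""
    return sum(i * i for i in range(n))

def other_cpu_task(words):
    """Unrelated work the CPU picks up instead of stalling."""
    return [w.upper() for w in words]

with ThreadPoolExecutor(max_workers=1) as gpu:
    # Dispatch to the simulated GPU without waiting for it.
    pending = gpu.submit(gpu_kernel_sum_squares, 1000)
    # Meanwhile, the "other hardware thread" does something else.
    side_result = other_cpu_task(["physics", "ai"])
    # Collect the GPU result only when it is actually needed.
    gpu_result = pending.result()

print(side_result, gpu_result)
```

The only difference from a plain blocking call is where the wait happens: it moves from the dispatch point to the point where the result is consumed.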