That's one way to start a religion. That madman has finally revealed the motives behind his blog, and he is now suckering his followers into donating money to his PayPal account.
The GCP is responsible for running a number of contexts. When a context hits a snag and can no longer process instructions, instead of stalling it will switch over to the next context.
The command processors generally receive a queue command, attempt to arbitrate sufficient resources to launch a wavefront, and then let those wavefronts do as they will. Typical stall situations at a shader level are handled by the CUs, whose contexts run independently once initialized and launched.
Front end engines try to race ahead and process what they can from their queues, up until they cannot arbitrate for the resources necessary or are waiting on synchronization.
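To make the "race ahead until you can't arbitrate" behaviour concrete, here is a toy sketch. It is not how any real command processor is implemented: the queue format, the single pool of "wavefront slots", and strictly in-order launch are all assumptions made for illustration, and the function name `race_ahead` is made up.

```python
from collections import deque

def race_ahead(queue, free_slots):
    """Launch queued commands in order until one cannot get enough slots.

    queue: deque of (name, slots_needed) tuples (hypothetical format).
    free_slots: wavefront slots currently available (assumed single pool).
    Returns (list of launched command names, remaining free slots).
    """
    launched = []
    # Stop as soon as the command at the head of the queue cannot be
    # satisfied; later commands wait behind it (in-order assumption).
    while queue and queue[0][1] <= free_slots:
        name, needed = queue.popleft()
        free_slots -= needed
        launched.append(name)
    return launched, free_slots

q = deque([("draw0", 4), ("draw1", 3), ("draw2", 6), ("draw3", 1)])
launched, left = race_ahead(q, free_slots=10)
print(launched, left, list(q))
```

In this toy model draw2 blocks the queue even though draw3 would fit, which is the kind of front-end stall described above. Once launched, the wavefronts would run on their own without the front end tracking them.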
There are GPUs that run with DDR3 already. It doesn't bother them much. The CUs see the ESRAM as being roughly half the latency of DDR3, which is a significant relative improvement but still slow in CPU terms. The Xbox pulls memory from two pools, ESRAM and DDR3. One has amazing latency, the other is OK; one has superior bandwidth, the other doesn't.
GPU hardware is pretty coarse in the granularity of work items it can run without wasting hardware. Small batches are an issue with current hardware. Draw calls that mess with graphics state enough can make the GPU stall to handle this, which may have more to do with the hardware contexts than memory latency. I think this is pretty high-level and needs to be verified, but I think at this point this is where the rabbit hole goes deeper: the reason why you may actually prefer smaller draw calls, and more of them.
Primitives are evaluated and transformed into pixel coverage, and depending on what you're doing that takes some number of wavefronts as well. Each pixel, or rather quad of pixels, will need to go into a wavefront, which is 64 pixels in size. The number of wavefronts comes down to how many it takes to contain all the quads. The GPU culls or rejects things at multiple points, so how much gets handed off or discarded is variable. My understanding here isn't fully clear, since I'm not sure what happens at the SIMD level if I render 100K boxes with the same shader.
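As a rough sketch of the arithmetic, assuming a quad is a 2x2 block of pixels and that quads pack perfectly into 64-wide wavefronts (real hardware packing rules and culling will change the numbers), the lower bound on wavefronts looks like this. The function names are made up for illustration.

```python
WAVEFRONT_SIZE = 64  # pixels (lanes) per wavefront, as stated above
QUAD_SIZE = 4        # a quad is a 2x2 block of pixels

def wavefronts_for_quads(num_quads: int) -> int:
    """Minimum number of wavefronts needed to hold num_quads quads,
    assuming ideal packing (an assumption, not a hardware guarantee)."""
    quads_per_wavefront = WAVEFRONT_SIZE // QUAD_SIZE  # 16 quads
    # Integer ceiling division.
    return (num_quads + quads_per_wavefront - 1) // quads_per_wavefront

def idle_lanes(num_quads: int) -> int:
    """Lanes left unused in the last, partially filled wavefront."""
    return wavefronts_for_quads(num_quads) * WAVEFRONT_SIZE - num_quads * QUAD_SIZE

for quads in (1, 16, 17):
    print(quads, wavefronts_for_quads(quads), idle_lanes(quads))
```

A box that covers only a single quad still occupies a whole wavefront with 60 lanes idle under these assumptions, which is why small batches can waste hardware.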
Without knowing what is being done for each box, how much is culled, and how much they cover, it would be unknown. Does it do what is quoted above? I'm not sure how many wavefronts are submitted.
edit: spelling