GPU Rasterization: Triangle Setup Rate Limitations

Acert93

The R800 thread had me thinking about the triangle setup limit we are seeing on rasterization-based GPUs. It seems GPUs have been limited to one vertex setup every clock (or every 2 clocks) for quite a while. A lot of people expected R800 to break past this limitation.

What is holding GPUs back architecturally from setting up multiple triangles per clock?

Is it a worthwhile investment? How often are GPUs setup-limited? (It sounds like a bursty operation to me.) With tessellation and displacement maps being pushed by some, it sounds like relaxing this constraint could be a boon to performance.
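To put a rough number on "how often", here is a back-of-the-envelope sketch of where setup would become the bottleneck. The figures are assumptions for illustration only (32 pixels per clock is a hypothetical pixel throughput, not a spec of any particular chip), and the model ignores culled triangles, overdraw and shader cost.

```python
def breakeven_triangle_area(pixels_per_clock, triangles_per_clock):
    """Average triangle size (in pixels) at which setup and pixel
    throughput are balanced. Smaller triangles leave setup as the limit."""
    return pixels_per_clock / triangles_per_clock


def pixel_utilization(avg_triangle_area, pixels_per_clock, triangles_per_clock):
    """Fraction of the pixel throughput that setup can keep fed,
    assuming every triangle that is set up covers avg_triangle_area pixels."""
    fed = avg_triangle_area * triangles_per_clock  # pixels produced per clock
    return min(1.0, fed / pixels_per_clock)


# Hypothetical chip: 32 pixels/clock, one triangle set up per clock.
print(breakeven_triangle_area(32, 1))   # triangles below this size are setup-bound
print(pixel_utilization(8, 32, 1))      # 8-pixel triangles at 1 tri/clock
print(pixel_utilization(8, 32, 0.5))    # same triangles at 1 tri every 2 clocks
```

Under these assumptions the limit only bites when the average triangle is smaller than the break-even area, which fits the "bursty" intuition: large triangles hide the setup rate, dense tessellated meshes expose it.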

It sounds like Larrabee won't have this sort of hard limit. Will AMD and NV follow, and if so in what ways can they resolve this issue?

Any arguments against increasing the triangle setup rate?
 
In order to increase the triangle rate, triangles would need to be processed in parallel, say for example 16 triangles at a time. Obviously this is something that cannot be cheaply realized with current architectures.
Parallel triangle setup and rasterization (determining which pixels a triangle covers) should be easy enough. Parallel early-Z rejection would be less cheap, but should be feasible with multiple read ports to the Z cache. Dispatching triangle fragments, i.e. 4x4 pixel stamps, to multiple shader processors (typically 16 to 32) should not be too hard either. Writing the fragments correctly to the frame buffer might be more problematic, as fragments from different triangles could overlap, which makes blending and Z buffering nontrivial.
With really small triangles, pixel-sized or sub-pixel, this whole process would still fall apart, and a radically different architecture seems in order...
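As a rough illustration of the setup and stamp steps described above, here is a minimal software sketch, not how any actual GPU implements it. It assumes edge-function rasterization, pixel-center sampling, and no fill rule for shared edges; the function names are made up for this example.

```python
from math import floor, ceil

STAMP = 4  # 4x4 pixel stamps, as in the post; real hardware may differ


def setup_edges(v0, v1, v2):
    """Triangle setup: return three (A, B, C) edge equations with
    E(x, y) = A*x + B*y + C >= 0 for points inside the triangle."""
    area2 = (v1[0] - v0[0]) * (v2[1] - v0[1]) - (v1[1] - v0[1]) * (v2[0] - v0[0])
    if area2 == 0:
        return None               # degenerate triangle, nothing to draw
    if area2 < 0:
        v1, v2 = v2, v1           # normalize winding so the inside is positive
    edges = []
    for a, b in ((v0, v1), (v1, v2), (v2, v0)):
        A = a[1] - b[1]
        B = b[0] - a[0]
        C = -(A * a[0] + B * a[1])
        edges.append((A, B, C))
    return edges


def rasterize_stamps(v0, v1, v2):
    """Walk the bounding box in STAMP-aligned steps and return a list of
    ((stamp_x, stamp_y), covered_pixels) for stamps with any coverage."""
    edges = setup_edges(v0, v1, v2)
    if edges is None:
        return []
    xs = (v0[0], v1[0], v2[0])
    ys = (v0[1], v1[1], v2[1])
    x0 = floor(min(xs)) // STAMP * STAMP
    y0 = floor(min(ys)) // STAMP * STAMP
    stamps = []
    for sy in range(y0, ceil(max(ys)), STAMP):
        for sx in range(x0, ceil(max(xs)), STAMP):
            covered = [(x, y)
                       for y in range(sy, sy + STAMP)
                       for x in range(sx, sx + STAMP)
                       if all(A * (x + 0.5) + B * (y + 0.5) + C >= 0
                              for A, B, C in edges)]
            if covered:
                stamps.append(((sx, sy), covered))
    return stamps


def stamp_utilization(stamps):
    """Covered pixels divided by pixel slots in the dispatched stamps."""
    if not stamps:
        return 0.0
    covered = sum(len(pixels) for _, pixels in stamps)
    return covered / (len(stamps) * STAMP * STAMP)


big = rasterize_stamps((0, 0), (8, 0), (0, 8))
tiny = rasterize_stamps((1.0, 1.0), (2.2, 1.0), (1.0, 2.2))
print(len(big), stamp_utilization(big))
print(len(tiny), stamp_utilization(tiny))
```

The tiny triangle still costs a full setup and occupies a whole stamp for a single covered pixel, which is the "falls apart" case. The sketch does not touch the ordering problem for overlapping fragments at all.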
 
What about double-clocking the serial triangle setup design? They already have different clock domains all over the place.
 
Unfortunately the tool's GLSL doesn't compile on my Radeon 4870, so I can't test, and I'm too lazy to fix it myself.
 