There is no easy way to stop using quads on graphics chips without suffering a major performance loss when rendering models that don't use flow control. But when we look beyond the current chips to DXNext and OGL2.0, we see something that looks a lot like a general-purpose vector processor, and not at all like the fixed-but-programmable design that made the current SIMD, shaders-and-quads approach such a hit.
The largest piece of screen real estate in current and near-future visuals is actually taken up by some kind of low-poly and/or simple-shader model, which benefits tremendously from the quad pipeline design of the chips. And a quad pipe with two full ALUs generally doubles the throughput.
To go from here to a fully programmable model requires flow control, which seems to require single-pixel pipelines. But when we see that the next generation unifies the vertex and pixel shaders into general vector units, there is another possibility: use four ALUs per quad.
That gives tremendous throughput for any shader that uses simple models on large triangles, while it enables each pixel to run its own program at reduced speed.
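To put some rough numbers on that trade-off, here is a toy cost model in Python. It is only a sketch: the function names, the two branch paths and their instruction counts are all made up for illustration, and it ignores texture latency, scheduling and whatever per-pixel control would really cost in hardware. It compares a quad that runs in lockstep (every pixel pays for every branch path that any pixel in the quad takes) with a quad where each pixel has its own ALU and follows only its own path.

```python
def lockstep_quad_cycles(paths, path_cost):
    """Lockstep quad: all four pixels step through every path any pixel takes."""
    return sum(path_cost[p] for p in set(paths))


def independent_quad_cycles(paths, path_cost):
    """One ALU per pixel: the quad is done when its slowest pixel is done."""
    return max(path_cost[p] for p in paths)


# Hypothetical instruction counts for the two sides of a branch.
path_cost = {"cheap": 4, "expensive": 20}

# Which side of the branch each of the four pixels in a quad takes.
coherent = ["cheap"] * 4
divergent = ["cheap", "cheap", "cheap", "expensive"]

for name, quad in (("coherent", coherent), ("divergent", divergent)):
    print(name,
          lockstep_quad_cycles(quad, path_cost),
          independent_quad_cycles(quad, path_cost))
```

In this model the two schemes cost the same when all four pixels agree on the branch, and they only differ when the quad diverges. That is the case where per-pixel programs would start to matter.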
Looking at it from a hardware (transistor) perspective, ALUs are expensive. But so are all the buffers needed to keep the pipes optimally filled! Add to that the observation that the best current quad design consists of two full ALUs and two mini-ALUs, and it is easy to see that we are close to the point where four full ALUs per pipe, with drastically reduced buffers, becomes quite an interesting way to go, especially when unifying the vertex and pixel shaders.
What do you think?