Unified Pipeline Architecture

trinibwoy

Thought I would start a new thread on this since there has been some discussion elsewhere but no real focus.

Anyone with info or insight into how a unified pixel/vertex architecture is going to shape up in future GPUs, please chime in for the edification of myself and others who are interested.

Some of my own questions:

0. What are the major limitations / bottlenecks of the current pixel/vertex shader implementation?
1. How are they going to balance the pixel / vertex workloads?
2. What are the implications for register usage?
3. What is the impact on transistor count for each pipeline?
4. When can we expect the first unified architecture part from the major IHVs?
 
I think DX Next is what's pushing unified shaders. It should allow developers to sacrifice performance in one shader type to gain performance in the other. Hardware-wise, it will definitely cost more transistors than current solutions. The biggest benefit should be in the way shaders are written and interpreted by hardware. A new shading language could possibly be released along with this feature.
 
flick556 said:
I think DX Next is what's pushing unified shaders. It should allow developers to sacrifice performance in one shader type to gain performance in the other. Hardware-wise, it will definitely cost more transistors than current solutions. The biggest benefit should be in the way shaders are written and interpreted by hardware. A new shading language could possibly be released along with this feature.

You won't need unified shaders for DX Next. You will have to support all of the same operations in both vertex shaders and pixel shaders, but that in itself doesn't require a unified model.

The argument against a unified model is that despite the fact that you do basically the same things to both vertices and pixels, the mix of instructions that is executed differs significantly.

Personally, I think it will happen, probably very soon; it's just a question of when the extra performance outweighs the cost of the extra logic.
 
Now the main differences between vertex and fragment processing are:
1. The vertex shader doesn't normally read textures, although the VS 3.0 spec gives it the ability.
2. Branching is somewhat more efficient in the vertex shader, especially dynamic branching.
3. Fragments need to be grouped into quads before being sent into the fragment shading pipeline, whereas vertices don't.
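
To illustrate point 3, here is a minimal sketch of the grouping step. It assumes a quad is a 2x2 block of screen-aligned pixels, which is the usual arrangement because neighbouring pixels are needed to compute texture derivatives. The function name `group_into_quads` is made up for the example and is not taken from any real driver or API.

```python
from collections import defaultdict


def group_into_quads(fragments):
    """Group (x, y) fragment coordinates into 2x2 screen-aligned quads.

    The quad key is the coordinate of the quad itself, i.e. the pixel
    coordinate divided by two (hypothetical layout for illustration).
    """
    quads = defaultdict(list)
    for x, y in fragments:
        quads[(x // 2, y // 2)].append((x, y))
    return dict(quads)


if __name__ == "__main__":
    covered = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0)]
    for quad, members in group_into_quads(covered).items():
        print(quad, members)
```

Note that the fragment at (2, 0) ends up alone in its quad. Vertices have no equivalent constraint: each one can be processed on its own, which is the "free input" side of the comparison.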

To unify the pipeline, we need to consider:
1. Whether it's worth the increased latency penalty of putting both vertex and fragment processing in the same logic. I don't think the vertex shader will ever need texture reads as frequently as the fragment shader does.
2. How to deal with branching in the fragment shader if we want a unified architecture. Personally, I think IHVs will still advise using dynamic branching selectively in the near future, because it's difficult to improve branching efficiency in the fragment shader.
3. Finding a better pipeline management scheme: free vertex input vs. grouped fragment input.
 
991060 said:
1. Whether it's worth the increased latency penalty of putting both vertex and fragment processing in the same logic. I don't think the vertex shader will ever need texture reads as frequently as the fragment shader does.
This might be avoided by implementing a priority queue. If you consider that the added latency for texture reads consists of a queue at some point that stalls processing for some number of cycles, then you could reduce the latency of some operations simply by giving them higher priority in the queue.

It doesn't necessarily even need to be tied to vertex or pixel data: the hardware could simply promote any processing that doesn't include a texture read (I would expect the compiler to group all instructions that can be executed in one clock into one execution block). The problem, of course, would be managing this efficiently. You'd need to have a very fast insert algorithm, and a simple method of determining where in the queue the next execution block should go.

Beyond easing the unification of the vertex and pixel pipelines, it could significantly reduce the performance cost of branching in many situations.
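
As a rough software sketch of the priority-queue idea above: execution blocks that contain no texture read are issued ahead of blocks that do, and blocks of equal priority keep their arrival order. This is only an illustration, not how any real GPU schedules work; `BlockQueue`, `issue_order`, and the block names are all hypothetical.

```python
import heapq
from itertools import count


class BlockQueue:
    """Toy scheduler: blocks without texture reads are issued first."""

    def __init__(self):
        self._heap = []
        # Monotonic counter used as a tie-breaker so that blocks with the
        # same priority are issued in the order they were inserted.
        self._order = count()

    def push(self, name, has_texture_read):
        # Priority 0 (no texture read) sorts ahead of priority 1.
        priority = 1 if has_texture_read else 0
        heapq.heappush(self._heap, (priority, next(self._order), name))

    def pop(self):
        _, _, name = heapq.heappop(self._heap)
        return name

    def __len__(self):
        return len(self._heap)


def issue_order(blocks):
    """Return the issue order for a list of (name, has_texture_read) blocks."""
    queue = BlockQueue()
    for name, has_texture_read in blocks:
        queue.push(name, has_texture_read)
    return [queue.pop() for _ in range(len(queue))]


if __name__ == "__main__":
    blocks = [("ps_tex", True), ("vs_alu", False),
              ("ps_alu", False), ("vs_tex", True)]
    print(issue_order(blocks))
```

A binary heap gives a logarithmic-time insert, which is fine in software; the "very fast insert" requirement in hardware would presumably call for something much simpler, such as a small fixed number of priority levels. A policy this naive could also starve texture-reading blocks if non-texture work keeps arriving.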

3. Finding a better pipeline management scheme: free vertex input vs. grouped fragment input.
It may become useful to start processing quads in serial instead of in parallel.
 