NV40 Pixel Cluster Dependency

It has been previously indicated that NV40 assigns a quad to one of its four pipeline clusters, dynamically, as each cluster becomes available for processing (source). However, does this imply that NV40's four pixel processing clusters are independent of each other instruction-wise?

More specifically, can each pixel cluster work on its own pixel shader instruction and data irrespective of the other clusters in NV40?
 
I'm fairly sure that each pixel quad has to run the same fragment program. I guess that means the same instruction per clock.

Anyone know otherwise?
 
Rys said:
I'm fairly sure that each pixel quad has to run the same fragment program. I guess that means the same instruction per clock.

Anyone know otherwise?

I don't think there is any good reason to force all the quad fragment shaders to execute the same instruction of the fragment program in a given cycle, unless, of course, they aren't really separate processing units but a single one. Shader programs aren't (usually) that large, and a small instruction cache of some kind per shader unit wouldn't cost many transistors.
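To make the cost of lockstep execution concrete, here is a toy model (purely illustrative: the latencies are invented, and it ignores the latency hiding real GPUs get from keeping many quads in flight). Four clusters run the same short program, and each one stalls on a different instruction. With one shared instruction pointer every instruction waits for the slowest cluster; with a per-cluster instruction pointer (e.g. fed from a small per-cluster instruction cache) each cluster only pays for its own stalls.

```python
from typing import List


def lockstep_cycles(latencies: List[List[int]]) -> int:
    """All clusters share one instruction pointer: each instruction
    completes only when the slowest cluster has finished it."""
    num_instructions = len(latencies[0])
    return sum(max(cluster[i] for cluster in latencies)
               for i in range(num_instructions))


def independent_cycles(latencies: List[List[int]]) -> int:
    """Each cluster has its own instruction pointer and walks the same
    program at its own pace; the work is done when the slowest cluster is."""
    return max(sum(cluster) for cluster in latencies)


# Hypothetical per-instruction latencies (cycles) for 4 clusters running the
# same 4-instruction program; the 8s model e.g. a texture-fetch stall.
latencies = [
    [1, 8, 1, 1],
    [1, 1, 8, 1],
    [1, 1, 1, 8],
    [8, 1, 1, 1],
]
print(lockstep_cycles(latencies))     # 32
print(independent_cycles(latencies))  # 11
```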

Obviously all fragment (or vertex) shader units run the same fragment (or vertex) program. Changing the shader program is a state change that happens between batches, and for complexity reasons GPUs aren't likely to work on different batches at the same time, at least not in the same pipeline stage. (You might divide the pipeline into geometry and fragment phases, for example, and run one batch in each phase, so two batches are in flight, pipelined.)
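A minimal sketch of that batching rule, assuming fragments arrive in submission order tagged with the shader that must run on them. The function names and the max_batch parameter are hypothetical, not anything NVIDIA has documented: a batch is closed whenever the shader changes or the batch is full, so a shader change only ever happens between batches.

```python
from itertools import groupby
from typing import List, Tuple

Batch = Tuple[str, List[int]]  # (shader name, fragment ids)


def make_batches(fragments: List[Tuple[str, int]], max_batch: int) -> List[Batch]:
    """Split a stream of (shader, fragment_id) pairs into single-shader batches."""
    batches: List[Batch] = []
    # groupby only merges *consecutive* fragments with the same shader,
    # so submission order is preserved.
    for shader, group in groupby(fragments, key=lambda frag: frag[0]):
        ids = [frag_id for _, frag_id in group]
        for start in range(0, len(ids), max_batch):
            batches.append((shader, ids[start:start + max_batch]))
    return batches


def state_changes(batches: List[Batch]) -> int:
    """Count shader switches, which only occur at batch boundaries."""
    return sum(1 for prev, cur in zip(batches, batches[1:]) if prev[0] != cur[0])


stream = [("A", i) for i in range(5)] + [("B", i) for i in range(5, 8)] + [("A", 8)]
batches = make_batches(stream, max_batch=4)
print(batches)
# [('A', [0, 1, 2, 3]), ('A', [4]), ('B', [5, 6, 7]), ('A', [8])]
print(state_changes(batches))  # 2
```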
 
Luminescent said:
It has been previously indicated that NV40 assigns a quad to one of its four pipeline clusters, dynamically, as each cluster becomes available for processing (source). However, does this imply that NV40's four pixel processing clusters are independent of each other instruction-wise?

More specifically, can each pixel cluster work on its own pixel shader instruction and data irrespective of the other clusters in NV40?

I think the performance characteristics of dynamic branches indicate that there's only a single instruction stream for all quads at any given time.
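One way to see that argument: a toy cost model for an if/else where a whole group of pixels shares one instruction stream, compared against a hypothetical design where each quad has its own stream and runs on its own cluster concurrently. The branch costs are invented for illustration. With a single stream the group pays for both sides whenever any of its pixels disagree; with per-quad streams only an internally divergent quad pays for both.

```python
from typing import List


def branch_cycles(takes_branch: List[bool], then_cycles: int, else_cycles: int) -> int:
    """Cycles for one if/else when all the given pixels share one instruction
    stream. If any pixel needs a side, the stream executes it (others masked)."""
    cycles = 0
    if any(takes_branch):
        cycles += then_cycles
    if not all(takes_branch):
        cycles += else_cycles
    return cycles


def per_quad_cycles(takes_branch: List[bool], then_cycles: int, else_cycles: int) -> int:
    """Hypothetical alternative: each quad has its own stream and cluster,
    all running concurrently, so the slowest quad sets the time."""
    quads = [takes_branch[i:i + 4] for i in range(0, len(takes_branch), 4)]
    return max(branch_cycles(q, then_cycles, else_cycles) for q in quads)


pixels = [True] * 4 + [False] * 4  # quad 0 takes the branch, quad 1 doesn't
print(branch_cycles(pixels, then_cycles=10, else_cycles=2))    # 12
print(per_quad_cycles(pixels, then_cycles=10, else_cycles=2))  # 10
```

If measured branch cost tracks coherence across a whole batch of hundreds of quads rather than within a single quad, that fits the single-stream picture better than the per-quad one.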
 
I believe each quad pipeline works independently, if not for performance, then for scalability reasons. C&P (copy-and-paste) design. ;)
Of course, there could also be a single decoder front end for all pipelines, but perhaps that wouldn't offer enough redundancy, or it would make flow control harder to implement. It would also mean batches have to grow as pipelines are added, so that each pipeline still gets the same number of quads.
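The batch-growth point is simple multiplication. Assuming a single shared front end that issues one batch across every pipeline, and taking 64 quads per pipeline purely as an example value, the required batch size scales linearly with the pipeline count:

```python
from typing import Tuple

PIXELS_PER_QUAD = 4


def required_batch(num_pipelines: int, quads_per_pipeline: int) -> Tuple[int, int]:
    """Batch size (quads, pixels) needed so that one shared front end can
    give every pipeline the same number of quads."""
    quads = num_pipelines * quads_per_pipeline
    return quads, quads * PIXELS_PER_QUAD


for pipes in (4, 8, 16):
    print(pipes, required_batch(pipes, quads_per_pipeline=64))
# 4 (256, 1024)
# 8 (512, 2048)
# 16 (1024, 4096)
```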

Overall, considering just NV40 and how it works on quad batches, it doesn't matter much. NV40 is said to work on batches of ~1000 pixels (~250 quads). They don't have to come from the same triangle, they just have to use the same shader. Whether all four pipelines each work on 64 quads, 1/4 of a batch, or each has its own batch to process, the total time is the same. Output synchronisation might be an issue with different batches, though.
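A quick check of the "same time" claim, assuming every quad costs the same number of cycles and taking a batch as 1024 pixels (256 quads, so 64 quads per cluster). Splitting each batch across all four clusters and giving each cluster whole batches finish the same work in the same number of cycles. What changes is when each batch completes, which is where the output synchronisation concern comes in.

```python
import math
from typing import List


def completion_times_split(num_batches: int, quads_per_batch: int,
                           clusters: int, cycles_per_quad: int) -> List[int]:
    """Each batch is divided evenly across all clusters; batches run back to back."""
    per_batch = math.ceil(quads_per_batch / clusters) * cycles_per_quad
    return [per_batch * (b + 1) for b in range(num_batches)]


def completion_times_independent(num_batches: int, quads_per_batch: int,
                                 clusters: int, cycles_per_quad: int) -> List[int]:
    """Each cluster takes whole batches round-robin; clusters run concurrently."""
    per_batch = quads_per_batch * cycles_per_quad
    return [per_batch * (b // clusters + 1) for b in range(num_batches)]


QUADS_PER_BATCH = 1024 // 4  # assumed batch of 1024 pixels = 256 quads

split = completion_times_split(4, QUADS_PER_BATCH, clusters=4, cycles_per_quad=1)
indep = completion_times_independent(4, QUADS_PER_BATCH, clusters=4, cycles_per_quad=1)
print(split)  # [64, 128, 192, 256]
print(indep)  # [256, 256, 256, 256]
```

Both finish all four batches at cycle 256, but in the split scheme batches retire one at a time in order, while with independent clusters several batches complete together and their results have to be merged back into submission order.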
 