This was originally mentioned in one of those huge NV30 threads. I believe what was said was that having pixel pipelines capable of
rendering arbitrary pixels inside a triangle, or possibly even pixels from different triangles, would 1.) cause texture cache efficiency to go down, and 2.) require lots of extra logic compared to a block of K pipelines set up to render an NxM pixel grid.
Example - 4 pipelines set up to render 2x2 tiles, or 3Dlabs' P10, which looks like its 64 texture coordinate processors and 64 integer pixel shader units are set up to render 8x8 blocks of pixels.
Well, what happens to these units when you render really small triangles, ~1 pixel, with complex shaders (e.g. if you are using your [V/G/?]PU to accelerate offline rendering)? It seems like you'd basically be wasting all but 1 pipeline for however many cycles it takes the shader to execute.
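A rough back-of-the-envelope sketch of that waste, assuming each NxM block is shaded in lockstep and that only the covered pixels count as useful work. The function name `pipeline_utilization` is made up for illustration, and the 8x8 case simply takes the P10 guess above at face value.

```python
def pipeline_utilization(covered_pixels: int, block_w: int, block_h: int) -> float:
    """Fraction of the K = block_w * block_h pipelines doing useful work
    when a block holds only `covered_pixels` live pixels (assumed lockstep)."""
    k = block_w * block_h
    if not 0 <= covered_pixels <= k:
        raise ValueError("covered_pixels must be between 0 and block_w * block_h")
    return covered_pixels / k


# A ~1 pixel triangle on a 2x2 block and on an 8x8 block.
print(pipeline_utilization(1, 2, 2))  # 0.25
print(pipeline_utilization(1, 8, 8))  # 0.015625
```

So under these assumptions a 1-pixel triangle keeps 1 of 4 pipelines busy on a 2x2 design, and 1 of 64 on an 8x8 design, for the full length of the shader.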
So I was thinking about how to allow pipelines to be a little more flexible. I'm no hardware designer, but it doesn't seem like it would be all that difficult:
The rasterizer still outputs NxM blocks of pixels - for each pixel covered by the triangle, do Z and stencil testing. If the pixel passes, store the pixel and its associated shader inputs in a FIFO buffer (maybe in some sort of grid arrangement if possible). Beef up the rasterization stage so that it can output more than 1 NxM pixel block per cycle. So long as different tris share the same pixel program, AFAICS it doesn't matter which tri these blocks come from - just insert the pixel program inputs into the buffer. Every time the K(=NxM) pipelines finish a batch of pixels, they grab up to K new pixels from the buffer for processing. Seems like this would help pipeline utilization significantly...
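The scheme above can be sketched as a toy software model. This is only an illustration under loud assumptions: the FIFO is unbounded, the rasterizer always keeps it filled, every pixel uses the same pixel program, and a "batch" is one full shader execution across the K pipelines. The names `batches_lockstep` and `batches_with_fifo` are hypothetical, not anything a real chip exposes.

```python
from collections import deque
from typing import List


def batches_lockstep(surviving_per_block: List[int]) -> int:
    """One shader batch per non-empty NxM block: idle pipelines stay idle."""
    return sum(1 for n in surviving_per_block if n > 0)


def batches_with_fifo(surviving_per_block: List[int], k: int) -> int:
    """Pixels that pass Z/stencil go into a FIFO; the K pipelines
    grab up to K pixels per batch, regardless of which block they came from."""
    fifo = deque()
    for block_id, n in enumerate(surviving_per_block):
        for pixel in range(n):
            # Stand-in for "the pixel and its associated shader inputs".
            fifo.append((block_id, pixel))

    batches = 0
    while fifo:
        for _ in range(min(k, len(fifo))):
            fifo.popleft()
        batches += 1
    return batches


# 100 tiny triangles, each leaving 1 surviving pixel in its 2x2 block.
blocks = [1] * 100
print(batches_lockstep(blocks))      # 100
print(batches_with_fifo(blocks, 4))  # 25
```

In this idealised model the FIFO version needs a quarter of the shader batches for 1-pixel triangles on 4 pipelines. It says nothing about the texture cache or extra-logic costs mentioned at the top, which the model ignores entirely.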
It would also help if you went so far as to allow data-dependent loops/branches in the pixel shader stage - I think you would need an i-cache per pipeline, but you wouldn't have to wait for all pixels in the block you are processing to finish before starting on a fresh set of pixels.
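A small sketch of why that matters once shader cost varies per pixel. It assumes per-pixel cycle counts are known up front (purely for the simulation), that a lockstep batch takes as long as its slowest pixel, and that an independent pipeline picks up the next pixel the moment it finishes. Function names are hypothetical.

```python
import heapq
from typing import List


def cycles_lockstep(costs: List[int], k: int) -> int:
    """Each batch of K pixels takes as long as its slowest pixel."""
    return sum(max(costs[i:i + k]) for i in range(0, len(costs), k))


def cycles_independent(costs: List[int], k: int) -> int:
    """Each pipeline fetches the next pixel as soon as it is free."""
    free_at = [0] * k          # cycle at which each pipeline becomes free
    heapq.heapify(free_at)
    finish = 0
    for cost in costs:
        start = heapq.heappop(free_at)   # earliest available pipeline
        end = start + cost
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


# Two expensive pixels (10 cycles) mixed in with cheap ones (1 cycle).
costs = [10, 1, 1, 1, 10, 1, 1, 1]
print(cycles_lockstep(costs, 4))     # 20
print(cycles_independent(costs, 4))  # 11
```

With these made-up costs the lockstep version pays for the slow pixel twice, while the independent pipelines overlap the cheap pixels with the expensive ones.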
Comments?
Regards,
Serge
P.S. I'm no hardware designer, I'd definitely be interested to hear about why this isn't a great idea...