Advanced Rasterization

Scali said:
You mean your index-buffer maps 1:1 to the pixelbuffer/zbuffer/etc?

Yes.

So you can theoretically have a different shader for each pixel?

Indeed, once rasterization has ended I scan the index buffer and group the fragments by polygon / constants (uniforms, textures) and shader, then generate the x/y coords of the fragments and throw them into the rasterizers together with everything else. I incur an almost fixed cost for this operation and it allows me to keep the shader structure very simple: the shaders see a stream of fragments coming in, so it's easy to vectorize them.
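A minimal sketch of what such a grouping pass could look like (the types and names here are my own illustration, not the project's actual code): after visibility is resolved, the per-pixel index buffer holds the id of the winning polygon; one linear scan buckets the pixel coordinates by (shader, constants), and each bucket is then streamed through its shader.

[code]
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical types, purely for illustration.
struct Fragment { uint16_t x, y; uint32_t polyId; };
struct ShaderKey {
    uint32_t shaderId, constantsId;   // shader program + uniforms/textures set
    bool operator==(const ShaderKey& o) const {
        return shaderId == o.shaderId && constantsId == o.constantsId;
    }
};
struct ShaderKeyHash {
    size_t operator()(const ShaderKey& k) const {
        return std::hash<uint64_t>()((uint64_t(k.shaderId) << 32) | k.constantsId);
    }
};

constexpr uint32_t kNoPoly = 0xFFFFFFFFu;   // "nothing visible at this pixel"

// Runs one shader over a homogeneous stream of fragments (easy to vectorize).
void runFragmentShader(const ShaderKey& key, const std::vector<Fragment>& frags);

void groupAndShade(const std::vector<uint32_t>& indexBuffer, int width, int height,
                   const std::vector<ShaderKey>& polyState)   // per-polygon state lookup
{
    std::unordered_map<ShaderKey, std::vector<Fragment>, ShaderKeyHash> buckets;

    // One linear scan over the index buffer: a roughly fixed cost per frame.
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            const uint32_t id = indexBuffer[y * width + x];
            if (id == kNoPoly) continue;
            buckets[polyState[id]].push_back({uint16_t(x), uint16_t(y), id});
        }

    // Each bucket is one polygon-state group; the shader just consumes the stream.
    for (const auto& [key, frags] : buckets)
        runFragmentShader(key, frags);
}
[/code]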

I've seen a few implementations that use a tree for each scanline, which stores the spans of triangles. This basically replaces the zbuffer altogether, and in most cases you'll have spans of (much?) more than 1 pixel on a scanline, so it's more efficient.

Been there, done that ;). The first incarnations of my project used a span buffer, later changed to an overly convoluted edge buffer with intersection detection and handling of transparent polygons. Unfortunately those systems do not scale very well, for two reasons: they are hard to parallelize, and they suffer a lot from branch misprediction (they are very branch-heavy), so they are not really suited to modern processors. On top of that, spans are not the best kind of input a shader could get; quads have better spatial locality.
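For reference, here is a much simplified sketch of the span idea being discussed: one list of non-overlapping spans per scanline, filled front to back so the incoming span only keeps the pixels nobody nearer has already claimed. A real s-buffer stores depth per span (and may use a tree instead of a list) so polygons can arrive in any order; the splitting/clipping below is exactly where the branch-heaviness comes from.

[code]
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <list>

// A run of pixels on one scanline owned by one polygon: [x0, x1), x0 < x1.
struct Span { int x0, x1; uint32_t polyId; };

// One span list per scanline, kept sorted by x0 and non-overlapping.
using Scanline = std::list<Span>;

// Front-to-back insertion (c-buffer style): the incoming span only gets the
// pixels not already claimed by spans inserted earlier.
void insertSpan(Scanline& line, Span s)
{
    auto it = line.begin();
    while (s.x0 < s.x1 && it != line.end()) {
        if (it->x1 <= s.x0) { ++it; continue; }   // existing span lies to our left
        if (s.x1 <= it->x0) break;                // rest of s fits in the gap before *it
        if (s.x0 < it->x0)                        // uncovered piece on the left: keep it
            it = std::next(line.insert(it, {s.x0, it->x0, s.polyId}));
        s.x0 = std::max(s.x0, it->x1);            // skip the part covered by *it
        ++it;
    }
    if (s.x0 < s.x1) line.insert(it, s);          // whatever survived is visible
}
[/code]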

But perhaps you opted for a per-pixel buffer because you expect that there will be too many triangles to make a tree approach efficient?

Yeah, since the industry is quickly moving towards a lot of small polygons, I prefer to optimize my system for that type of workload. There are probably better ways to handle simple scenes with big polygons, but then again it should be fast enough to handle them anyway.
 
Nick said:
Doesn't that become very complex when there are many polygons and render states? It also seems like a lot of things are done multiple times. Like transformation, lighting, clipping, rasterization setup, etc.
Why on earth would you do them multiple times? (With the possible exception of rasterization setup, but that can be a space/time trade-off.)
 
Simon F said:
Why on earth would you do them multiple times? (With the possible exception of rasterization setup, but that can be a space/time trade-off.)
You have to render one tile at a time. So for every tile you have to transform, light, clip and rasterize. For polygons that span multiple tiles this will be done multiple times.

The only alternative I see is processing all vertices in the scene, and storing them in extra vertex buffers. But this wastes memory and most of all bandwidth. And isn't one of the primary goals of tile rendering to reduce bandwidth?

In these days of polygons being just a few pixels large I see little advantage... Or am I missing something?
 
Nick said:
You have to render one tile at a time. So for every tile you have to transform, light, clip and rasterize. For polygons that span multiple tiles this will be done multiple times.

No, the render phase happens when all the polygons have been sent and (at least in my implementation) have also gone through setup. No information about the vertices is held after the setup phase. The setup phase generates, for every polygon, the edge coefficients (for half-space calculations) and the gradients of the varying variables. This data is stored in a header, and then a pointer to this header is added to each tile the primitive belongs to. When all the geometry has been sent (after a glFlush()/glFinish() or a page flip), rasterization and shading take place using the primitive headers, which do not require any extra setup.
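A rough sketch of what such a primitive header and tile bins could look like (field names and layout are my guess, not the actual implementation): setup bakes the three edge functions and the varying gradients into a header, and binning just appends a pointer to that header to every tile the primitive touches.

[code]
#include <cstdint>
#include <vector>

// Guessed layout, for illustration only.
struct PrimitiveHeader {
    // Half-space edge functions E_i(x, y) = A_i*x + B_i*y + C_i; with a
    // consistent winding, a pixel is inside when all three are >= 0.
    float A[3], B[3], C[3];

    // Plane equation per interpolated (varying) attribute:
    // v(x, y) = v0 + dvdx * x + dvdy * y.
    struct Gradient { float v0, dvdx, dvdy; };
    std::vector<Gradient> varyings;

    uint32_t shaderId;   // which fragment shader / constants this polygon uses
};

struct Tile {
    // Setup is immediate-mode: it only appends pointers here. Rasterization
    // and shading are deferred until the whole frame has been submitted.
    std::vector<const PrimitiveHeader*> prims;
};

// Called once per primitive at submit time, after transform/clip/setup,
// with the tile-space bounding box of the primitive.
void binPrimitive(const PrimitiveHeader* h,
                  int tx0, int ty0, int tx1, int ty1,
                  std::vector<Tile>& tiles, int tilesPerRow)
{
    for (int ty = ty0; ty <= ty1; ++ty)
        for (int tx = tx0; tx <= tx1; ++tx)
            tiles[ty * tilesPerRow + tx].prims.push_back(h);
}
[/code]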

The only alternative I see is processing all vertices in the scene, and storing them in extra vertex buffers. But this wastes memory and most of all bandwidth. And isn't one of the primary goals of tile rendering to reduce bandwidth?

You don't need to store vertices; only polygon information is relevant to the rasterization and shading process, at least in my implementation. It's basically an immediate-mode setup engine and a deferred rasterizer/shader ;)
 
crystall said:
You don't need to store vertices; only polygon information is relevant to the rasterization and shading process, at least in my implementation. It's basically an immediate-mode setup engine and a deferred rasterizer/shader ;)
So... you resolve visibility information as soon as possible, only storing the minimum information needed for visible polygons per tile? I think it's starting to get through to me. Cool approach!
 
You are overestimating the amount of vertex data you have to store post-transform ... he just stores the lot, except possibly what is culled by viewport culling, and it is no problem.
 
Nick said:
So... you resolve visibility information as soon as possible, only storing the minimum information needed for visible polygons per tile? I think it's starting to get through to me. Cool approach!

I hope to have it ready soon so you can see the code for yourself :) (boy, I hate having so many exams still left... :( ).
 
Nick said:
Simon F said:
Why on earth would you do them multiple times? (With the possible exception of rasterization setup, but that can be a space/time trade-off.)
You have to render one tile at a time. So for every tile you have to transform, light, clip and rasterize. For polygons that span multiple tiles this will be done multiple times.
But that would be just plain stupid.
 
Simon F said:
But that would be just plain stupid.
Then show me the plain simple approach. :p Seriously, it's not all that trivial. There has to be a balance between memory usage, bandwidth, processing power and state management complexity.
 
Nick said:
Simon F said:
But that would be just plain stupid.
Then show me the plain simple approach. :p Seriously, it's not all that trivial. There has to be a balance between memory usage, bandwidth, processing power and state management complexity.

My quick non-specialist solution:
Process vertices up to the fragment-shader stage, doing triangle/tile sorting just before it.
The information that has to be stored is just what the fragment shaders need, so it's likely to be small (or at least not big).
Then process tile by tile, deferring the fragment-shader work until then (see the sketch below).
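A minimal per-tile loop in that spirit, reusing the hypothetical PrimitiveHeader/Tile types sketched earlier (the depth test and shading are only stubbed out):

[code]
// Shades one covered pixel; interpolation would use the header's gradients.
void shadePixel(const PrimitiveHeader& h, float px, float py,
                float* depth, uint32_t* color, int idx);

void renderTile(const Tile& tile, int tileX0, int tileY0, int tileSize,
                float* depth, uint32_t* color)   // tileSize*tileSize buffers
{
    for (const PrimitiveHeader* h : tile.prims)
        for (int y = 0; y < tileSize; ++y)
            for (int x = 0; x < tileSize; ++x) {
                const float px = float(tileX0 + x), py = float(tileY0 + y);
                // Coverage: all three half-space edge functions non-negative.
                bool inside = true;
                for (int e = 0; e < 3; ++e)
                    inside = inside && (h->A[e] * px + h->B[e] * py + h->C[e] >= 0.0f);
                if (inside)
                    shadePixel(*h, px, py, depth, color, y * tileSize + x);
            }
}
[/code]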

I just remembered that paper, which might be of interest.

[edit]
The OpenGL pipeline diagram.
Given that, PowerVR would compute everything up to rasterization; that "stage" would be modified to sort triangles into tiles, then the tiles would be processed one by one, with triangles sorted by fragment shader and textures.
(Something like that.)
[/edit]
 
What's the big deal? You transform everything, do some setup if you want, and store the results ... and a pointer to them in the affected tile bins.

Once we get pixel-sized tris and want to do higher-order occlusion culling during rasterization you can get more tricky ... but as long as the tris are big and the overdraw is limited, storage and bandwidth with this scheme aren't a problem.
 