Good point. I'd still like to know how the scan conversion process is actually implemented in current hardware though!!
Just find the right patents
What you found is basically a hierarchical rasteriser as far as I can tell. There's other similar stuff here:
Accellerated start tile search
Tile based precision rasterization in a graphics pipeline
These seem to be related specifically to screen-space tiling, which is, implicitly, a hierarchical rasterisation problem (first work out which tile(s) the triangle covers, then break rasterisation down tile by tile).
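Purely as an illustration of that tile-then-pixel idea (this is my own software sketch, not anything from those patents - the 16x16 tile size, the edge-function test and the counter-clockwise winding are all assumptions):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Vec2 { float x, y; };

// Edge function: positive when p lies to the left of the directed edge a->b
// (counter-clockwise triangles assumed).
static float edge(Vec2 a, Vec2 b, Vec2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Two-level ("hierarchical") rasteriser: first decide which tiles the
// triangle can touch, then scan pixels only inside the surviving tiles.
void rasterise(Vec2 v0, Vec2 v1, Vec2 v2, int width, int height,
               std::vector<Vec2>& fragments) {
    const int kTile = 16;  // assumed tile size

    // Screen-space bounding box, clamped to the render target.
    int minX = std::max(0, (int)std::floor(std::min({v0.x, v1.x, v2.x})));
    int maxX = std::min(width  - 1, (int)std::ceil(std::max({v0.x, v1.x, v2.x})));
    int minY = std::max(0, (int)std::floor(std::min({v0.y, v1.y, v2.y})));
    int maxY = std::min(height - 1, (int)std::ceil(std::max({v0.y, v1.y, v2.y})));

    Vec2 e[3][2] = {{v0, v1}, {v1, v2}, {v2, v0}};

    for (int ty = minY / kTile; ty <= maxY / kTile; ++ty) {
        for (int tx = minX / kTile; tx <= maxX / kTile; ++tx) {
            // Conservative tile test: if all four tile corners are outside
            // the same edge, the tile cannot touch the triangle.
            Vec2 c[4] = {{float(tx * kTile),         float(ty * kTile)},
                         {float(tx * kTile + kTile), float(ty * kTile)},
                         {float(tx * kTile),         float(ty * kTile + kTile)},
                         {float(tx * kTile + kTile), float(ty * kTile + kTile)}};
            bool rejected = false;
            for (auto& ed : e) {
                float m = std::max({edge(ed[0], ed[1], c[0]), edge(ed[0], ed[1], c[1]),
                                    edge(ed[0], ed[1], c[2]), edge(ed[0], ed[1], c[3])});
                if (m < 0.0f) { rejected = true; break; }
            }
            if (rejected) continue;

            // Fine rasterisation: per-pixel edge tests inside surviving tiles only.
            for (int y = std::max(ty * kTile, minY);
                 y <= std::min((ty + 1) * kTile - 1, maxY); ++y)
                for (int x = std::max(tx * kTile, minX);
                     x <= std::min((tx + 1) * kTile - 1, maxX); ++x) {
                    Vec2 p = {x + 0.5f, y + 0.5f};
                    if (edge(v0, v1, p) >= 0 && edge(v1, v2, p) >= 0 &&
                        edge(v2, v0, p) >= 0)
                        fragments.push_back(p);
                }
        }
    }
}
```

Real hardware obviously isn't a nested loop like this, but the reject-tile-then-refine structure is the bit I mean.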
I'm not sure how much detail you want on the process of scan conversion. As I alluded to already, early-Z rejection requires a rasterisation which is likely to be separate from the rasterisation used to generate fragments - I've sketched the rough idea after the link below. It's hinted at in the abstract here:
Rendering pipeline
which I haven't read.
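My guess at the general shape of it (not taken from the patent, since I haven't read it - the per-tile max depth, the less-than depth test and the update policy are all assumed):

```cpp
#include <vector>

// Coarse early-Z sketch: one max-depth value per screen tile.  A triangle's
// conservative nearest depth is tested against the tile's stored farthest
// depth; if even the triangle's nearest point is behind everything already
// drawn in that tile, the whole tile is rejected before any fragments are made.
struct CoarseZBuffer {
    int tilesX, tilesY;
    std::vector<float> tileMaxZ;  // farthest depth seen in each tile

    CoarseZBuffer(int tx, int ty)
        : tilesX(tx), tilesY(ty), tileMaxZ(tx * ty, 1.0f) {}  // 1.0 = far plane

    // Returns true if the triangle can be skipped for this tile
    // (assumes a less-than depth test).
    bool rejectTile(int tx, int ty, float triMinZ) const {
        return triMinZ > tileMaxZ[ty * tilesX + tx];
    }

    // After fine rasterisation updates the real Z buffer, the coarse value is
    // refreshed with the new per-tile maximum recomputed from that buffer.
    void updateTile(int tx, int ty, float newTileMaxZ) {
        tileMaxZ[ty * tilesX + tx] = newTileMaxZ;
    }
};
```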
Here are some other patent documents:
System, method and computer program product for geometrically transforming geometric objects
which is nice, lots of pix. This:
Method and system for a general instruction raster stage that generates programmable pixel packets
might be for handheld devices? The basic problem with this stuff is that we're forced to piece the overall picture together from fragments of functionality scattered across multiple patent documents.
Some ATI stuff:
Optimized primitive filler
Optimal initial rasterization starting point
which look ancient and prolly have been optimised since. This is ATI's hierarchical tiler:
Method and apparatus for rasterizer interpolation
This seems to be rasterisation:
Rendering polygons
but it's ancient.
This is a nice overview of a real graphics pipeline:
http://ati.amd.com/products/radeonx800/RadeonX800ArchitectureWhitePaper.pdf
I was thinking more in terms of tessellation and/or deferred shading where there's a heavy geometry workload up front. At some point there will be enough small triangles that each doesn't produce that many pixels. So the triangle throughput would have to increase to keep the shaders and ROPs fed...
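Back-of-the-envelope version of what I mean, with completely made-up numbers (the 32 pixels/clock consumption rate and the triangle sizes are just examples):

```cpp
#include <cstdio>

int main() {
    // Illustrative numbers only, not from any real GPU.
    const double pixelsPerClock = 32.0;  // shader/ROP consumption rate
    const double avgTrianglePixels[] = {64.0, 16.0, 4.0, 1.0};

    for (double px : avgTrianglePixels) {
        // Triangles per clock that setup must produce to keep the
        // pixel pipelines fed.
        double trianglesPerClock = pixelsPerClock / px;
        std::printf("%5.1f px/tri -> setup needs %5.2f tris/clock\n",
                    px, trianglesPerClock);
    }
    return 0;
}
```

At 4 pixels per triangle you'd already need 8 triangles per clock out of setup.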
Yep, which is why you see occasional moans about low setup rates, as that seems to be more of a constraint than rasterisation. NVidia GPUs seemed to have a half-triangle per clock setup rate for a long time.
Again, Larrabee for the win (eventually...). The only bits of Larrabee that are going to be fixed bottlenecks are the memory and texturing systems. Everything else is open-ended, based purely on workload. That's not to say it can't be wasteful (e.g. it's programmed to construct batches of fragments that are a minimum of 16 in size, but the program only packs a maximum of 4 small triangles into the batch).
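To put rough numbers on that waste (the 16-lane minimum and 4-triangle cap are the figures I mentioned; the triangle sizes and the packing rule are just my illustration):

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    // Assumed batch rules from the discussion above: a fragment batch is at
    // least 16 lanes wide, and at most 4 distinct triangles go into one batch.
    const int kBatchLanes   = 16;
    const int kMaxTriangles = 4;

    // Hypothetical stream of fragment counts, each triangle smaller than a batch.
    std::vector<int> triangleFragments = {2, 1, 3, 1, 2, 2, 1, 4};

    int usedLanes = 0, issuedLanes = 0;
    size_t i = 0;
    while (i < triangleFragments.size()) {
        int lanes = 0, tris = 0;
        // Pack triangles until the triangle cap is hit or the batch is full.
        while (i < triangleFragments.size() && tris < kMaxTriangles &&
               lanes + triangleFragments[i] <= kBatchLanes) {
            lanes += triangleFragments[i++];
            ++tris;
        }
        usedLanes   += lanes;
        issuedLanes += std::max(lanes, kBatchLanes);  // batch padded to 16 lanes
        std::printf("batch: %d triangles, %2d/%d lanes busy\n",
                    tris, lanes, kBatchLanes);
    }
    std::printf("overall utilisation: %.0f%%\n", 100.0 * usedLanes / issuedLanes);
    return 0;
}
```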
When people refer to a GPU being capable of setting up 1 triangle per clock what exactly are they referring to? I can't figure out how a GPU can rasterize an arbitrarily sized triangle in a single clock cycle.
Triangle setup is essentially working out which vertices form which triangles. That proceeds, nominally, at 1 triangle per clock. A lot of the time you get one triangle per vertex coming out of the vertex shading pipeline (e.g. triangles from a triangle strip). Other times setup might be waiting for multiple vertices to be shaded to make a single triangle.
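Just to illustrate what setup is assembling in the strip case (plain software, nothing hardware-specific - the winding rule is the usual strip convention):

```cpp
#include <cstdio>
#include <vector>

struct Triangle { int i0, i1, i2; };

// Assemble triangles from a triangle strip: after the first two vertices,
// every new vertex completes one triangle, with the winding flipped on every
// other triangle so they all keep a consistent facing.
std::vector<Triangle> assembleStrip(int vertexCount) {
    std::vector<Triangle> tris;
    for (int n = 2; n < vertexCount; ++n) {
        if (n % 2 == 0) tris.push_back({n - 2, n - 1, n});
        else            tris.push_back({n - 1, n - 2, n});
    }
    return tris;
}

int main() {
    // 6 vertices -> 4 triangles: one new triangle per vertex after the first two.
    for (const Triangle& t : assembleStrip(6))
        std::printf("(%d, %d, %d)\n", t.i0, t.i1, t.i2);
    return 0;
}
```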
Once the setup engine has made the triangle it then commences rasterisation - but as I described earlier, it only needs to rasterise at a given rate, e.g. 16 or 32 fragments per clock. So, it doesn't matter if the triangle covers 16000 screen space pixels - it'll take its own sweet time.
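Putting numbers on "its own sweet time" (the 16 and 32 fragments/clock rates are the ones above; the triangle sizes are arbitrary examples):

```cpp
#include <cstdio>

int main() {
    // Assumed rasterisation rates; triangle areas are just examples.
    const int ratesPerClock[] = {16, 32};
    const int triangleAreas[] = {16, 256, 16000};  // covered pixels

    for (int rate : ratesPerClock)
        for (int area : triangleAreas) {
            int clocks = (area + rate - 1) / rate;  // round up
            std::printf("%6d-pixel triangle at %2d frags/clock: ~%d clocks\n",
                        area, rate, clocks);
        }
    return 0;
}
```

So the 16000-pixel case simply occupies the rasteriser for roughly 500-1000 clocks while setup carries on behind it.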
Obviously the opposite problem that you've alluded to is the small triangles, particularly the little fuckers that don't even cover a whole pixel (but might still need to be rendered). This is where things get woolly as to how ATI and NVidia tackle these - particularly as it's a pixel shader batch-packing problem too. I think one of the NVidia patent documents I whistled past talks about this - but to be honest I'm not madly keen on trying to decipher all this.
Jawed