Nick said:
Let me guess: the sub-pixel accuracy and fill convention stuff?
Yeah. Up to now I had only written a basic floating-point rasterizer and hadn't tested it much, because I was still writing and testing the infrastructure, so I hadn't noticed the 'missed pixels' problem. I'll probably switch entirely to your method. The only issue is that I'll have to tweak some of my code, since pure integer code might be slower than the current one on my hardware (dual 1 GHz G4). Thanks to the ability to issue floating-point vector operations alongside integer ones and permutes, I had a loop for the most basic situation (depth test on with any depth function, depth buffer and color buffer updates) with in/out polygon testing that rasterized two pixels every three cycles. I'm gonna miss it. That's a good trade anyway, since on all other platforms (older G4s, G3s, 970s and all x86 processors) pure integer math will be faster.
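(For illustration only: a minimal sketch of the kind of integer half-space rasterizer with a top-left fill convention and 28.4 fixed-point sub-pixel precision being discussed here. It follows the well-known half-space approach; the names, the 28.4 format, the per-pixel edge evaluation and the winding assumption are all illustrative, not anyone's actual code.)

```cpp
#include <algorithm>
#include <cstdint>

// 28.4 fixed-point vertex position (x_fixed = lround(x * 16.0f)).
struct Vertex { int32_t x, y; };

// Rasterize a triangle into a 32-bit color buffer. Assumes one consistent
// winding order (backface-culled, already clipped/guard-banded input);
// flip the sign tests for the opposite winding.
void RasterizeTriangle(const Vertex& v1, const Vertex& v2, const Vertex& v3,
                       uint32_t* colorBuffer, int pitch, uint32_t color)
{
    // Edge deltas in 28.4 fixed point.
    const int32_t dx12 = v1.x - v2.x, dy12 = v1.y - v2.y;
    const int32_t dx23 = v2.x - v3.x, dy23 = v2.y - v3.y;
    const int32_t dx31 = v3.x - v1.x, dy31 = v3.y - v1.y;

    // Bounding box, rounded up to whole pixels.
    const int minX = (std::min({v1.x, v2.x, v3.x}) + 0xF) >> 4;
    const int maxX = (std::max({v1.x, v2.x, v3.x}) + 0xF) >> 4;
    const int minY = (std::min({v1.y, v2.y, v3.y}) + 0xF) >> 4;
    const int maxY = (std::max({v1.y, v2.y, v3.y}) + 0xF) >> 4;

    // Half-edge constants, kept in 64 bits so the sub-pixel bits of the
    // products are preserved and influence coverage.
    int64_t c1 = int64_t(dy12) * v1.x - int64_t(dx12) * v1.y;
    int64_t c2 = int64_t(dy23) * v2.x - int64_t(dx23) * v2.y;
    int64_t c3 = int64_t(dy31) * v3.x - int64_t(dx31) * v3.y;

    // Top-left fill convention: pixels exactly on a left or top edge belong
    // to this triangle, pixels on a right or bottom edge do not, so shared
    // edges never produce missed or double-filled pixels.
    if (dy12 < 0 || (dy12 == 0 && dx12 > 0)) c1++;
    if (dy23 < 0 || (dy23 == 0 && dx23 > 0)) c2++;
    if (dy31 < 0 || (dy31 == 0 && dx31 > 0)) c3++;

    uint32_t* row = colorBuffer + minY * pitch;
    for (int y = minY; y < maxY; y++, row += pitch)
    {
        for (int x = minX; x < maxX; x++)
        {
            // Evaluate the three edge functions at pixel (x, y); a real
            // implementation would step these incrementally per pixel/row.
            const int64_t e1 = c1 + int64_t(dx12) * (y << 4) - int64_t(dy12) * (x << 4);
            const int64_t e2 = c2 + int64_t(dx23) * (y << 4) - int64_t(dy23) * (x << 4);
            const int64_t e3 = c3 + int64_t(dx31) * (y << 4) - int64_t(dy31) * (x << 4);

            if (e1 > 0 && e2 > 0 && e3 > 0)
                row[x] = color;
        }
    }
}
```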
Doesn't that become very complex when there are many polygons and render states?
No. Every polygon is binned with a header of 64 bytes, plus 12 bytes per 'varying' and 8 bytes per tile it falls into, which doesn't require much bandwidth unless you're throwing a huge number of polygons at the rasterizer, in which case you'll probably hit other limits first. The state is kept very small, and constant data is shared among all the polygons that use it rather than duplicated.
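(For illustration, a hypothetical record layout matching the stated budget of a 64-byte header, 12 bytes per varying and 8 bytes per covered tile. Only those sizes come from the discussion; every field name and its contents are guesses.)

```cpp
#include <cstdint>

// Hypothetical binned-polygon layout. Only the sizes (64 / 12 / 8 bytes)
// come from the discussion; the fields themselves are assumptions.
struct BinnedPolygonHeader      // 64-byte header per binned polygon
{
    uint32_t stateId;           // index into shared, non-duplicated state
    uint32_t varyingCount;      // number of PlaneEquation records that follow
    float    zPlane[3];         // depth plane equation (a, b, c)
    float    edge[3][3];        // three edge equations for rasterization
    uint32_t pad[2];            // padding up to 64 bytes
};
static_assert(sizeof(BinnedPolygonHeader) == 64, "header must be 64 bytes");

struct PlaneEquation            // 12 bytes, one per 'varying'
{
    float a, b, c;              // value = a*x + b*y + c across the polygon
};
static_assert(sizeof(PlaneEquation) == 12, "one varying is 12 bytes");

struct TileReference            // 8 bytes, one per tile the polygon overlaps
{
    uint32_t tileId;            // which tile bin the polygon was added to
    uint32_t polygonOffset;     // offset of the polygon record in the bin stream
};
static_assert(sizeof(TileReference) == 8, "one tile reference is 8 bytes");
```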
It also seems like a lot of work gets done multiple times: transformation, lighting, clipping, rasterization setup, etc. Only polygons that fall entirely within one tile don't get clipped into multiple parts, and at low resolution that seems rare.
I don't do transformation, as mine is a 'pure' rasterizer: I take already-projected polygons, and I'll plug the geometry pipeline into it later. As for clipping, it happens only once per polygon, during setup (which is also once per polygon). Tile binning takes something like 4 instructions per tile, which is very fast since most polygons fall into just 1 to 4 tiles. Uh, BTW, I forgot to mention that setup, rasterization and shading can be spread over any number of parallel threads. A very nice side effect of tile-based methods is that they are inherently parallel.
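(For illustration, a sketch of the usual bounding-box tile binning step: the polygon's screen-space bounding box is converted to tile coordinates once during setup, then a small reference is appended to each covered tile's bin. The tile size and the bin container are assumptions.)

```cpp
#include <cstdint>
#include <vector>

// Sketch of bounding-box tile binning. Tile size and the bin container are
// assumptions; the discussion only says most polygons land in 1 to 4 tiles
// and that the per-tile cost is a handful of instructions.
constexpr int kTileSize = 64;   // pixels per tile side (assumed)

struct TileGrid
{
    int tilesX;                               // tiles per row
    std::vector<std::vector<uint32_t>> bins;  // tilesX * tilesY bins of polygon indices

    // Bin one polygon given its screen-space bounding box, assumed to be
    // already clipped to the screen during setup.
    void Bin(uint32_t polygonIndex,
             float minX, float minY, float maxX, float maxY)
    {
        // Convert the bounding box to tile coordinates once.
        const int tx0 = static_cast<int>(minX) / kTileSize;
        const int ty0 = static_cast<int>(minY) / kTileSize;
        const int tx1 = static_cast<int>(maxX) / kTileSize;
        const int ty1 = static_cast<int>(maxY) / kTileSize;

        // Append the polygon to every covered tile; the inner body is just
        // an index computation and an append, so the per-tile cost stays tiny.
        for (int ty = ty0; ty <= ty1; ty++)
            for (int tx = tx0; tx <= tx1; tx++)
                bins[ty * tilesX + tx].push_back(polygonIndex);
    }
};
```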
I like the idea of an index buffer for the polygons! It immediately shows which polygons are visible, without rasterization and z-tests.
Maybe I didn't explain it very well: the index buffer is generated by the rasterization process (including depth testing). After all the polygons have been processed, I've got a buffer telling me which fragment belongs to which polygon, and I use that as the input for the shaders.
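(For illustration, a sketch of that two-pass flow: rasterization with depth testing writes only a depth and a polygon index per pixel; once all polygons are processed, a second pass shades each pixel from its polygon index. All structures and names here are assumptions.)

```cpp
#include <cstdint>
#include <vector>

// Per-tile buffers: a polygon index per pixel is filled in by the
// depth-tested rasterization pass and consumed by the shading pass.
constexpr uint32_t kNoPolygon = 0xFFFFFFFFu;

struct Tile
{
    int width, height;
    std::vector<float>    depth;      // per-pixel depth
    std::vector<uint32_t> polyIndex;  // per-pixel index of the visible polygon
};

// Pass 1: for each rasterized fragment, keep only depth and polygon index.
inline void WriteFragment(Tile& tile, int x, int y, float z, uint32_t polygon)
{
    const int i = y * tile.width + x;
    if (z < tile.depth[i])            // 'less' depth function assumed
    {
        tile.depth[i] = z;
        tile.polyIndex[i] = polygon;
    }
}

// Pass 2: after every polygon has been processed, shade each covered pixel
// exactly once, using its polygon index to fetch the right state/varyings.
template <typename ShadeFn>
void ShadeTile(const Tile& tile, uint32_t* colorOut, ShadeFn shade)
{
    for (int i = 0; i < tile.width * tile.height; i++)
        if (tile.polyIndex[i] != kNoPolygon)
            colorOut[i] = shade(tile.polyIndex[i], i);
}
```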
I considered this at one point too, but when I computed the memory requirements they were gigantic. When you do it in tiles it's probably very advantageous, though?
It is. It requires only a little extra bandwidth (and probably mostly from the caches, as the tile buffers hardly ever exceed 128 KB). Obviously per-pixel OIT is significantly slower than immediate mode or per-polygon OIT; naturally I'll optimize it later, as it's not a priority right now.
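(For a rough sense of scale, under assumed numbers that are not from the discussion: a 128×128 tile holding a 4-byte depth and a 4-byte polygon index per pixel works out to exactly 128 KB, small enough to stay largely cache-resident.)

```cpp
// Back-of-the-envelope tile working set. Tile size and per-pixel storage are
// assumptions chosen only to show how the quoted ~128 KB figure can arise.
constexpr int kTileWidth     = 128;
constexpr int kTileHeight    = 128;
constexpr int kBytesPerPixel = 4 + 4;  // depth + polygon index
static_assert(kTileWidth * kTileHeight * kBytesPerPixel == 128 * 1024,
              "128x128 pixels * 8 bytes = 128 KB per tile");
```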
I have some 'mixed feelings' about Kribi. It can't render a simple scene at a decent framerate, yet it can render a seriously complex scene at several frames per second. So it seems like he has a really powerful visibility determination algorithm, but one that's too 'heavy' for simple scenes.
I don't know much about its internals, though I've chatted with Eric from time to time (you can usually spot him on Ace's Hardware), but he seems very skilled.