I have implemented the algorithm from "Triangle Scan Conversion using 2D Homogeneous Coordinates" in software, but the performance is quite disappointing. I process 2x2 pixels at a time using SSE assembly, yet it's still almost twice as slow as a C implementation of a regular scan-line converter.

The problem is that it evaluates the three half-space functions for every pixel in the triangle's bounding rectangle. As the article's illustrations show, that is often many more pixels than the ones that actually have to be filled.

So I was wondering whether anyone knows how this is handled in modern hardware. Do they use extra tricks to quickly determine which quads don't overlap the triangle, or do they just rely on raw processing power? Any ideas that might be useful to me? Thanks.
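For reference, here is a minimal scalar sketch of the brute-force loop described above: walk the bounding rectangle one 2x2 quad at a time and test each pixel center against three edge functions. It assumes plain screen-space edge functions for a counter-clockwise triangle rather than the article's 2D homogeneous formulation, ignores fill conventions (pixels exactly on an edge count as inside), and uses hypothetical names (`rasterize_quads`, `edge_fn`); it is not the SSE code from the post.

```c
/* Simplified sketch: screen-space edge functions, not the 2D homogeneous
   formulation from the article. No top-left fill rule is applied. */
typedef struct { float x, y; } Vec2;

/* Edge function: >= 0 when (px, py) is on the inner side of edge a->b
   for a counter-clockwise triangle (y axis pointing up). */
static float edge_fn(Vec2 a, Vec2 b, float px, float py) {
    return (b.x - a.x) * (py - a.y) - (b.y - a.y) * (px - a.x);
}

/* floor/ceil to int without libm, correct for negative values too. */
static int floor_to_int(float v) { int i = (int)v; return (v < (float)i) ? i - 1 : i; }
static int ceil_to_int(float v)  { int i = (int)v; return (v > (float)i) ? i + 1 : i; }

static float min3f(float a, float b, float c) { float m = a < b ? a : b; return m < c ? m : c; }
static float max3f(float a, float b, float c) { float m = a > b ? a : b; return m > c ? m : c; }

/* Hypothetical helper: scans the bounding rectangle in 2x2 quads.
   Returns the number of covered pixels; *tested receives how many
   pixels had all three edge functions evaluated. */
static int rasterize_quads(Vec2 v0, Vec2 v1, Vec2 v2, int *tested) {
    int minx = floor_to_int(min3f(v0.x, v1.x, v2.x));
    int miny = floor_to_int(min3f(v0.y, v1.y, v2.y));
    int maxx = ceil_to_int(max3f(v0.x, v1.x, v2.x));
    int maxy = ceil_to_int(max3f(v0.y, v1.y, v2.y));
    int covered = 0;
    *tested = 0;
    for (int qy = miny; qy < maxy; qy += 2) {
        for (int qx = minx; qx < maxx; qx += 2) {
            /* The four pixels of the quad; SSE would do these in parallel. */
            for (int i = 0; i < 4; i++) {
                float px = (float)(qx + (i & 1)) + 0.5f;  /* pixel center */
                float py = (float)(qy + (i >> 1)) + 0.5f;
                float w0 = edge_fn(v1, v2, px, py);
                float w1 = edge_fn(v2, v0, px, py);
                float w2 = edge_fn(v0, v1, px, py);
                (*tested)++;
                if (w0 >= 0.0f && w1 >= 0.0f && w2 >= 0.0f)
                    covered++;
            }
        }
    }
    return covered;
}
```

For a right triangle with vertices (0,0), (8,0) and (0,8), this evaluates 64 pixels (16 quads) but only 36 are covered, so roughly 44% of the edge-function work is spent on pixels that end up rejected. Thin or diagonal triangles waste a much larger fraction, which is the cost described above.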