Since you folks here are debating the possible efficiencies of the LRB sw rasterizer and/or sw TBR, I've got a better question: assuming LRB manages to ship a sw rasterizer that is, if not on the same level as the competition's ff rasterizers, at least damn close, what is it going to look like if ATI/NVIDIA have implemented more than one geometry unit on each core?
I just skimmed the article, but the recursive descent used to get the quad coverage of a triangle takes a number of steps. I haven't worked out the cycle count, but it would probably add up to a fair number of cycles per triangle.
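For reference, here's roughly what I picture that descent looking like; a minimal sketch of my own (the edge-function formulation and all the names are my assumptions, not code from the article): three edge functions per triangle, a trivial-reject test at each tile's most-inside corner, and a four-way split until you hit 2x2 quads.

```cpp
// Sketch of recursive descent from a screen tile down to 2x2 quads.
// An edge function E(x,y) = a*x + b*y + c is >= 0 on the inside of the edge.
#include <cstdio>

struct Edge { float a, b, c; };

// Edge value at the tile corner most likely to be inside: if even that
// corner is outside this edge, the whole tile can be trivially rejected.
static float maxCorner(const Edge& e, float x, float y, float size) {
    return e.a * (e.a >= 0 ? x + size : x)
         + e.b * (e.b >= 0 ? y + size : y)
         + e.c;
}

static void descend(const Edge tri[3], float x, float y, float size) {
    for (int i = 0; i < 3; ++i)
        if (maxCorner(tri[i], x, y, size) < 0)
            return;                        // tile fully outside one edge
    if (size <= 2.0f) {                    // reached quad (2x2 pixel) level
        std::printf("quad at (%g, %g) at least partially covered\n", x, y);
        return;                            // per-sample tests would go here
    }
    float h = size * 0.5f;                 // otherwise split into four sub-tiles
    descend(tri, x,     y,     h);
    descend(tri, x + h, y,     h);
    descend(tri, x,     y + h, h);
    descend(tri, x + h, y + h, h);
}
```

Each level costs a handful of edge evaluations per sub-tile, and a 64x64 tile needs five levels of splitting to reach quads, which is where the "fair number of cycles per triangle" intuition comes from.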
Of course, Larrabee's going to have a significant number of cores per chip and clock higher.
The multiple setup units on ATI/Nvidia hardware are an unknown quantity, and some of the patents show some interesting attempts at increasing throughput.
The setup units may also be charged with doing more than just coverage calculations, so the two schemes are not completely equivalent.
I'd suppose the high-end GPUs will still have peak setup rates in excess of Larrabee's.
The downside, as the article states, is that whenever the setup rate exceeds what the rest of the pipeline can consume, those units will be twiddling their thumbs, while Larrabee's cores can simply switch to other workloads.
Judging by the fraction of die space currently devoted to the setup pipeline, that could be something like 5% of the die.
If I had a *shrug* smiley, I'd use it.
The more interesting thing is the other bells and whistles a software approach might offer, like the reduced cost of calculating MSAA that was posited.
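To make that MSAA point concrete: with edge functions, per-sample coverage is just one precomputed add per edge on top of the per-pixel value, since E(x+dx, y+dy) = E(x,y) + a*dx + b*dy. Here's an illustrative sketch under that assumption (the 4x sample offsets and all the names are made up by me, not taken from the article):

```cpp
// Per-pixel 4x coverage mask: evaluate each edge once at the pixel origin,
// then add a per-sample constant. In real code the deltas would be hoisted
// and computed once per triangle, since they don't depend on the pixel.
struct Edge { float a, b, c; };

static const float kOffsets4x[4][2] = {   // hypothetical 4x sample pattern
    {0.375f, 0.125f}, {0.875f, 0.375f}, {0.125f, 0.625f}, {0.625f, 0.875f}
};

static unsigned coverage4x(const Edge tri[3], float px, float py) {
    float base[3], delta[3][4];
    for (int e = 0; e < 3; ++e) {
        base[e] = tri[e].a * px + tri[e].b * py + tri[e].c;
        for (int s = 0; s < 4; ++s)       // per-triangle work in practice
            delta[e][s] = tri[e].a * kOffsets4x[s][0]
                        + tri[e].b * kOffsets4x[s][1];
    }
    unsigned mask = 0;
    for (int s = 0; s < 4; ++s) {         // one add + compare per edge per sample
        bool in = true;
        for (int e = 0; e < 3; ++e)
            in = in && (base[e] + delta[e][s] >= 0);
        if (in) mask |= 1u << s;
    }
    return mask;
}
```

The shader would still run once per pixel; only those few adds scale with the sample count, which is presumably where the posited saving comes from.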