gking said: RoBoBo,
Those extra fragments are all in the shader pipeline's FIFOs (first in, first out), in order to absorb things like memory latency. It takes a while for a read request into DRAM to get a response (page swap, charge, fetch, etc.). If the graphics chip did nothing while waiting for DRAM, performance would be abysmal (thousands of pixels per second, instead of billions). So, the fragments go into a FIFO to wait for the texture read to complete, and the graphics chip processes other fragments.
You mean that when the pixel shader program for a fragment misses on a texture read (it has to go to video RAM or even across AGP), it stops working on that fragment and starts a new one until the first one has its data available? If that were the case, the overhead of storing the architectural state of the shader (temporary registers, output values) for every waiting fragment would be almost as big as the overhead of the interpolators, if not more.
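To put rough numbers on both sides of that question, here is a toy cycle model in Python. It is only a sketch: the function names, the 100-cycle latency, the FIFO depths, and the register counts are all made-up assumptions for illustration, not figures for any real chip.

```python
from collections import deque

def simulate_cycles(num_fragments, latency, fifo_depth):
    """Toy model: each cycle the chip may retire one fragment whose texture
    data has arrived and may issue one new texture read if the FIFO has room.
    All parameters are hypothetical."""
    fifo = deque()          # arrival times of in-flight texture reads
    issued = retired = 0
    t = 0
    while retired < num_fragments:
        # Retire the oldest fragment if its data is back (strict FIFO order).
        if fifo and fifo[0] <= t:
            fifo.popleft()
            retired += 1
        # Issue a new read if there is a free FIFO slot.
        if issued < num_fragments and len(fifo) < fifo_depth:
            fifo.append(t + latency)
            issued += 1
        t += 1
    return t

def fifo_state_bytes(fifo_depth, regs_per_fragment, bytes_per_reg=16):
    """Storage needed to keep per-fragment shader state for every FIFO slot.
    The 16 bytes assumes a 4 x 32-bit register, which is an assumption."""
    return fifo_depth * regs_per_fragment * bytes_per_reg

if __name__ == "__main__":
    print(simulate_cycles(1000, latency=100, fifo_depth=1))    # one fragment at a time
    print(simulate_cycles(1000, latency=100, fifo_depth=128))  # deep FIFO
    print(fifo_state_bytes(128, regs_per_fragment=4))          # cost of that FIFO
```

In this model a depth-1 FIFO takes 100001 cycles for 1000 fragments, while a 128-entry FIFO takes 1100, so the latency is paid roughly once instead of once per fragment. The price is the buffered state: 128 slots of 4 such registers is 8192 bytes under these assumptions, which is the overhead being questioned above.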
And I think that what avoids those large stalls from memory latency is the texture, z and color caches. If you are processing a single triangle at a time, you can almost guarantee that if the first pixels stall, all the others will stall too, because they will hit more or less the same memory region.
Perhaps with fixed texturing and color that could be different.
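As a rough illustration of the locality argument, the sketch below counts which fragments in a run of neighbouring texel reads miss a texture cache. It assumes a hypothetical cache with 64-byte lines and unlimited capacity, and 4-byte texels laid out linearly; real texture caches and layouts are more complicated.

```python
def miss_indices(addresses, line_bytes=64):
    """Return the indices of reads that touch a cache line for the first time.
    Models an idealized cache that never evicts (an assumption)."""
    cached_lines = set()
    misses = []
    for i, addr in enumerate(addresses):
        line = addr // line_bytes
        if line not in cached_lines:
            cached_lines.add(line)
            misses.append(i)
    return misses

if __name__ == "__main__":
    # 64 neighbouring fragments, each reading the next 4-byte texel.
    span = [4 * i for i in range(64)]
    print(miss_indices(span))
```

Under these assumptions only reads 0, 16, 32 and 48 miss, one per cache line. The other side of the argument also shows up here: the 15 fragments following each miss want the very same line, so while that line is still in flight they have nothing else to do but wait.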