Let me sum up what appears to be the general consensus, based on data pieced together from several sources around the 'net and from this thread:
1) GeForceFX can only write 4 "color values" per clock, irrespective of whether z or stencil values are being read/written as well.
2) GeForceFX can write 8 z/stencil values per clock when no color values are being written. (BTW, is this 16-bit? 32-bit? Floating-point z?)
3) GeForceFX can calculate 4 fp32 shader ops per clock. (Though it might actually be 8 per clock, with bandwidth limitations reducing this to an effective 4.)
4) GeForceFX can calculate 8 fp16 shader ops per clock.
I must say, if all this info ends up being more or less correct, it's all quite disturbing. :? For nVidia to claim "8 pixels/clock rendering pipeline" on their spec sheets would be a travesty.
Most people don't know what 8x1 means anyway, and those who do know enough to check the benchmarks... Compared to the GF4 Ti4800SE, this is really small potatoes, IMO.
Yes and no. As you can see from pretty much ALL of the initial reviews (and even from Carmack's .plan), many of the "unexpected" performance shortcomings of the FX are being blamed on "bad drivers", when the performance is actually readily explained by the hardware implementation. So consumers / readers are given the impression that the FX is severely underperforming relative to its spec, when in reality it's performing much closer to spec than expected.
Everyone is blaming "bad drivers" because everyone is EXPECTING the FX to output 8 "real pixels" (color, not just z) per clock. I wonder where they got that idea.....