Extra tidbits:
* NVIDIA confirmed to us that the chip supports GDDR4, but they obviously wouldn't comment on unannounced products.
* Blending is still half-rate, unlike on R6xx, where it is full-rate (this matters more now given the number of ROPs). FP16 blending would be bandwidth-limited anyway, but INT8/FP10 blending can clearly be ROP-limited; not a major bottleneck, but still noteworthy.
* Triangle setup is also completely unmodified, with 0.5 tri/clk setup, 1 tri/clk culling and only 1 vertex/clk output to the post-T&L FIFO (without attributes, even in Z-only passes). Every single one of those metrics is at least 50% lower than R600's, and this is most likely a very real bottleneck in certain cases.
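
To make the blending and setup points above concrete, here is a rough back-of-the-envelope sketch. It is only an illustration under stated assumptions: the ROP count, core clock and memory bandwidth in the example are placeholder figures we picked, not numbers from this article. It assumes a blend reads and writes the destination once each, that INT8 and FP10 targets take 4 bytes per pixel and FP16 takes 8, and it ignores compression and caching entirely. The function names are ours.

```python
def blend_limits(rops, clock_hz, blends_per_rop_clk, bandwidth_bytes_s, bytes_per_pixel):
    """Return (rop_rate, bandwidth_rate, limiter) for blended pixels per second.

    Assumption: each blended pixel costs one destination read plus one
    destination write, with no compression or caching.
    """
    rop_rate = rops * clock_hz * blends_per_rop_clk
    bw_rate = bandwidth_bytes_s / (2 * bytes_per_pixel)
    limiter = "ROP" if rop_rate < bw_rate else "bandwidth"
    return rop_rate, bw_rate, limiter


def setup_rate(clock_hz, tris_per_clk=0.5):
    """Peak triangles per second at a given per-clock setup rate."""
    return clock_hz * tris_per_clk


if __name__ == "__main__":
    # Placeholder configuration (assumed, not taken from the article).
    ROPS, CLOCK_HZ, BANDWIDTH = 16, 600e6, 57.6e9
    HALF_RATE = 0.5  # half-rate blending, as described in the text

    for name, bpp in (("INT8/FP10", 4), ("FP16", 8)):
        rop, bw, limiter = blend_limits(ROPS, CLOCK_HZ, HALF_RATE, BANDWIDTH, bpp)
        print(f"{name}: ROP {rop / 1e9:.1f} Gpix/s, "
              f"memory {bw / 1e9:.1f} Gpix/s, limited by {limiter}")

    print(f"Setup: {setup_rate(CLOCK_HZ) / 1e6:.0f} Mtri/s at 0.5 tri/clk")
```

With these placeholder inputs the 4-byte formats come out ROP-limited and FP16 comes out bandwidth-limited, which is the same pattern as the one described in the blending note. Different clocks or bandwidth would move the crossover, so treat it as a way to reason about the ratios and not as a measurement.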