I feel that being able to do 64 Z-Writes/Reads per clock when working in 4x MSAA would be a very worthwhile venture once you've got GDDR4. That is, if geometry isn't your bottleneck, although with a unified architecture it just isn't going to be (triangle setup could be, though!). IMO it's unlikely to happen, but some improvements in such situations would still be rather nice.
Personally, I'm more than tired of Z-Passes being a win for half the cards within the SAME chip family and a loss for the other half (some NV4x really aren't that impressive with Z-Only, IIRC). I hope it isn't too much to ask to have semi-coherent performance characteristics within a single architecture family...
Uttar