I think for real-world workloads it would be more than sufficient if AMD removed the bottleneck of 3 clocks per tessellated triangle instead of adopting Nvidia's approach with decoupled triangle setups. Nvidia's approach seems quite difficult to implement - less from an engineering standpoint (AMD shouldn't have a big problem there) than with regard to die size and maybe power, both of which become more crucial if AMD keeps pursuing its current route.
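To put a rough number on that bottleneck, here is a back-of-the-envelope sketch. The 900 MHz clock is a made-up round figure for illustration, not the spec of any particular chip, and the function name is my own; the only input taken from the discussion is the 3 clocks per tessellated triangle.

```python
def peak_triangle_rate(clock_hz: float, clocks_per_triangle: float) -> float:
    """Upper bound on triangles per second for a single setup unit.

    Assumes the setup stage is the only limit, which is an idealisation:
    real workloads also hit shader, bandwidth and CPU limits.
    """
    if clocks_per_triangle <= 0:
        raise ValueError("clocks_per_triangle must be positive")
    return clock_hz / clocks_per_triangle


if __name__ == "__main__":
    clock_hz = 900e6  # hypothetical core clock, chosen only for easy arithmetic

    # 3 clocks per tessellated triangle (the bottleneck discussed above)
    # versus an idealised 1 clock per triangle.
    for clocks in (3, 1):
        rate = peak_triangle_rate(clock_hz, clocks)
        print(f"{clocks} clock(s)/triangle: {rate / 1e6:.0f} Mtris/s")
```

Under these assumptions, dropping from 3 clocks to 1 clock per triangle would triple the theoretical peak without touching the number of setup units, which is why removing that one bottleneck might already be enough.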
What I don't understand from my own benchmarks, though, is why the GF100's tessellators seem to be less efficient than the ones in the smaller chips. I don't believe I have already run into the system's limit.