The answer's bleeding obvious: Evergreen is a significantly feature- and performance-reduced chip compared with what was originally planned. AMD kludged the whole thing. That's why it's so inefficient (not just in tessellation, but in pretty much everything). See Barts for a pointer to the inefficiencies.
As to whether Cayman will be a minor or major improvement (on everything, not just tessellation), well, I guess my first sentence hangs in the balance. Cayman should be more than Cypress was ever planned to be (it has had a year of extra development), which will distort comparisons, but I don't for one second believe that Evergreen's tessellation architecture is as it was originally planned. It's just horrible.
As to why Cypress wasn't built to Barts' specification, who knows. AMD could have made much more effective use of its 40nm supply, which turned out to be very limited, for a negligible performance loss. Seems like another indication that Cypress was a hatchet job.
Barts is a year newer. That means one more year of experience with TSMC's 40nm process and one more year of tweaking and improving. It is far from certain that AMD would have been able to release a 151W Barts XT in Q3 2009.