http://www.beyond3d.com/content/reviews/55/10
People need to stop making these simplistic calculations. Triangles come in clumps. When you have a bunch of small or hidden triangles, very few pixels are drawn to the screen during that time. You basically have to deal with those triangles and then draw the majority of the scene in the remaining time.
Here's a more detailed post I did on the matter:
http://forum.beyond3d.com/showpost.php?p=1383571&postcount=481
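To make the clumping argument concrete, here is a toy frame-time model. It is a sketch under stated assumptions, not taken from the linked post: a frame is a list of hypothetical (triangles, pixels) clumps, setup and shading overlap perfectly within a clump, and clumps run back to back. The rates and workload are invented numbers chosen only to show the effect.

```python
# Toy model (assumption): within one clump, triangle setup and pixel shading
# overlap perfectly, so the slower of the two bounds that clump; clumps run
# one after another. All numbers below are hypothetical, not GPU measurements.

TRIS_PER_SEC = 1e9      # hypothetical setup rate
PIXELS_PER_SEC = 1e10   # hypothetical shading/fill rate
FRAME = [
    (2_000_000, 200_000),   # many small or hidden triangles, few pixels
    (200_000, 20_000_000),  # the bulk of the visible scene
]

def clumped_frame_ms(clumps, tris_per_sec, pixels_per_sec):
    total_s = 0.0
    for tris, pixels in clumps:
        setup_s = tris / tris_per_sec
        shade_s = pixels / pixels_per_sec
        total_s += max(setup_s, shade_s)  # slower unit bounds this clump
    return total_s * 1e3

def naive_frame_ms(clumps, tris_per_sec, pixels_per_sec):
    # The "simplistic calculation": treat the whole frame as one big clump.
    tris = sum(t for t, _ in clumps)
    pixels = sum(p for _, p in clumps)
    return max(tris / tris_per_sec, pixels / pixels_per_sec) * 1e3

print(f"naive:   {naive_frame_ms(FRAME, TRIS_PER_SEC, PIXELS_PER_SEC):.2f} ms")    # 2.20 ms
print(f"clumped: {clumped_frame_ms(FRAME, TRIS_PER_SEC, PIXELS_PER_SEC):.2f} ms")  # 4.00 ms
```

With these made-up numbers the averaged estimate says 2.2 ms, but because the triangle-heavy clump leaves the pixel hardware nearly idle, the frame actually takes 4 ms.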
What I don't understand is why ATI can't do one tessellated triangle per clock. Cypress was taking 3-6 clocks per triangle in tessellation tests, and it looks like this generation is barely any better.
What on earth is going on inside that tessellation engine? It's inexcusable that equivalent pretessellated DX10 geometry, which wastes bandwidth, draws faster than tessellated DX11 geometry.
EDIT: Maybe ATI was selling itself short with that slide if this benchmark is correct...
There's a currently running meme about Cypress taking 3 clocks per tessellated triangle. This is incorrect in an absolute sense, since we can do a bit better than what you're seeing, although we can generate that scenario quite easily (note that we've reached up to ~600 MTris/s by using triangular patches, thus trimming down the per-control-point data, all else being equal).
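Converting between clocks per triangle and MTris/s helps compare the figures in this thread. The sketch below assumes an 850 MHz core clock (the Radeon HD 5870 reference clock); the review does not say which clock its numbers were measured at, so treat the outputs as approximate.

```python
# Assumption: 850 MHz core clock (Radeon HD 5870 reference); the quoted
# figures may have been measured on a differently clocked part.
CORE_CLOCK_HZ = 850e6

def mtris_per_sec(clocks_per_tri, clock_hz=CORE_CLOCK_HZ):
    """Triangle rate in millions per second for a given clocks/triangle."""
    return clock_hz / clocks_per_tri / 1e6

def clocks_per_tri(mtris, clock_hz=CORE_CLOCK_HZ):
    """Clocks spent per triangle implied by a measured rate in MTris/s."""
    return clock_hz / (mtris * 1e6)

for c in (1, 3, 6):
    print(f"{c} clk/tri -> {mtris_per_sec(c):.0f} MTris/s")  # 850, 283, 142
print(f"600 MTris/s -> {clocks_per_tri(600):.2f} clk/tri")    # 1.42
```

Under that clock assumption, ~600 MTris/s works out to roughly 1.4 clocks per triangle: better than the 3-6 clocks mentioned earlier, but still short of one triangle per clock.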
ATI's problem is primarily one of data flow: they try to keep some data in shared memory (as far as we can see, they try to keep HS-DS pairs resident on the same SIMD, with hull shaders being significantly more expensive than domain shaders), but data to and from the tessellator needs to go through the GDS (global data share). There's also the need to serialise access to the tessellator, since it's a unique resource, coupled with a final aspect we'll deal with when looking at math throughput.
Given all this, fatter control points (ours are as skinny as possible) or heavy math in the HS (ours has no explicit math, but there's some implicit tessellation-factor massaging and addressing math) hurt Cypress comparatively more than they hurt Slimer. A combination of these two factors is how the 3-clocks-per-triangle scenarios come into being.
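The two factors above can be illustrated with a deliberately crude cost model. This is a hypothetical sketch, not ATI's actual pipeline: it assumes per-patch hull-shader output is routed through a shared store at a fixed invented bandwidth, HS math adds a per-patch cost, both are amortized over the patch's triangles, and the serialized tessellator adds a fixed per-triangle cost with no overlap. `PatchConfig`, `GDS_BYTES_PER_CLOCK` and `TESS_CLOCKS_PER_TRI` are invented names and values.

```python
from dataclasses import dataclass

# Hypothetical machine parameters, NOT measured Cypress figures.
GDS_BYTES_PER_CLOCK = 64.0   # assumed bandwidth through the shared data store
TESS_CLOCKS_PER_TRI = 1.0    # assumed serialized tessellator cost per triangle

@dataclass
class PatchConfig:
    control_points: int    # e.g. 3 for a triangular patch, 4 for a quad patch
    bytes_per_cp: int      # size of each control point output by the HS
    hs_alu_clocks: float   # hull-shader math per patch, in clocks (hypothetical)
    tris_per_patch: int    # triangles the tessellator emits per patch

def clocks_per_triangle(cfg: PatchConfig) -> float:
    # Per-patch data movement through the shared store, in clocks.
    transfer = cfg.control_points * cfg.bytes_per_cp / GDS_BYTES_PER_CLOCK
    per_patch = transfer + cfg.hs_alu_clocks
    # No overlap assumed: amortized per-patch cost plus the serialized
    # tessellator's per-triangle cost.
    return TESS_CLOCKS_PER_TRI + per_patch / cfg.tris_per_patch

skinny = PatchConfig(control_points=3, bytes_per_cp=16, hs_alu_clocks=8, tris_per_patch=8)
fat = PatchConfig(control_points=4, bytes_per_cp=64, hs_alu_clocks=32, tris_per_patch=8)
print(f"skinny: {clocks_per_triangle(skinny):.2f} clk/tri")  # 2.09
print(f"fat:    {clocks_per_triangle(fat):.2f} clk/tri")     # 5.50
```

The absolute values mean nothing; the point is only that, in any model where control-point traffic and HS math sit on the path to a single serialized tessellator, fattening either one pushes clocks per triangle up.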