Figure it's a good time to start some architecture discussion again.
In the leaked Hexus benchmarks for Heaven 2.0, we see that raising the tessellation level costs a noticeable amount of performance. I believe this is mostly due to increased load on triangle setup (including clipping and culling). The additional hull/domain/vertex shader load should be minimal, and because tessellated triangles usually arrive in clumps, the GPU can't hide this bottleneck behind pixel processing. So let's do a little analysis, converting each frame rate to frame time (1000/fps) so the extra per-frame cost can be subtracted directly:
No tessellation vs. normal tessellation:
HD5870: 40.5 fps, 26.3 fps ==> 13.3 ms extra processing time
GTX480: 45.9 fps, 36.9 fps ==> 5.3 ms
Fermi crunches through this additional load 2.5 times faster.
Normal tessellation vs. extreme tessellation:
HD5870: 26.3 fps, 17.0 fps ==> 20.8 ms
GTX480: 36.9 fps, 29.5 fps ==> 6.8 ms
Fermi crunches through this larger additional load 3.1 times faster.
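For anyone who wants to check the arithmetic, here's a quick Python sketch that reproduces the numbers above from the quoted fps figures alone. The helper names (`frame_ms`, `extra_ms`, `step_costs`) are just made up for illustration. The key step is working in milliseconds per frame, since extra setup work adds time to each frame rather than subtracting a fixed number of fps.

```python
# Frame-time arithmetic: extra work adds milliseconds per frame,
# so compare 1000/fps rather than the fps values themselves.

def frame_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps

def extra_ms(fps_before: float, fps_after: float) -> float:
    """Additional time per frame when going from fps_before to fps_after."""
    return frame_ms(fps_after) - frame_ms(fps_before)

# Leaked Hexus Heaven 2.0 results in fps: (none, normal, extreme) tessellation.
RESULTS = {
    "HD5870": (40.5, 26.3, 17.0),
    "GTX480": (45.9, 36.9, 29.5),
}

def step_costs(card: str) -> tuple[float, float]:
    """Extra ms for none -> normal and normal -> extreme on one card."""
    none, normal, extreme = RESULTS[card]
    return extra_ms(none, normal), extra_ms(normal, extreme)

if __name__ == "__main__":
    ati = step_costs("HD5870")
    nv = step_costs("GTX480")
    for label, a, n in zip(("none -> normal", "normal -> extreme"), ati, nv):
        print(f"{label}: HD5870 {a:.1f} ms, GTX480 {n:.1f} ms, ratio {a / n:.1f}x")
```

Running it prints `none -> normal: HD5870 13.3 ms, GTX480 5.3 ms, ratio 2.5x` and `normal -> extreme: HD5870 20.8 ms, GTX480 6.8 ms, ratio 3.1x`, matching the figures above.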
We know Cypress can do one triangle per clock, and this is what NVidia has said about Fermi:
http://www.bjorn3d.com/read.php?cID=1778&pageID=8321
http://www.techreport.com/articles.x/18332/2
Not quite the expected result, given that Cypress is clocked faster, but Cypress is probably a little below 1 tri/clk on average, so close enough :smile:
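To put the clock-speed caveat in numbers, here's a rough sketch that backs out how many triangles per clock Fermi would need to produce the measured speedups. It assumes (as this post argues) that the extra frame time is entirely setup-bound, reference clocks of 850 MHz for the HD 5870 and 700 MHz for the GTX 480, and a Cypress rate of either 1.0 or 0.9 tri/clk. The function name is hypothetical.

```python
# Assumed reference clocks: HD 5870 core 850 MHz, GTX 480 graphics clock 700 MHz.
CYPRESS_MHZ = 850.0
FERMI_MHZ = 700.0

def implied_fermi_tri_per_clk(measured_ratio: float,
                              cypress_tri_per_clk: float = 1.0,
                              cypress_mhz: float = CYPRESS_MHZ,
                              fermi_mhz: float = FERMI_MHZ) -> float:
    """Fermi setup rate per clock implied by a measured speedup.

    For the same triangle count, the time ratio equals the rate ratio:
        measured_ratio = (f_tpc * f_mhz) / (c_tpc * c_mhz)
    so  f_tpc = measured_ratio * c_tpc * c_mhz / f_mhz.
    """
    return measured_ratio * cypress_tri_per_clk * cypress_mhz / fermi_mhz

if __name__ == "__main__":
    for ratio in (2.5, 3.1):  # measured speedups from the two tessellation steps
        for c_tpc in (1.0, 0.9):  # assumed Cypress average tri/clk
            fermi = implied_fermi_tri_per_clk(ratio, c_tpc)
            print(f"ratio {ratio}x, Cypress {c_tpc} tri/clk -> Fermi ~{fermi:.2f} tri/clk")
```

With Cypress at a full 1 tri/clk, the 2.5x and 3.1x results work out to roughly 3.0 and 3.8 tri/clk for Fermi; the implied figure scales down proportionally if Cypress averages less than 1 tri/clk.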