How fat were the vertices used in the testing? Storing a full tangent space matrix in the input vertices, replicating it to all tessellated vertices, and transforming/interpolating it in the domain shader for each generated vertex could easily become a perf hit. Since tangent frame calculation can be moved to the pixel shader (by using gradient instructions), and some games are already using this trick, it would be beneficial to test the tessellation speed with input vertices that are as light as possible (pixel shader tangent frame calculation will be popular with tessellation).
Yeah, I've also seen triangle rates in the range of 770M on the rather oldish Xvox-Demo (which, btw, seems to scale exceedingly well), and that's of course without tessellation (it's DX8!).
edit: Even with AMD's own terrain tessellation (DX9) from the DirectX SDK, the maximum I've seen is around 580M triangles.
Why tessellation on current AMD hardware is slower than expected has already been discussed here, but no one could come up with decisive evidence for either of the proposed theories.
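For anyone following along, here is a rough CPU-side sketch of the pixel shader tangent frame idea mentioned earlier in the thread. It is not taken from any particular engine: it uses one commonly cited formulation (a cotangent frame built from screen-space derivatives), and the function name and argument layout are made up for illustration. In a real pixel shader, `dp1`/`dp2` and `duv1`/`duv2` would come from the gradient instructions (ddx/ddy) applied to position and UV; here they are simply passed in.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cotangent_frame(n, dp1, dp2, duv1, duv2):
    """Build (T, B, N) from per-pixel derivatives (hypothetical helper).

    n    : interpolated unit normal
    dp1  : d(position)/dx, dp2  : d(position)/dy   (what ddx/ddy would return)
    duv1 : d(uv)/dx,       duv2 : d(uv)/dy
    Assumes the derivatives are not degenerate (T and B are not both zero).
    """
    dp2perp = cross(dp2, n)
    dp1perp = cross(n, dp1)
    t = tuple(dp2perp[i] * duv1[0] + dp1perp[i] * duv2[0] for i in range(3))
    b = tuple(dp2perp[i] * duv1[1] + dp1perp[i] * duv2[1] for i in range(3))
    # Scale T and B by the same factor so the frame stays scale-invariant.
    inv_max = 1.0 / math.sqrt(max(dot(t, t), dot(b, b)))
    return (tuple(x * inv_max for x in t),
            tuple(x * inv_max for x in b),
            n)

# Flat surface in the XY plane whose UVs run along X and Y.
T, B, N = cotangent_frame(
    n=(0.0, 0.0, 1.0),
    dp1=(1.0, 0.0, 0.0), dp2=(0.0, 1.0, 0.0),
    duv1=(1.0, 0.0), duv2=(0.0, 1.0),
)
```

The point for tessellation is that only position, normal and UV need to survive the domain shader; no tangent or bitangent has to be stored per input vertex or interpolated per generated vertex.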
Yes, it's kind of inexact for culling backface triangles, but it is very efficient and highly usable for culling offscreen triangles before the tessellation. As you have full control over the DS and HS, you know exactly the maximum distance that a single vertex can be moved by your displacement mapping. In our engine, for example, all displacement maps are 8-bit snorm maps (artists use the full range from black to white), and there is a per-object (float) scale factor for the displacement. In this case you can just add the displacement scale factor to your viewport culling test, and you get a perfect culling result (and a major performance boost).
It should, but it's often an inexact science because the HS has to predict whether the DS will make triangles visible even when the patch is off-screen or backfacing.
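To make the viewport culling idea concrete, here is a minimal sketch of the test written as plain Python rather than hull shader code. The function name, the plane representation and the example numbers are all assumptions for illustration, not the engine's actual code. It assumes each frustum plane is given as `(nx, ny, nz, d)` with a unit normal pointing into the frustum, that the undisplaced surface stays inside the convex hull of the patch control points, and that the snorm displacement in [-1, 1] times the per-object scale can move a vertex by at most `displacement_scale`.

```python
def patch_is_offscreen(control_points, planes, displacement_scale):
    """Return True if the patch can be culled before tessellation.

    control_points     : list of (x, y, z) patch control points
    planes             : list of (nx, ny, nz, d), unit normals pointing inward,
                         so a point is inside a plane when n.p + d >= 0
    displacement_scale : maximum distance displacement can move any vertex
    """
    for nx, ny, nz, d in planes:
        # Cull only if every control point is outside this plane by more
        # than the largest possible displacement.
        if all(nx * x + ny * y + nz * z + d < -displacement_scale
               for x, y, z in control_points):
            return True
    return False

# Two planes bounding -1 <= x <= 1, and a triangle patch sitting at x = 1.5.
planes = [(1.0, 0.0, 0.0, 1.0), (-1.0, 0.0, 0.0, 1.0)]
patch = [(1.5, 0.0, 0.0), (1.5, 1.0, 0.0), (1.5, 0.0, 1.0)]

culled_small = patch_is_offscreen(patch, planes, displacement_scale=0.25)
culled_large = patch_is_offscreen(patch, planes, displacement_scale=1.0)
```

With a small scale the patch is safely culled; with a scale of 1.0 the displacement could pull it back on screen, so it is kept. In a hull shader the same result would typically be expressed by setting the patch tessellation factors to zero.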