I don't really trust that. There's a big difference between 9.4 fps with 12M triangles (bson's screenshot) and 9.4 fps with 4M triangles. The latter would imply 18 clocks per tri, which is too large a hit.
The text says that with LOD=100 there are 28mln triangles, while at LOD=25 there are 4mln tris
480@100 is as fast as 5870@25
No, that's when it's >95% bound by TS. At lower levels it's just not quite as bound, but still primarily so. It has to do the compute shader first, and can't get to the few large triangles while it's stuck on the small ones, so everything is serial.
HD5970 in this does not reach its theoretical TS/setup-rate performance margin over HD5870 (71%) until LOD 100.
So Cypress isn't TS/setup bound until the highest LOD.
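As a rough check on the 71% figure quoted above, here is a minimal sketch. It assumes reference core clocks of 850 MHz for HD5870 and 725 MHz per GPU for HD5970 (neither clock is stated in the thread), and it assumes setup rate scales with clock times GPU count with zero CrossFire overhead.

```python
# Assumed reference core clocks in MHz; these are not stated in the thread.
HD5870_CLOCK_MHZ = 850
HD5970_CLOCK_MHZ = 725
HD5970_GPUS = 2


def theoretical_margin(single_clock, multi_clock, gpus):
    """Percent advantage of a multi-GPU board over a single GPU.

    Assumes throughput scales with clock x GPU count and that
    multi-GPU overhead is zero (an idealisation).
    """
    return (multi_clock * gpus / single_clock - 1.0) * 100.0


margin = theoretical_margin(HD5870_CLOCK_MHZ, HD5970_CLOCK_MHZ, HD5970_GPUS)
print(f"Theoretical HD5970 margin over HD5870: {margin:.1f}%")
```

Under those assumptions the margin rounds to 71%, and it applies to any clock-bound unit, not just TS/setup.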
You're assuming that there's no crossfire overhead. I don't know why you're comparing to the 5970 anyway. Even if crossfire overhead was zero, it renders 71% faster, too, so all workloads would be 71% faster.
The PN Triangles sample, tested further up the page, reaches HD5970's theoretical margin only on the final factor of 19.
So at lower tessellation factors HD5970's performance is being limited by something other than TS/setup. Not sure what's going on there.
Same flawed logic.
The detail tessellation test further up the page is only about 54% faster on HD5970, so again falling short of being TS/setup-rate dominated.
Because GF100 increases in performance, the primitive count must be going down when tessellation is enabled. Cypress, however, processes tessellated prims more slowly, so it takes a performance hit even with the reduced prim count.
The NVidia sample, geometry/compute with the hair, shows no real variation in performance with tessellation on/off on Cypress but shows 14% more performance on GF100. No idea how tessellation is being used here.
Maximum for me was 274M prims/sec.
Sorry, I misread your post earlier. The 9.4 fps from ixbt was for LOD=50, and your 4M tri number was for LOD=25, so there's no conflict with bson.
The text says that with LOD=100 there are 28mln triangles, while at LOD=25 there are 4mln tris
480@100 is as fast as 5870@25
FYI, the reason your prims/sec drops is that the compute shader starts to take a larger portion of frame time when the FPS goes up at lower tessellation factors.
From the Nvidia Island Demo: It's actually a bit below maximum tessellation factor that you can reach the 1/3-per-clock ratio for Cypress. Maximum for me was 274M prims/sec. On those settings I can go down to a tessellation factor of 43 (avg. exp. ratio 3,613 vs. 7,756 at factor 63) and still get 270M prims/sec on my 5870.
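The explanation above can be sketched with a toy frame-time model: a fixed per-frame cost (standing in for the compute shader) followed serially by primitive processing at a peak rate. The 274M prims/sec peak is the figure from the quoted post; the 1 ms fixed cost and the per-frame primitive counts are hypothetical values chosen only to show the trend.

```python
PEAK_PRIMS_PER_SEC = 274e6   # maximum rate reported in the quoted post
FIXED_COST_S = 0.001         # hypothetical per-frame compute shader time


def effective_prim_rate(prims_per_frame, peak_rate, fixed_cost_s):
    """Measured prims/sec when a fixed cost runs serially before
    primitive processing in every frame."""
    frame_time = fixed_cost_s + prims_per_frame / peak_rate
    return prims_per_frame / frame_time


# Hypothetical per-frame primitive counts, from low to high tessellation.
for prims in (250_000, 1_000_000, 4_000_000):
    rate = effective_prim_rate(prims, PEAK_PRIMS_PER_SEC, FIXED_COST_S)
    print(f"{prims:>9} prims/frame -> {rate / 1e6:6.1f}M prims/sec")
```

In this model the measured rate approaches the peak only when primitive work dominates the frame; at low primitive counts the fixed cost pulls the measured prims/sec down even though the tessellator itself is no slower.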
At first I was a bit surprised by your number, because I'm calculating 1 prim every 6 clocks, but it looks like the reason is that dynamic tessellation is enabled in bson's and ixbt's benchmarks, whereas you have it unticked. This would also explain my Unigine calculations from B3D's numbers and also jibes with Damien's test. Unigine doesn't use dynamic tessellation and I got 3 cycles/tri, and Damien said he got 278 Mtri/s max, and less when more data per vertex was needed.
Settings were: Fullscreen, 640x480, all checkboxes unticked except for Query Pipeline Statistics.
So it looks like the numbers are legit. ATI's tessellator is either faster at integer tessellation or the bottleneck is somewhere else, like feeding the domain shader.
Don't know what settings Xbit ran the sample at, but with default, only ticking "Tessellation", I'm getting
2746 (sic!) - 1096 - 199 - 37,5 fps
on my HD 5870 with Cat 10.3 and the sample from Microsoft's DX SDK (Feb 2010).
Why do you think Unigine doesn't use dynamic tessellation?
Unigine doesn't use dynamic tessellation
Sorry, I used the wrong word. It uses the same tessellation factor for all triangles in a model. There's no fractional tessellation, and no triangles with different tessellation factors on each edge.
Why do you think Unigine doesn't use dynamic tessellation?
I've seen tessellation change with distance.
That being said, does Unigine change the tessellation factor for an object as distance changes?
From the Nvidia Island Demo: It's actually a bit below maximum tessellation factor that you can reach the 1/3-per-clock ratio for Cypress. Maximum for me was 274M prims/sec. On those settings I can go down to a tessellation factor of 43 (avg. exp. ratio 3,613 vs. 7,756 at factor 63) and still get 270M prims/sec on my 5870.
Settings were: Fullscreen, 640x480, all checkboxes unticked except for Query Pipeline Statistics.
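The "1/3-per-clock" reading of those figures can be checked with a short sketch. It assumes the HD 5870 runs at its reference 850 MHz core clock, which the post does not state; an overclocked card would give a different ratio.

```python
HD5870_CLOCK_HZ = 850e6  # assumed reference core clock, not stated in the post


def clocks_per_prim(prims_per_sec, clock_hz):
    """Average core clocks spent per primitive at a measured rate."""
    return clock_hz / prims_per_sec


# Rates reported in the post: the maximum, and the rate at factor 43.
for label, rate in (("max", 274e6), ("factor 43", 270e6)):
    cpp = clocks_per_prim(rate, HD5870_CLOCK_HZ)
    print(f"{label:>9}: {cpp:.2f} clocks/prim ({1 / cpp:.3f} prims/clock)")
```

Both rates come out at slightly over 3 clocks per primitive under that clock assumption, consistent with roughly one primitive every three clocks.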
Or it could be something as simple as vectorizing your code when possible. You wouldn't use x87 when SSE was appropriate on a CPU.
GTX 480 at 1920x1200 achieves almost the same primitive rate: just a tad under 1,300M, and when changing views (actually looking down at the rocks, that is) it gets up to 1,600M again.
Dude, clean up your inglish Carsten, what's da matta wit u?
Seriously though, good data, do you have any of this data at resolutions higher than 640x480?
It should, but it's often an inexact science because the HS has to predict whether the DS will make triangles visible even when the patch is off-screen or backfacing.
Culling as much as possible in the HS should be beneficial for all hardware. Hopefully developers will do this.