With no patch control point shader and no vertex shader, GCN 1.1 tessellation performs quite well (HS body only, plus a DS that takes SV_PrimitiveID and the barycentric coordinates (SV_DomainLocation) as input). Primitive rate becomes the bottleneck. Tiny triangles also cause various bottlenecks in the rasterization pipeline (Polaris improved this).
There are some misunderstandings here. No GCN part has required the DS to execute on the same CU as the HS, though it sometimes does.
It is recording and reordering triangles to improve screen locality. The synergy with tile-based compression (DCC and lossless depth compression) is clearly there. Screen locality also (trivially) improves the render target cache hit rate. I am not expecting it to behave like a PowerVR TBDR (= no overdraw), but it should easily be able to save 20%+ of render target bandwidth in common cases. Nvidia could also use slightly more complex DCC algorithms, as tiling should hide the DCC latency better and invoke the DCC hardware less often. This gives further bandwidth gains.
I suspect people give far too much credit to Nvidia's tiled rendering, though without a benchmark that can disable this feature there's no way to prove anything. Low voltage and high clock speeds are the primary weapons of Maxwell and Pascal.
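To make the "recording and reordering" idea concrete, here is a rough sketch of binning triangles to screen tiles. It is only an illustration: the 16 pixel tile size, the bounding-box test and the `bin_triangles` name are my assumptions, and Nvidia's actual hardware scheme is not public.

```python
from collections import defaultdict

TILE = 16  # hypothetical tile size in pixels, not a known hardware value


def bin_triangles(triangles, tile=TILE):
    """Group triangle indices by the screen tiles their bounding box touches.

    Assumes non-negative screen-space coordinates.
    """
    bins = defaultdict(list)
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        x0, x1 = int(min(xs)) // tile, int(max(xs)) // tile
        y0, y1 = int(min(ys)) // tile, int(max(ys)) // tile
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins[(tx, ty)].append(index)
    return dict(bins)


# Submission order is 0, 1, 2, but triangles 0 and 2 share a tile.
triangles = [
    [(1, 1), (10, 2), (4, 12)],
    [(40, 40), (44, 41), (42, 46)],
    [(2, 3), (12, 5), (6, 14)],
]
bins = bin_triangles(triangles)
for tile_xy, indices in sorted(bins.items()):
    print(tile_xy, indices)
```

Rasterizing bin by bin means triangles 0 and 2 hit the same tile back to back, which is where the cache and compression benefit would come from.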
One case where the Nvidia tiling really helps is particle rendering (rgba16f output). Particles are most often two-triangle quads. Nvidia can bin thousands of particles to tiles before rasterizing them. Particles close to each other spatially (from the same emitter) are likely also close in the triangle list, meaning that they get binned together. Particle effects close to the camera (a big explosion) are the number one reason for big frame dips in games. One explosion is < 1000 particles, so it gets binned at once. So instead of hammering the memory bandwidth (read + write) with 100x full screen rgba16f overdraw (of the nearby explosion smoke particles), we get a single read + a single write. This is a huge saving.
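Back-of-the-envelope numbers for the explosion case, as a sketch. The 1920x1080 resolution is my assumption, and the model ignores caches, DCC and partial coverage: every blended layer is counted as one read plus one write of every pixel.

```python
WIDTH, HEIGHT = 1920, 1080   # assumed resolution, not from the post
BYTES_PER_PIXEL = 8          # rgba16f: 4 channels * 2 bytes
OVERDRAW_LAYERS = 100        # the 100x full screen overdraw case


def blend_traffic_bytes(width, height, bytes_per_pixel, passes):
    """Memory traffic if each blended pass reads then writes every pixel."""
    return width * height * bytes_per_pixel * 2 * passes


# Every layer goes out to memory vs. blending on chip and touching memory once.
immediate = blend_traffic_bytes(WIDTH, HEIGHT, BYTES_PER_PIXEL, OVERDRAW_LAYERS)
binned = blend_traffic_bytes(WIDTH, HEIGHT, BYTES_PER_PIXEL, 1)

print(f"immediate: {immediate / 1e9:.2f} GB per frame")
print(f"binned:    {binned / 1e6:.1f} MB per frame")
print(f"ratio:     {immediate // binned}x")
```

Under these assumptions that is roughly 3.3 GB versus 33 MB of render target traffic for the one effect.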
Good example of potential gains (this technique blends particles in LDS):
http://www.slideshare.net/DevCentra...ndering-using-direct-compute-by-gareth-thomas