At least strips seem to be much faster. I would assume that strip cut indices have a fast path now. Also, degenerate triangles (common in non-indexed strips) should be much faster now (degenerate = zero area -> culled by the primitive discard accelerator).
So far, no third-party benchmark has been able to prove the “aggressive primitive culling” that AMD said it introduced in Polaris.
http://techreport.com/review/30328/amd-radeon-rx-480-graphics-card-reviewed/5
I remember some polygon throughput benchmark results (from a few years back) where Nvidia was beating AMD badly in cases where most of the triangles were zero area or hidden. Unfortunately I can't find the results anymore and I don't remember the name of the benchmark.
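To make the degenerate-triangle case concrete, here is a minimal sketch of where zero-area triangles come from when two non-indexed strips are stitched together by repeating vertices. The function names and the two example quads are made up for illustration, and the sketch assumes the first strip has an even vertex count so that winding order is preserved. It says nothing about how the hardware discards these triangles; it only counts them.

```python
def triangle_area(a, b, c):
    """Area of a 2D triangle from the cross product of two edges."""
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    return abs(cross) / 2.0


def strip_triangles(vertices):
    """Expand a triangle strip into its triangles (winding is ignored here)."""
    return [tuple(vertices[i:i + 3]) for i in range(len(vertices) - 2)]


def stitch(strip_a, strip_b):
    """Join two strips into one by repeating the last vertex of the first
    strip and the first vertex of the second (hypothetical helper)."""
    return strip_a + [strip_a[-1], strip_b[0]] + strip_b


# Two separate quads, each a 4-vertex strip (illustrative data only).
quad_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
quad_b = [(3, 0), (3, 1), (4, 0), (4, 1)]

tris = strip_triangles(stitch(quad_a, quad_b))
degenerate = [t for t in tris if triangle_area(*t) == 0.0]

print(len(tris), "triangles,", len(degenerate), "with zero area")
```

In this example half of the triangles submitted exist only to connect the two quads, so a GPU that cannot reject zero-area triangles cheaply spends a large share of its geometry throughput on primitives that never produce a pixel.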
Yes, the rasterizer is hidden, but it can be a performance bottleneck, especially when there are lots of small triangles hitting the same area. Nvidia's tiled rasterizer is an elegant way to combat this issue. Maxwell was a big efficiency improvement for Nvidia. I would prefer a more efficient rasterizer, but more ROPs wouldn't hurt either. Fury X had much higher bandwidth and more CUs than the high-end Radeon 300 series cards, but an identical number of ROPs. Compute performance was not a problem for Fury X; filling the compute units with work was the real problem. Fixed-function bottlenecks need to be solved in order to utilize the compute units better. Both the geometry pipeline and the rasterizer are still behind Nvidia.
So what exactly do you want to see changed? Just more ROPs?
Isn't rasterization something that's entirely hidden?
DCC in Tonga (and the Polaris DCC improvements) is a nice boost for GPU utilization. GCN 1.0/1.1 had to perform several decompression steps (fast clear elimination, depth decompress, MSAA decompress, etc.) during the frame. New AMD cards can read DCC data directly. Decompress steps are awkward, since you need to wait for the GPU to go idle twice (rasterizer work finished, decompress finished) before you can start reading the texture.