Thanks a lot for the info. My knowledge of the hardware side is way below my knowledge of the software side, so bear with me if this is a stupid question: automatically converting standard fixed-function shaders (vertex, etc.) into something the hardware is better designed for (more compute, etc.) is one thing, but where would culling get introduced here? Does the hardware have some way to know, or is something happening on the developer side? Mesh Shaders, for example, aren't necessarily faster than traditional fixed-function geometry at all, at least for simple cases. But because of how they're structured (Task Shaders are specialized compute shaders that dispatch Mesh Shaders, which are specialized compute shaders that take the place of Vertex Shaders), you can relatively simply introduce huge culling benefits that just aren't practical in a straightforward way on the old pipeline. (There are some recent Xbox developer YouTube videos about this on the DX12 side.)
If you look at where the Geometry Engine is and follow the old FF pipeline, you can see how many stages pass before back-face and frustum culling finally happens at the primitive assembler/rasterizer.
You can see the advantage of discarding triangles way up front: it provides a cumulative benefit down the line, because the later stages don't have to work on triangles that don't need any operations. This can be monumental as more triangles are removed from view. This assumes developers are following a basic flow, of course, which is probably inaccurate but useful for discussing the timing.
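To put rough numbers on that cumulative effect, here is a toy cost model in Python. Everything in it is an assumption for illustration: the stage names, the per-triangle costs, and the `pipeline_cost` helper are made up and do not describe any real GPU. It only shows that the earlier the cull point sits, the more stages get to skip the discarded triangles.

```python
# Toy cost model. Stage names and per-triangle costs are hypothetical numbers
# chosen for illustration, not measurements of any real hardware.
STAGE_COSTS = [
    ("vertex fetch", 1.0),
    ("vertex shading", 4.0),
    ("tessellation/geometry", 3.0),
    ("primitive assembly", 1.0),
]


def pipeline_cost(num_triangles, visible_fraction, cull_before_stage):
    """Total front-end work when culling happens just before a given stage.

    Stages with an index below `cull_before_stage` process every triangle;
    stages from that index onward process only the visible ones.
    """
    total = 0.0
    for index, (_name, cost) in enumerate(STAGE_COSTS):
        if index < cull_before_stage:
            count = num_triangles
        else:
            count = num_triangles * visible_fraction
        total += count * cost
    return total


# Cull only at the very end (old-pipeline style) versus right at the front.
late = pipeline_cost(1_000_000, 0.4, cull_before_stage=len(STAGE_COSTS))
early = pipeline_cost(1_000_000, 0.4, cull_before_stage=0)
```

With 40% of the triangles visible, the early-cull total comes out at roughly 40% of the late-cull total in this model, because every stage skips the discarded triangles instead of none of them.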
My current understanding is:
a) Even if you use the older pipeline, RDNA 2 is still biased towards triangle discard, so it can discard triangles at a higher rate than it can rasterize them. This is something we know from AMD presentations, and it's also something Cerny spoke to, as well as Matt H. But this would be non-compute-based culling.
b) I believe that if developers decide to take advantage of primitive shaders to handle the culling up front, then you get that cumulative effect down the chain. I don't think a driver can necessarily know what needs to be culled, but I could be wrong. It can, however, do everything else in terms of compiling the front-end shaders into primitive shaders.
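For reference, the per-triangle tests being discussed (back-face and frustum culling) look roughly like this when written out by hand. This is a minimal sketch in Python, not how primitive shaders or the RDNA 2 hardware actually implement it. The function names are hypothetical, vertices are assumed to be clip-space `(x, y, z, w)` tuples, depth is assumed to run from 0 to w, and counter-clockwise winding is assumed to mean front-facing.

```python
def outside_same_plane(tri):
    """True if all three clip-space vertices lie outside one frustum plane."""
    plane_tests = [
        lambda v: v[0] < -v[3],  # left
        lambda v: v[0] > v[3],   # right
        lambda v: v[1] < -v[3],  # bottom
        lambda v: v[1] > v[3],   # top
        lambda v: v[2] < 0.0,    # near (0..w depth range assumed)
        lambda v: v[2] > v[3],   # far
    ]
    return any(all(test(v) for v in tri) for test in plane_tests)


def is_back_facing(tri):
    """Signed-area test after the perspective divide (CCW = front assumed)."""
    if any(v[3] <= 0.0 for v in tri):
        # A vertex is at or behind the eye plane, so the divide is not safe.
        # Keep the triangle conservatively and let later stages handle it.
        return False
    (ax, ay), (bx, by), (cx, cy) = [(v[0] / v[3], v[1] / v[3]) for v in tri]
    twice_area = (bx - ax) * (cy - ay) - (cx - ax) * (by - ay)
    # Zero-area (degenerate) triangles are discarded along with back faces.
    return twice_area <= 0.0


def cull(triangles):
    """Return only the triangles that survive both tests."""
    return [
        tri for tri in triangles
        if not outside_same_plane(tri) and not is_back_facing(tri)
    ]


front = ((0.0, 0.0, 0.5, 1.0), (1.0, 0.0, 0.5, 1.0), (0.0, 1.0, 0.5, 1.0))
back = (front[0], front[2], front[1])  # same triangle, winding reversed
offscreen = ((2.0, 0.0, 0.5, 1.0), (3.0, 0.0, 0.5, 1.0), (2.0, 1.0, 0.5, 1.0))
survivors = cull([front, back, offscreen])
```

Only `front` survives here. Neither test depends on anything that happens after the vertex positions are known, which is why they can in principle run far earlier than the primitive assembler.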