Culling is one of the mesh shader's biggest use cases. The point of using mesh shaders for culling is to break the geometry down into tiny 'meshlets' so that culling can happen at a finer granularity. AMD's primitive shaders are very much capable of breaking geometry down into meshlets.
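To make the "finer grain" point concrete, here's a minimal CPU-side sketch of the kind of per-meshlet test a mesh or task shader can run. The `MeshletBounds` layout is an assumption for illustration, not any vendor's actual format; the cone test itself follows the form used in common meshlet-culling write-ups (e.g. meshoptimizer's).

```cpp
#include <cmath>

// Hypothetical per-meshlet bounds, precomputed offline: a bounding
// sphere plus a normal cone conservatively containing all triangle
// normals in the meshlet.
struct MeshletBounds {
    float center[3];
    float radius;
    float coneAxis[3];   // representative facet normal, unit length
    float coneCutoff;    // cos of the cone half-angle
};

// Whole-meshlet backface test: if every triangle in the meshlet faces
// away from the camera, the meshlet can be skipped before a single
// vertex is shaded -- much finer grained than per-draw culling.
bool meshletIsVisible(const MeshletBounds& m, const float camPos[3]) {
    float toCenter[3] = { m.center[0] - camPos[0],
                          m.center[1] - camPos[1],
                          m.center[2] - camPos[2] };
    float dist = std::sqrt(toCenter[0] * toCenter[0] +
                           toCenter[1] * toCenter[1] +
                           toCenter[2] * toCenter[2]);
    // Camera inside the bounding sphere: keep the meshlet.
    if (dist < m.radius) return true;
    // Cull when the view direction lies entirely inside the
    // back-facing cone; otherwise the meshlet may be visible.
    float d = toCenter[0] * m.coneAxis[0] +
              toCenter[1] * m.coneAxis[1] +
              toCenter[2] * m.coneAxis[2];
    return d < m.coneCutoff * dist + m.radius;
}
```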
The headline motivation for mesh shaders is to break a serial bottleneck at the primitive setup stage of the geometry pipeline. The front end becomes more like a compute shader that can define the primitives it takes in, how they are arranged, and how they map to wavefront lanes or invocations further down the pipeline. The task/amplification stage adds more proactive control over how many primitives are used (task-shader LOD selection) or generated (the task/amplification shader sets the number of mesh shader workgroups).
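As a rough CPU analogue of the amplification idea: test each meshlet, build a payload of survivors, and launch only that many mesh-shader workgroups. The names and payload layout here are invented for the sketch; on D3D12 the final step would correspond to something like `DispatchMesh(survivorCount, 1, 1)`.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical payload the task stage hands to its mesh shaders:
// just the indices of the meshlets that survived culling.
struct TaskPayload {
    std::vector<uint32_t> survivingMeshlets;
};

// CPU analogue of a task/amplification shader: run a visibility
// predicate per meshlet, then "dispatch" one mesh-shader workgroup per
// survivor. The real GPU version does this in parallel across lanes.
TaskPayload runTaskStage(uint32_t meshletCount,
                         const std::function<bool(uint32_t)>& isVisible) {
    TaskPayload payload;
    for (uint32_t i = 0; i < meshletCount; ++i)
        if (isVisible(i))
            payload.survivingMeshlets.push_back(i);
    // On a real API this is where the stage would issue the equivalent
    // of DispatchMesh(payload.survivingMeshlets.size(), 1, 1).
    return payload;
}
```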
Nvidia may have had the least to say about culling because, by that point, its hardware culling capability had grown substantial enough that doing all the culling in software before primitives reached the hardware could actually cost performance.
Primitive shaders, as described, take the full stream of primitives from input assembly or the tessellation stage, try to cull from it, and try to best-fit the number of wavefronts and active lanes automatically.
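The "best fit the number of wavefronts and active lanes" part is essentially stream compaction. Here's a toy scalar version of what the hardware/driver would be doing automatically; the wave width of 64 is just the familiar GCN figure used for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr size_t kWaveSize = 64; // illustrative wavefront width

// Given per-primitive survival flags from culling, pack the surviving
// primitive IDs densely so downstream wavefronts run with full lanes
// instead of launching waves full of holes for culled primitives.
std::vector<uint32_t> compactPrimitives(const std::vector<bool>& survived) {
    std::vector<uint32_t> packed;
    for (uint32_t id = 0; id < survived.size(); ++id)
        if (survived[id])
            packed.push_back(id);
    return packed;
}

// Wavefronts actually needed after compaction (rounded up).
size_t wavesNeeded(size_t survivorCount) {
    return (survivorCount + kWaveSize - 1) / kWaveSize;
}
```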
A 'meshlet' on AMD hardware consists of up to 254 verts/128 prims per wave, and the primitive topology can very much be programmer-defined too.
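To illustrate what programmer-defined meshlets look like in practice, here's a naive builder that slices a triangle list into meshlets under the vertex/primitive caps quoted above (treat the caps as illustrative). Real builders, like meshoptimizer's, also optimize for vertex locality; this one just fills greedily.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Caps taken from the figures quoted above.
constexpr uint32_t kMaxVerts = 254;
constexpr uint32_t kMaxPrims = 128;

struct Meshlet {
    std::vector<uint32_t> vertices;  // indices into the vertex buffer
    std::vector<uint8_t>  localTris; // 3 local indices per triangle
};

// Greedily pack consecutive triangles into meshlets, remapping each
// global vertex index to a small local index so one meshlet's data can
// live on-chip for a single workgroup/wave.
std::vector<Meshlet> buildMeshlets(const std::vector<uint32_t>& indices) {
    std::vector<Meshlet> meshlets(1);
    std::unordered_map<uint32_t, uint8_t> localIndex;

    for (size_t tri = 0; tri + 2 < indices.size(); tri += 3) {
        // Count how many new vertices this triangle would add.
        uint32_t newVerts = 0;
        for (int k = 0; k < 3; ++k)
            if (!localIndex.count(indices[tri + k])) ++newVerts;
        // Start a fresh meshlet if either cap would be exceeded.
        Meshlet& m = meshlets.back();
        if (m.vertices.size() + newVerts > kMaxVerts ||
            m.localTris.size() / 3 + 1 > kMaxPrims) {
            meshlets.emplace_back();
            localIndex.clear();
        }
        Meshlet& cur = meshlets.back();
        for (int k = 0; k < 3; ++k) {
            uint32_t v = indices[tri + k];
            auto it = localIndex.find(v);
            if (it == localIndex.end()) {
                it = localIndex.emplace(
                         v, static_cast<uint8_t>(cur.vertices.size())).first;
                cur.vertices.push_back(v);
            }
            cur.localTris.push_back(it->second);
        }
    }
    return meshlets;
}
```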
Is there a link to AMD's pages describing how to define primitive shader meshlets for game programmers?
Also, the reason vertex/geometry/tessellation shaders don't work in D3D's mesh shader pipeline is mostly down to a hardware limitation on Nvidia's side (possibly Intel's as well), so Microsoft had to make this compromise; otherwise mesh shaders in D3D would have had no way of working on that hardware.
The apparent direction Microsoft is taking is that it doesn't want to fit mesh shaders into that pipeline. Vertex shaders are already amenable to conversion to either primitive or mesh shaders. There's no desire to continue using the tessellation stage, and general disinterest in geometry shaders going forward. AMD's primitive shaders, as currently known, attempt to cater to those legacy stages.
From what we've heard of primitive shaders with Vega, the variants were either easy for compilers to generate automatically or profoundly difficult to expose to programmers.
There are some hints in the Vega ISA document about how low-level it might have been, with an instruction designed to report which shader engine or engines are responsible for a given primitive's bounding box, likely used to determine which low-level state machines needed to be broadcast a triangle and which needed to be told to drop a primitive from their FIFOs, since there were message types that could cull primitives from shader engines. There are also driver changes and warnings about other operations needing to be very careful when addressing the shader engines or their queues, given bugs and a tendency to hard-lock the GPU when working with them.
Even the culling-shader version ran into problems, possibly because much of its work was already being done by compute shaders, and potentially because additional latency or occupancy issues in the more serial setup stages made the performance gains limited or unreliable. Navi's focus on latency and its dual-CU arrangement may have made the more modest level of culling in the current auto-generated primitive shaders feasible.
(edit: To clarify the point about compute shaders: by the time primitive shaders were introduced, developers were already using compute shaders that did most of what primitive shaders would have done.)
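For context on what those compute shaders did: a common pattern in the GPU-driven pipeline talks of that era was a compute pass that tested every triangle and wrote out a compacted index buffer before the draw ever launched. A scalar CPU sketch of the core backface test, with the projection step assumed to have happened already:

```cpp
#include <cstdint>
#include <vector>

// 2D position after projection to screen/NDC space (projection omitted).
struct Vec2 { float x, y; };

// Signed twice-area of the projected triangle: back-facing (negative
// with this winding convention) and zero-area triangles get dropped,
// which is the core of what compute-based pre-culling passes did,
// alongside frustum and small-primitive tests.
static bool frontFacing(Vec2 a, Vec2 b, Vec2 c) {
    float det = (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
    return det > 0.0f;
}

// Emit a compacted index buffer containing only surviving triangles,
// so the subsequent draw never sees the culled ones.
std::vector<uint32_t> cullTriangles(const std::vector<Vec2>& screenPos,
                                    const std::vector<uint32_t>& indices) {
    std::vector<uint32_t> out;
    for (size_t i = 0; i + 2 < indices.size(); i += 3) {
        uint32_t i0 = indices[i], i1 = indices[i + 1], i2 = indices[i + 2];
        if (frontFacing(screenPos[i0], screenPos[i1], screenPos[i2])) {
            out.push_back(i0);
            out.push_back(i1);
            out.push_back(i2);
        }
    }
    return out;
}
```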