The feature is called "Primitive Shader", implying that it is a fully programmable, shader-based culling system, not fixed function hardware. If I understood properly, the developer needs to write these primitive shaders to improve the culling rate. So far no graphics API exposes this feature. There has also been discussion about AMD possibly autogenerating these primitive shaders in future drivers. Without the full spec of the system, I can't really say whether this is possible in the general case, or how hard a problem it is to solve.
GPU culling was originally designed to avoid GCN2 hardware bottlenecks (consoles). GCN3, GCN4 and GCN5 all reduced geometry-related bottlenecks, making techniques like these slightly less useful. There are still lots of GCN1/GCN2 cards around. For example, the R7 360, R9 390 and 390X were GCN2 based. Only the 380, 380X and Fury used GCN3. Also, in the 400 series, everything below the RX 460 is based on GCN2 and GCN1.
GCN's geometry bottleneck is mostly visible when you have high triangle-per-pixel density. Consoles usually render at 900p or 1080p, which is roughly 1.8x (1080p) to 2.6x (900p) fewer pixels than 1440p. The GPU therefore executes significantly fewer pixel shader instances on average per triangle at 900p than at 1440p, which results in significantly worse GPU utilization when geometry is the bottleneck. Result = culling gives a significant advantage.
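The resolution math above is easy to sanity-check with a quick back-of-the-envelope calculation. The pixel counts are exact; the fixed visible triangle count is a made-up number purely for illustrating how pixels-per-triangle scales with resolution:

```python
# Pixel counts at common rendering resolutions.
res = {
    "900p":  1600 * 900,    # 1,440,000 pixels
    "1080p": 1920 * 1080,   # 2,073,600 pixels
    "1440p": 2560 * 1440,   # 3,686,400 pixels
}

for name in ("900p", "1080p"):
    ratio = res["1440p"] / res[name]
    print(f"1440p has {ratio:.2f}x the pixels of {name}")
# 1440p has 2.56x the pixels of 900p
# 1440p has 1.78x the pixels of 1080p

# With a fixed triangle count (hypothetical 1M visible triangles),
# average pixel shader work per triangle scales directly with resolution,
# so at 900p the fixed geometry cost is a much bigger slice of the frame.
triangles = 1_000_000
for name, pixels in res.items():
    print(f"{name}: ~{pixels / triangles:.2f} pixels per triangle on average")
```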
GPU culling is still a very good technique, but the biggest impact is seen on GCN1/GCN2 hardware and/or at lower rendering resolutions. These benchmarks are 1440p on GCN5. Apparently the game doesn't have enough triangle density or depth complexity (occlusion) to see a benefit from GPU culling in this scenario. We don't know exactly what their algorithm is doing. I am assuming their algorithm is similar to Frostbite's:
https://www.slideshare.net/gwihlidal/optimizing-the-graphics-pipeline-with-compute-gdc-2016. Frostbite's algorithm is designed for GCN2, but it still shows significant gains on GCN3 (Fury X), especially when the culling is done using async compute. The culling cost could easily be reduced by removing culling steps that modern GPUs already handle efficiently.
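To give an idea of what such a compute culling pass tests, here is a minimal CPU-side sketch of three of the per-triangle tests described in the Frostbite GDC 2016 talk (backface/zero-area, small-primitive, and viewport culling) on 2D screen-space vertices. On the GPU these run per triangle in a compute shader that compacts the index buffer; the function names and the exact rounding convention here are my own simplification, not the actual implementation:

```python
import math

def rnd(x):
    # Round to the nearest sample coordinate. floor(x + 0.5) avoids
    # Python's banker's rounding, which would round 0.5 cases unevenly.
    return math.floor(x + 0.5)

def should_cull(v0, v1, v2, width, height):
    """Return True if a screen-space triangle can be skipped.
    Assumes counter-clockwise front faces. A simplified sketch of the
    tests in Frostbite's compute culling pass, not their actual code."""
    xs = (v0[0], v1[0], v2[0])
    ys = (v0[1], v1[1], v2[1])

    # 1. Backface / zero-area: signed area <= 0 means the triangle is
    #    backfacing or degenerate for CCW winding.
    area = ((v1[0] - v0[0]) * (v2[1] - v0[1])
            - (v2[0] - v0[0]) * (v1[1] - v0[1]))
    if area <= 0.0:
        return True

    # 2. Small-primitive: if the bounding box min and max round to the
    #    same sample coordinate on either axis, the triangle covers no
    #    sample centers and rasterizes to zero pixels.
    if rnd(min(xs)) == rnd(max(xs)) or rnd(min(ys)) == rnd(max(ys)):
        return True

    # 3. Viewport: bounding box entirely outside the screen rectangle.
    if max(xs) < 0 or min(xs) > width or max(ys) < 0 or min(ys) > height:
        return True

    return False

# Tiny sliver that slips between sample centers: small-primitive culled.
print(should_cull((10.1, 10.1), (10.3, 10.2), (10.2, 10.4), 1920, 1080))  # True
# Large on-screen CCW triangle: kept.
print(should_cull((100, 100), (300, 100), (200, 300), 1920, 1080))  # False
```

The point of the small-primitive test is exactly the triangle-per-pixel argument above: at low resolution, many triangles shrink to sub-pixel size, and removing them in compute spares the fixed-function geometry pipe from chewing through triangles that produce no pixels anyway.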