It’s 1 triangle per primitive engine. There are 1 primitive engines per shader array or 2 per shader engine. So total 4 triangles per clock and 64 pixels (ROPS).
So you're doing 4 triangles per clock, actually. But that is optimal triangle/pixel emission. The number can actually go down. And it will go down as you go below the thresholds like 1 triangle and exponentially worse once you get into sub pixel sizes.
This blog illustrates this well
So using simple triangle fillrate test, the tiles are primitives, so they should divide into 2. Starting at 1 tile of 1080p, and shrinking to 1x1 pixels. You can see at tile (1,1) Performance has completely fallen off a cliff.
Subpixel performance becomes an exponential graph the smaller the triangle is. WRT to what we saw, UE5 is at
1
So we used optimal numbers which was wrong, it's simply not possible to do small triangle per pixel performance using fixed function pipeline, at least not in the traditional sense. They must have done something entirely different, and I suspect UE5 is a very heavy compute based pipeline.