Jawed
Legend
If two distinct warps are always required to enable co-issue to both SIMD16s (FP32 and FP32/Int) then I guess that's the utilisation problem right there. I am not 100% sure but if the 2 instructions per clock come from different warps what you’re asking for probably doesn’t exist.
I can imagine that transcendentals going through the SFU, at what appears to be 4 per clock (per partition), add to dependency-chain-length problems, reducing the count of warps available for dual-issue.
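
To put rough numbers on that, here is a back-of-envelope sketch. It assumes a 32-thread warp (my assumption, not stated above) and takes the 4 per clock SFU rate and the 16-wide SIMDs at face value. The function name `issue_clocks` is made up for illustration, and this is only arithmetic on lane counts, not a model of the actual scheduler.

```python
import math

WARP_SIZE = 32  # assumption: threads per warp


def issue_clocks(lanes_per_clock, warp_size=WARP_SIZE):
    """Clocks a unit is occupied pushing one warp-wide instruction through."""
    return math.ceil(warp_size / lanes_per_clock)


# One warp instruction on a 16-lane SIMD versus on an SFU doing 4 per clock.
simd16_clocks = issue_clocks(16)
sfu_clocks = issue_clocks(4)

print("SIMD16 clocks per warp instruction:", simd16_clocks)
print("SFU clocks per warp instruction:", sfu_clocks)
```

If those assumptions hold, a warp waiting on a transcendental is tied up for several times as long as one doing a plain FP32 op, so anything dependent on that result keeps the warp out of the pool of dual-issue candidates for longer.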