I was confused by the mention of data paths at first, too. But I think he just meant the blocks the warps are assigned to.
FP32+[FP32|INT32] is still my go-to choice, based on the following reasoning from the available material:
Evidence 1:
https://www.nvidia.com/content/dam/...ure/NVIDIA-Turing-Architecture-Whitepaper.pdf
Page 13, concurrent execution: about 36 INT32 instructions for every 100 FP32 instructions over a variety of gaming workloads
Going FP32+[FP32|INT32] would obviously reduce performance compared to FP32+FP32+INT32.
Evidence 2:
The chart at timestamp 19:05 shows 3080 vs. 2080 perf in 4K. For the games, i.e. the workloads using the aforementioned mix, it is at roughly 1.6x to 1.7x.
200 (2x FP32) - 36 (INT32 share) is 164, i.e. about 1.64x, which is almost exactly where the perf seems to be.
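
For anyone who wants to play with the numbers, here is the same back-of-envelope estimate as a small Python sketch. The function name and the baseline of 100 FP32 slots are my own illustrative choices, not anything from NVIDIA. It only reproduces the arithmetic above and is not a real throughput model.

```python
def estimated_speedup(int_per_100_fp: float = 36.0) -> float:
    """Back-of-envelope estimate for FP32+[FP32|INT32] vs. a baseline of 100 FP32.

    Assumption (hypothetical model): the new design offers 200 FP32 slots
    (2x FP32), and every INT32 instruction takes one slot away from the
    shared [FP32|INT32] path.
    """
    baseline_fp32 = 100.0              # baseline FP32 throughput
    peak_fp32 = 2 * baseline_fp32      # "200 (2x FP32)"
    usable_fp32 = peak_fp32 - int_per_100_fp  # "- 36 (INT32 share)"
    return usable_fp32 / baseline_fp32


# 36 INT32 per 100 FP32, the mix quoted from the whitepaper
print(estimated_speedup())     # 1.64

# With no INT32 in the mix the estimate reaches the full 2x
print(estimated_speedup(0))    # 2.0
```

With the whitepaper's mix this lands inside the 1.6x to 1.7x range from the chart. A hypothetical FP32+FP32+INT32 layout would correspond to the second call, since the INT32 work would not eat into the FP32 slots.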