Thanks for the info, Rys.

"We don't execute different thread types simultaneously, but we can switch to a new thread type in the next clock, and threads can issue to either the F32 or F16 datapath, just not both at the same time."
To confirm that I understood correctly: if there are stall cycles (e.g. from a cache miss) in my pixel shader, which is running FP16 instructions, can the same USC efficiently fill those stall cycles with, for example, FP32 vertex shader instructions?