Jawed
Legend
In Fermi, instruction throughput and latency vary by instruction type: SP-MAD versus DP-MAD versus RCP versus the more complex instructions that run on the SFU (which actually appears to be the multifunction interpolator of old, since it also does interpolations).
That complexity could be tackled by the compiler, by the scheduler/issuer, or by some combination of the two.
I suppose it's fair to say that soft-vectorisation enjoys the same options - so overall we're none the wiser.
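
To make the compiler-side option concrete, here's a toy sketch of an in-order issuer with a simple scoreboard, fed the same four instructions in two different orders. The latency numbers, the register names and the `run_in_order` helper are all made up for illustration; they are not measured Fermi figures, and this is not a claim about how the real hardware or compiler works.

```python
# Hypothetical latencies in cycles, chosen only to illustrate the idea.
LATENCY = {"SP_MAD": 4, "DP_MAD": 8, "RCP": 16}


def run_in_order(program):
    """Issue at most one instruction per cycle, in program order.

    program is a list of (dest, op, sources) tuples. An instruction stalls
    until all of its source registers are ready. Returns the cycle at which
    the last result becomes available.
    """
    ready_at = {}  # register name -> cycle its value becomes available
    cycle = 0      # earliest cycle the next instruction may issue
    finish = 0
    for dest, op, sources in program:
        # Scoreboard check: registers never written here count as ready at 0.
        start = max([cycle] + [ready_at.get(s, 0) for s in sources])
        ready_at[dest] = start + LATENCY[op]
        finish = max(finish, ready_at[dest])
        cycle = start + 1
    return finish


# Two independent dependency chains: r0 -> r1 -> r2 and r0 -> r3 -> r4.
naive = [
    ("r1", "RCP", ["r0"]),
    ("r2", "SP_MAD", ["r1"]),  # stalls behind the long RCP
    ("r3", "SP_MAD", ["r0"]),
    ("r4", "SP_MAD", ["r3"]),
]

# Same instructions, with the independent chain hoisted under the RCP latency.
reordered = [naive[0], naive[2], naive[3], naive[1]]

print(run_in_order(naive), run_in_order(reordered))
```

With these made-up numbers the reordered stream finishes sooner, purely because the compiler filled the RCP's latency with independent work. A scheduler/issuer could get a similar effect at run time by picking ready instructions from other warps instead, which is the other half of the trade-off.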