I believe FP16 ops are routed automatically to Tensor cores on NVIDIA hardware, without developer intervention.
I don't know, but what he is saying is that they don't use tensor cores at all. And FP16 will only be available on PC with 5.3.
EDIT: It seems to say that DirectML is a problem for tensor core usage. From what I have seen, he said they don't use tensor cores, and that the code runs fast on the AMD 5700x, but he doesn't talk about Nvidia hardware.
EDIT: He said they don't use tensor cores.
What makes TSR in 5.1+ so different from 5.0 is how many more convolutions it does using these exposed hardware capabilities. RejectShading does about 15 convolutions in 5.1+, each 3x3, on current hardware, compared to 3 in 5.0. That allows TSR to be substantially smarter, thanks to some very neat properties we discovered that chaining certain convolutions has. And while the number of convolutions increased by a factor of 5, the runtime cost of this part of TSR didn't change; the gain in smartness of the algorithm allowed us to cut significant costs that were no longer required in the rest of the TSR algorithm, which is the core reason behind the performance saving from 3.1ms to 1.5ms on these consoles. Sadly, these hardware capabilities exposed in standard HLSL do not benefit all GPUs equally, because of how each vendor decided to architect their hardware.
We can’t do miracles with a specifically marketed hardware feature in its most efficient form when only a subset of those capabilities is exposed to us at the moment. But we can still do some surprising things on existing hardware that is publicly, and wrongly, assumed to be incapable of certain workloads. And we are able to do it with just standard features and a better understanding of how the GPU works, thanks for instance to that AMD PDF I linked above.
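To see why chaining cheap 3x3 convolutions is so powerful (this is not Epic's TSR code, just an illustration of the general property the quote alludes to): convolving N 3x3 kernels in sequence is mathematically equivalent to one big kernel with a (2N+1)x(2N+1) receptive field, so 15 chained passes can see a 31x31 neighborhood while each pass only ever reads a 3x3 footprint. A minimal pure-Python sketch, using box-filter kernels as a stand-in for whatever kernels TSR actually uses:

```python
# Illustration only: the receptive field of chained convolutions.
# Chaining N 3x3 kernels is equivalent to a single (2N+1) x (2N+1) kernel.

def full_conv2d(a, b):
    """Full 2-D convolution of two kernels (no dependencies).
    Output size is (ha + hb - 1, wa + wb - 1)."""
    ha, wa = len(a), len(a[0])
    hb, wb = len(b), len(b[0])
    out = [[0.0] * (wa + wb - 1) for _ in range(ha + hb - 1)]
    for i in range(ha):
        for j in range(wa):
            for p in range(hb):
                for q in range(wb):
                    out[i + p][j + q] += a[i][j] * b[p][q]
    return out

# A normalized 3x3 box kernel (a placeholder, not TSR's real kernels).
box3 = [[1.0 / 9.0] * 3 for _ in range(3)]

# Two chained 3x3 convolutions act like a single 5x5 kernel...
two = full_conv2d(box3, box3)
assert len(two) == len(two[0]) == 5

# ...and 15 chained 3x3 convolutions (the count quoted for RejectShading
# in 5.1+) act like a single 31x31 kernel: a far wider receptive field
# for the same small per-pass footprint.
k = box3
for _ in range(14):          # 14 more convolutions on top of box3 itself
    k = full_conv2d(k, box3)
assert len(k) == len(k[0]) == 31
```

This is also why the 5x increase in convolution count need not cost 5x at runtime: each pass stays tiny and bandwidth-friendly, and on hardware with fast packed FP16 math the arithmetic is nearly free compared to the memory traffic.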
EDIT: On Nvidia and Intel GPUs, developers will probably use DLSS and XeSS respectively.