pixelio
Newcomer
Ok, let me put it this way: Even if no special FP16 goodness was in the GP104 hardware, it should be able to execute FP16 at the same rate as FP32, like Maxwell and Kepler before it, right?
Kepler and Maxwell can't perform any fp16 computations unless there are some triple-secret non-CUDA instructions that I don't know about.
I'm guessing the source of confusion here is that you can get "free" conversion from fp16 to fp32 if you pull your data through the texture hardware.
Something like the following would load four fp16 elements through a texture and automatically convert them to 32-bit floats:
Code:
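// tex_a must be bound to a texture whose texel format is 4 x fp16; the second type suffix is the coordinate type (.s32 or .f32).
tex.1d.v4.f32.s32 {r0,r1,r2,r3}, [tex_a, {idx}];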
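For anyone who would rather see this from the CUDA C++ side than in PTX, here is a rough sketch of the same idea using a texture object over linear memory. The kernel name `fetch_half4`, the buffer names, and the test values are made up for illustration. It also assumes a toolkit recent enough that `__float2half` is callable from host code. The point is only that the kernel asks for a `float4` while the underlying buffer holds fp16 data, so the widening happens in the texture path rather than in ALU instructions.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Each thread fetches one texel made of four fp16 channels. The texture
// unit hands the values back already widened to fp32.
__global__ void fetch_half4(cudaTextureObject_t tex, float4 *out, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        out[idx] = tex1Dfetch<float4>(tex, idx);
}

int main()
{
    constexpr int kTexels = 4;            // number of 4-channel texels
    constexpr int kHalfs  = 4 * kTexels;  // total fp16 values

    // Illustrative input: 0.0, 0.5, 1.0, ... (all exactly representable in fp16).
    __half h_in[kHalfs];
    for (int i = 0; i < kHalfs; ++i)
        h_in[i] = __float2half(0.5f * i);

    __half *d_in  = nullptr;
    float4 *d_out = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, kTexels * sizeof(float4));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    // Describe the linear buffer as texels of four 16-bit float channels.
    cudaResourceDesc res;
    memset(&res, 0, sizeof(res));
    res.resType                = cudaResourceTypeLinear;
    res.res.linear.devPtr      = d_in;
    res.res.linear.desc        = cudaCreateChannelDescHalf4();
    res.res.linear.sizeInBytes = sizeof(h_in);

    // Element-type read mode: float-kind channels come back as float.
    cudaTextureDesc td;
    memset(&td, 0, sizeof(td));
    td.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    cudaError_t err = cudaCreateTextureObject(&tex, &res, &td, nullptr);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaCreateTextureObject: %s\n", cudaGetErrorString(err));
        return 1;
    }

    fetch_half4<<<1, kTexels>>>(tex, d_out, kTexels);

    float4 h_out[kTexels];
    err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel or copy failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < kTexels; ++i)
        printf("texel %d: %g %g %g %g\n", i,
               h_out[i].x, h_out[i].y, h_out[i].z, h_out[i].w);

    cudaDestroyTextureObject(tex);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Note that nothing in the kernel does fp16 arithmetic: the data is only stored as fp16, and all math after the fetch would still run at fp32 rate.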