Nvidia made a mistake with the 900 series, which was very popular with data science folks. Everyone bought 980 Tis because they were so cost-effective.
In the Pascal consumer graphics cards they nuked FP16 to run at 1/128 of FP32 speed -- yes, take ~6 TFLOPS / 128 ≈ 0.05 TFLOPS of FP16 performance.
Basically, those cards are useless for ML unless you do everything in FP32, which almost no one does; deep learning is mostly FP16 and below.
Right now my CPU will run circles around my 1070 if the network is set to FP16.
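If you want to see the gap yourself, here's a rough timing sketch in PyTorch (assumes a CUDA build of PyTorch; the matrix size and iteration count are arbitrary):

```python
import time
import torch

# Rough throughput check: time a big matmul in FP32 vs FP16 on the GPU.
# On consumer Pascal (e.g., a 1070) the FP16 run should come out far slower;
# on Volta/Turing/Ampere it should be roughly 2x faster (more with tensor cores).
def bench(dtype, n=4096, iters=20):
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    torch.cuda.synchronize()          # make sure setup is done before timing
    t0 = time.time()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()          # wait for all kernels to finish
    return (time.time() - t0) / iters

print(f"FP32: {bench(torch.float32) * 1e3:.1f} ms/matmul")
print(f"FP16: {bench(torch.float16) * 1e3:.1f} ms/matmul")
```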
If you wanted fast FP16 in the Pascal generation, you had to pony up for a professional card, where the CUDA cores run packed FP16x2 at double the FP32 rate.
Turing and Volta resolved those issues, but costs are higher.
Ampere is actually the next logical choice: good overall performance at a reasonable price point.
Ah OK, I was just wondering: so the trade-off of precision for speed is worth it? It's been a while since I dabbled in ML (I did a degree in AI, but we didn't have ready-made libraries or hardware acceleration back then!! It was weeks of running epochs for GAs etc.). I'm guessing mixed precision doesn't help in this use case.
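For reference, mixed precision in a modern framework usually looks something like the sketch below (PyTorch AMP; assumes a CUDA GPU, and the model, data, and hyperparameters are just placeholders):

```python
import torch
from torch import nn

# Minimal sketch of mixed-precision training with PyTorch AMP.
model = nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()     # scales the loss to avoid FP16 underflow

for step in range(100):
    x = torch.randn(64, 1024, device="cuda")
    target = torch.randn(64, 1024, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():      # ops run in FP16 where safe, FP32 elsewhere
        loss = nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()        # backward pass on the scaled loss
    scaler.step(optimizer)               # unscales grads, skips step on inf/nan
    scaler.update()
```

The point of the scaler is that FP16 gradients can underflow to zero, so the loss is multiplied up before backprop and the gradients divided back down before the optimizer step; on hardware with fast FP16 this gets most of the speed with FP32-like training stability.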