Sorry, my way of writing (and getting lazy in this heat) caused confusion.
When Nvidia talks about mixed precision for Pascal GP100, it means a single CUDA core that handles both FP32 and FP16 (as FP16x2, i.e. two FP16 operations packed into one FP32-width lane), and that packing is what gives the doubling of TFLOPS; the only other CUDA core type is the dedicated FP64 core.
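To make the FP16x2 point concrete, here is a minimal CUDA sketch (my own illustration, not Nvidia's code) using the __half2 type and the __hfma2 intrinsic from cuda_fp16.h. Each __hfma2 issues one instruction that performs two FP16 fused multiply-adds at once, which is where the doubled rate comes from (needs a compute capability 5.3+ part, e.g. Tegra X1 or GP100):

#include <cuda_fp16.h>

// A mixed-precision core executes this as one instruction per thread,
// operating on both FP16 halves of each __half2 simultaneously.
__global__ void fma_fp16x2(const __half2 *a, const __half2 *b,
                           const __half2 *c, __half2 *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = __hfma2(a[i], b[i], c[i]);  // (a*b + c) on two FP16 lanes
}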
In theory this could also be done for the FP64 CUDA cores if they overcome other limitations, which I think come back to register bandwidth, and maybe this is a capability of Volta.
The evolution of the mixed-precision FP32 CUDA core can be traced back to Tegra X1, where it was aimed at functions such as image recognition/deep learning; scroll down to "Double Speed FP16":
http://www.anandtech.com/show/8811/nvidia-tegra-x1-preview/2
So it is debatable whether this is the same mixed-precision FP32 CUDA core as in GP100 but fixed to FP16x2 operation, or another unique kind of CUDA core.
As it seems to be there for compatibility reasons, I would assume it is the same as in GP100, but this is Nvidia.
As Ryan mentions, there is one of these cores per SM, which gives FP16 a FLOP rate of 1/64 the FP32 rate (one FP16x2 core doing 2 FP16 ops per clock against 128 FP32 cores doing 128, so 2/128 = 1/64) and matches up with tests done by others. That is absolutely useless apart from compatibility testing, and tbh I am not sure how many CUDA developers will consider this card even for that.
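For anyone curious how those tests work, here is a rough micro-benchmark sketch of my own (not from the article or anyone's published test code) that times a chain of FP32 FMAs against the same chain in packed FP16x2. In principle the printed ratio should land near 1/64 on a GP104 card and near 2 on GP100 or Tegra X1; compile with nvcc -arch=sm_61 (or any sm_53+ target):

#include <cuda_fp16.h>
#include <cstdio>

#define ITERS 10000  // dependent FMAs per thread

__global__ void fp32_loop(float *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float a = out[i];
    for (int k = 0; k < ITERS; ++k)
        a = __fmaf_rn(a, 1.0001f, 0.0001f);  // one FP32 FMA per iteration
    out[i] = a;
}

__global__ void fp16x2_loop(__half2 *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    __half2 a = out[i];
    const __half2 b = __float2half2_rn(1.0001f);
    const __half2 c = __float2half2_rn(0.0001f);
    for (int k = 0; k < ITERS; ++k)
        a = __hfma2(a, b, c);                // two FP16 FMAs per iteration
    out[i] = a;
}

int main()
{
    const int blocks = 64, threads = 256, n = blocks * threads;
    float *d32; __half2 *d16;
    cudaMalloc(&d32, n * sizeof(float));
    cudaMalloc(&d16, n * sizeof(__half2));
    cudaMemset(d32, 0, n * sizeof(float));
    cudaMemset(d16, 0, n * sizeof(__half2));

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);
    float ms32, ms16;

    cudaEventRecord(t0);
    fp32_loop<<<blocks, threads>>>(d32);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);
    cudaEventElapsedTime(&ms32, t0, t1);

    cudaEventRecord(t0);
    fp16x2_loop<<<blocks, threads>>>(d16);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);
    cudaEventElapsedTime(&ms16, t0, t1);

    // fp16x2_loop does twice the FLOPs of fp32_loop for the same iteration
    // count, so the FP16:FP32 throughput ratio is 2 * ms32 / ms16.
    printf("FP16 rate relative to FP32: %.4fx\n", 2.0f * ms32 / ms16);

    cudaFree(d32);
    cudaFree(d16);
    return 0;
}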
Cheers