A second blog post from NVidia, more about machine learning with Titan X. A fun new detail is that multiple Titan Xs were handed out to random audience members in a giveaway, so actual GPUs are already out in the wild.
The first blog post talks about the Pascal Titan X's machine learning throughput (a mysterious 44 TOPS), which is not in reference to the P100's touted fp16x2 capability. Instead, a single line mentions something about a 4x-rate 8-bit "inference instruction".
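That 44 TOPS figure lines up with a plain 4x multiple of the card's fp32 rate, assuming the roughly 11 TFLOPS fp32 number NVidia quotes for the new Titan X:

    11 TFLOPS fp32 x 4 = 44 TOPS int8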
This is something new. Unless it's talking about GP104's DP4A and DP2A 8- and 16-bit instructions, which went unadvertised in the GTX 1080 launch but are accessible via CUDA and documented in the CUDA 8.0 RC PTX reference. DP4A is a "Four-way byte dot product-accumulate" and DP2A a "Two-way dot product-accumulate", which certainly could be the "inference instructions". Notably, the Tesla P100, with sm_60, lacks these instructions; GP104 is sm_61 and has them. (In my earlier post I mistakenly talked about sm_5x; Pascal is sm_6x.)
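CUDA 8 also exposes these at the C level as device intrinsics (__dp4a, plus __dp2a_lo / __dp2a_hi). Here's a minimal sketch of what an int8 dot product built on DP4A could look like; the kernel below is my own illustration, not anything from NVidia's post:

```cuda
// Dot product of two int8 vectors, packed four bytes per int32 word.
// __dp4a(a, b, c) multiplies the four packed bytes of a and b pairwise
// and adds all four products plus c, in a single instruction.
// Needs sm_61 (GP102/GP104/GP106): nvcc -arch=sm_61 dp4a.cu
#include <cuda_runtime.h>

__global__ void dot_int8(const int* a, const int* b, int n, int* out)
{
    int acc = 0;
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        acc = __dp4a(a[i], b[i], acc);  // 4 byte multiply-accumulates per instruction
    atomicAdd(out, acc);                // reduce partial sums across the block
}
```

Four MACs in the slot of one fp32 FMA is exactly where a 4x int8 rate would come from.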