It's not as if those tensor cores couldn't be used for inference either. A GV100 using FP16 tensor cores will still be much faster than a GP102 using INT8 math.
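To make that concrete, here is a rough CUDA sketch (illustrative only, simplified tile sizes, kernel names are mine) of the two inference paths being compared: Volta's FP16 tensor cores via the WMMA API (sm_70) versus the packed INT8 __dp4a dot product GP102 relies on (sm_61).

#include <mma.h>
#include <cuda_fp16.h>

using namespace nvcuda;

// Volta path: one warp drives the tensor cores to do a 16x16x16 FP16
// multiply with FP32 accumulate in a single mma_sync call.
__global__ void fp16_tensor_mma(const half* a, const half* b, float* c)
{
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> fb;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> fc;

    wmma::fill_fragment(fc, 0.0f);
    wmma::load_matrix_sync(fa, a, 16);
    wmma::load_matrix_sync(fb, b, 16);
    wmma::mma_sync(fc, fa, fb, fc);       // tensor-core FP16 multiply, FP32 accumulate
    wmma::store_matrix_sync(c, fc, 16, wmma::mem_row_major);
}

// Pascal path: each thread gets a 4-wide INT8 dot product per __dp4a
// instruction (four signed 8-bit values packed into each int), which is
// where GP102's quoted INT8 inference rate comes from.
__global__ void int8_dp4a_dot(const int* a4, const int* b4, int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = __dp4a(a4[i], b4[i], 0);
}

Per warp, the mma_sync above covers 4096 multiply-accumulates against 128 for a warp of __dp4a ops, which is why GV100 should stay well ahead for inference despite GP102's INT8 advantage over plain FP32.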
I think we are in agreement; you are focused on the current situation, while my point is about the product line and what Nvidia will do when there is notable overlap between their top two Tesla GPUs for the DL ecosystem.
My post is in the context of how Nvidia differentiates between the Gx100 and Gx102 and the headache it will cause them down the road, and even a bit now, especially as some want a powerful single node doing both training and inference.
What you just said agrees with my post in some ways: you do not need both within their DL ecosystem (though I agree others will want independent nodes for training and inference). And a GV102 without DP cores would be a full, uncut GPU, meaning more SMs (with 8 tensor cores per SM) and slightly higher clock speed, so it would have greater performance than the GV100 if one ignores DP.
Nvidia needs to find a better way to differentiate the Gx100 and Gx102 than by FP16 and INT8 down the road, probably by the next generation.
It does not make sense to limit FP16 to the GPU that has fewer SMs, lower clocks (due to also supporting DP cores), and less yield headroom, especially as the Gx102 is, importantly, a smaller die with greater performance in this DL and FP32 context.
Market demand and competitors will probably force them to change, IMO.
Maybe they can keep NVLink2 and its benefits exclusive to the Gx100 as the differentiator, but then again, at some point they may have to consider it on a Gx102 variant as well.
It would be nice, though, if they gave the general CUDA cores full Vec2 FP16 throughput on the GV102, even if they decide to limit use of the tensor cores in some way.
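For reference, "Vec2 FP16" here means the packed half2 path on the regular CUDA cores, as in this minimal sketch (kernel name is mine): one __hfma2 issues two FP16 fused multiply-adds at once, which is what gives double the FP32 rate on chips where it runs at full speed.

#include <cuda_fp16.h>

// Two half values packed per __half2; one __hfma2 = two FP16 FMAs.
// Full-rate parts (GP100/GV100) get 2x FP32 throughput this way, while
// GP102 executes half2 math at a deliberately reduced rate.
__global__ void saxpy_half2(int n2, __half2 alpha, const __half2* x, __half2* y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n2)                         // n2 = element count in half2 pairs
        y[i] = __hfma2(alpha, x[i], y[i]);
}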
Cheers