That says it much better, yes. CarstenS: tensor cores lower the register file bandwidth needed per math operation because they work on tensors rather than scalars (N^3 math operations on N^2 operands).
Nvidia actually shows this phenomenon in their animated tensor core cartoons in their keynotes.
So it’s likely they are close to peak RF bandwidth at both 78 scalar TFlops and 312 tensor TFlops.
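To put rough numbers on the N^3/N^2 argument, here is a back-of-the-envelope sketch in Python. The function names and the tile size N=4 are my own assumptions for illustration, not a statement about the actual tensor core tile shape, and it only counts register operands per fused multiply-add (FMA) while ignoring any operand reuse or caching inside the hardware.

```python
def rf_accesses_per_fma_scalar():
    # A scalar FMA d = a*b + c reads three register operands and writes one.
    return 3 + 1


def rf_accesses_per_fma_tensor(n):
    # An N x N x N matrix multiply-accumulate D = A*B + C reads A, B and C
    # (3 * N^2 values), writes D (N^2 values) and performs N^3 FMAs.
    return (4 * n * n) / (n ** 3)


scalar = rf_accesses_per_fma_scalar()   # 4 accesses per FMA
tensor = rf_accesses_per_fma_tensor(4)  # N=4 is a hypothetical tile size

print("scalar accesses per FMA:", scalar)
print("tensor accesses per FMA (N=4):", tensor)
print("reduction factor:", scalar / tensor)

# Throughput ratio taken from the figures quoted in this post.
print("tensor/scalar TFlops ratio:", 312 / 78)
```

With this simple model the accesses per FMA drop by a factor of N, so a roughly 4x reduction would be enough to run 312 tensor TFlops on the same RF bandwidth that 78 scalar TFlops needs. Treat that as a plausibility check, not as a measurement.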
My post might have implied that there's a ton of RF bandwidth available when not using tensor cores, which is not the case. Tensor cores could also raise the number of active registers in a kernel, thus hurting occupancy a bit.
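For the occupancy point, a simplified sketch of how registers per thread limit the number of resident threads on an SM. I'm assuming A100-class limits (65536 32-bit registers and at most 2048 resident threads per SM); the per-thread register counts below are hypothetical examples, and the model ignores allocation granularity, block size and shared memory limits.

```python
REGS_PER_SM = 65536        # assumed: 32-bit registers per SM (A100-class)
MAX_THREADS_PER_SM = 2048  # assumed: max resident threads per SM


def max_resident_threads(regs_per_thread):
    # Threads that fit in the register file, capped by the hardware limit.
    return min(MAX_THREADS_PER_SM, REGS_PER_SM // regs_per_thread)


def occupancy(regs_per_thread):
    # Fraction of the maximum resident threads that can actually be resident.
    return max_resident_threads(regs_per_thread) / MAX_THREADS_PER_SM


for regs in (32, 64, 128):  # hypothetical per-thread register counts
    print(regs, "regs/thread ->", occupancy(regs))
```

So a kernel that goes from 32 to 64 registers per thread would halve its register-limited occupancy in this model. Whether a given tensor core kernel actually needs that many more registers depends on the kernel.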