Jawed
Legend
So GA102's combined FP32/INT32 ALUs take rather more than a trivial amount of die space compared to the INT32-only version... Well, this was always my suspicion.

Probably just die size reasons. NVIDIA couldn't make the die bigger even if they really wanted to, so they'd have to cut other parts to fit it. As for why they didn't reduce the SM count to fit the doubled FP32 ALUs per SM, a balance of resources is the logical answer here.
If it's for training and nothing else? But it has loads of FP64, which isn't for training. So FP32 doesn't matter? I would tend to agree: NVidia decided that the new tensor core was more important than anything else, but they couldn't sacrifice FP64.

FP32 doesn't matter for GA100. For training they will use TF32 by default.
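To illustrate the "TF32 by default" point, here's a minimal sketch of how a framework such as PyTorch exposes it on Ampere. The allow_tf32 flags are real PyTorch settings; the matmul itself is just a toy example for illustration.

```python
import torch

# On Ampere (GA100 and friends) PyTorch can route FP32 matmuls and convolutions
# through the tensor cores as TF32 instead of the plain FP32 CUDA cores.
torch.backends.cuda.matmul.allow_tf32 = True   # matmuls via cuBLAS
torch.backends.cudnn.allow_tf32 = True         # convolutions via cuDNN

if torch.cuda.is_available():
    a = torch.randn(4096, 4096, device="cuda")  # tensors stay ordinary FP32
    b = torch.randn(4096, 4096, device="cuda")
    c = a @ b  # executed as TF32 on the tensor cores when allowed above
```

The data stays in FP32 storage; only the multiply is done at TF32 precision, which is why "FP32 throughput" matters less for GA100's training workloads.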
So could we take this to mean that the tensor core design is how NVidia now names its architectures?

Another possible scenario is that GA100 was made considerably earlier than GA10x and the updated FP32/INT h/w wasn't ready for it. We've seen something similar between Volta and Turing previously.
If we say that Quadro/Titan/GeForce are for "prototyping" (for apps that end up on DGX), then it seems reasonable to conclude that harmonising the tensor core architecture is the most important aspect of a family of GPUs.