Benetanegia
Regular
How can it? It's 64 bits per operand instead of 32, but with only 960 ALUs instead of 2880, so two-thirds.
Maybe it's me, but that conflicts directly with your other statement. It looks like in practice it's 960 vs 1920 ALUs, so the operand traffic would be the same rather than two-thirds. And in fact, it looks like registers are the limiting factor in both cases.
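To put numbers on that, here is a quick back-of-the-envelope sketch. It assumes one operand of the given width per ALU per clock, which is a simplification for comparison only; the function name is just for illustration.

```python
from fractions import Fraction

def operand_bits_per_clock(alus, bits_per_operand):
    # Simplifying assumption: each ALU consumes one operand per clock.
    return alus * bits_per_operand

dp = operand_bits_per_clock(960, 64)            # double precision
sp_peak = operand_bits_per_clock(2880, 32)      # single precision, all ALUs
sp_practice = operand_bits_per_clock(1920, 32)  # single precision, in practice

ratio_vs_peak = Fraction(dp, sp_peak)
ratio_vs_practice = Fraction(dp, sp_practice)

print(ratio_vs_peak)      # 2/3
print(ratio_vs_practice)  # 1
```

So against 2880 ALUs you get the two-thirds figure, but against 1920 the operand bits per clock come out equal.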
My reasoning is that with doubles, the same register space holds half as many operands. Even if you only have half the ALUs to feed, I'm pretty sure trips to memory are going to be more frequent, and each fetch would again bring in only half as many operands. That lowers the probability that whatever was copied along will be needed in the immediate future, so you need another access sooner than you would with single precision. I'm not talking about massive increases, but I'm pretty sure the bandwidth requirements are higher.
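A rough illustration of the "half as many operands" point. The 128-byte fetch size is purely an assumption for the example, not a figure for any particular chip; the halving holds for any fixed size.

```python
def operands_per_fetch(fetch_bytes, operand_bytes):
    # How many whole operands fit in one fixed-size fetch.
    return fetch_bytes // operand_bytes

FETCH_BYTES = 128  # assumed fetch granularity, for illustration only

floats_per_fetch = operands_per_fetch(FETCH_BYTES, 4)   # 32-bit operands
doubles_per_fetch = operands_per_fetch(FETCH_BYTES, 8)  # 64-bit operands

print(floats_per_fetch, doubles_per_fetch)  # 32 16
```

The same halving applies to a register file of fixed size, which is why I'd expect fewer nearby operands to already be on hand with doubles.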