No, because a CPU is limited by its integer add speed, not its multiply speed.
So I was saying a dot product takes 5x as long as an add (4 cycles longer), or, if you do it all in one stage, the clock can only run at 20% of the speed.
Yeah, my mistake. 8 inputs, 4 mults, 3 adds.
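To make that count concrete, here's a rough sketch of a 4-element dot product written out longhand. The name `dot4` and the choice of Python are just mine for illustration, and it assumes two 4-element vectors, which is where the 8 inputs come from.

```python
def dot4(a, b):
    """4-element dot product, unrolled so the operation count is visible.

    Hypothetical helper for illustration: a and b are 4-element sequences,
    so there are 8 inputs in total.
    """
    # 4 multiplies, one per pair of inputs
    p0 = a[0] * b[0]
    p1 = a[1] * b[1]
    p2 = a[2] * b[2]
    p3 = a[3] * b[3]
    # 3 adds to reduce the 4 products to a single result
    return ((p0 + p1) + p2) + p3
```

The adds here are chained one after another. You could also pair them up as (p0 + p1) + (p2 + p3), which is still 3 adds but only two deep.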
And about cross products, the point was...