Latching onto individual figures is not a useful way to understand performance differences.
OK, let's say for a second that everything else in the hardware is identical, so you have a 50% performance difference, but only in ALU-limited situations.
When I render shadows, I'm ROP (or rather Z-fill) limited. If my shaders are texture heavy, the ALUs sit idle and wait on memory. On current PC GPUs, if the vertex workload is dominant, for the most part they grossly underutilize the ALUs.
So what percentage of the time do the extra ALUs actually help?
If it's 40%, it's a 20% performance difference; if it's 80%, it's a 40% difference. I would guess it will end up being closer to the first than the second.
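
To make that arithmetic explicit, here is a rough back-of-envelope sketch in Python. The function name and parameters are made up for illustration; it just multiplies the 50% ALU advantage by the fraction of the frame that is actually ALU limited. It assumes the difference scales linearly with that fraction, which holds if the fraction is measured on the part with more ALUs, and it ignores everything else in the design.

```python
def overall_difference(alu_advantage, alu_limited_fraction):
    """Rough overall performance difference from an ALU-only advantage.

    alu_advantage: ALU throughput advantage (0.5 means 50% more ALU).
    alu_limited_fraction: share of frame time that is ALU limited (0.0 to 1.0).

    Assumption: the rest of the frame (ROP, Z-fill, memory bound work)
    runs at the same speed on both parts.
    """
    if not 0.0 <= alu_limited_fraction <= 1.0:
        raise ValueError("alu_limited_fraction must be between 0 and 1")
    return alu_advantage * alu_limited_fraction


if __name__ == "__main__":
    # The two cases from the post: ALU limited 40% and 80% of the time.
    for fraction in (0.4, 0.8):
        diff = overall_difference(0.5, fraction)
        print(f"ALU limited {fraction:.0%} of the time: {diff:.0%} overall difference")
```

Plugging in a smaller fraction shows how quickly a headline 50% number shrinks once the frame is mostly bound by something else.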
If the 720 has more ROPs (and enough associated bandwidth), or a larger register pool so it can hide more latency, then it gets more of that difference back, because it runs other portions of the frame faster.
Now I'm not suggesting it's "faster" or, for that matter, "slower"; I'm saying it's just one aspect of a design.