Obviously die cost would go up in this case. They could have included 4 CUs instead and clocked them at half the frequency, and had exactly the same performance, but at a lower TDP.
Power may not necessarily be lower: although dynamic power may be equal (or even better, thanks to a lower voltage), static (leakage) power may well increase because there are more transistors, so it can end up as a tradeoff in peak power scenarios. Idle power will surely increase, so this, coupled with the increased die cost, will dictate that smaller is better.
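
To put rough numbers on that tradeoff, here is a toy sketch using the usual first-order model: dynamic power scales as C·V²·f per CU, and leakage scales roughly with CU count and voltage. Everything in it is an assumption for illustration, not a measurement: the 2-CU baseline, the normalized units, the `leak_per_cu` constant, the 0.8 voltage for the wider design, and the lack of power gating at idle.

```python
def power(n_cu, freq, volt, c_eff=1.0, leak_per_cu=0.2):
    """Toy first-order power model in normalized units (all constants are made up).

    dynamic = n_cu * C * V^2 * f   (switching power)
    static  = n_cu * I_leak * V    (leakage, simplified to be linear in V)
    """
    dynamic = n_cu * c_eff * volt ** 2 * freq
    static = n_cu * leak_per_cu * volt
    return dynamic, static


# Hypothetical baseline: 2 CUs at full clock and nominal voltage.
narrow = power(n_cu=2, freq=1.0, volt=1.0)
# Twice the CUs at half the clock, same voltage: same throughput, same dynamic power.
wide_same_v = power(n_cu=4, freq=0.5, volt=1.0)
# Same, but assuming the lower clock allows a lower voltage (0.8 is a guess).
wide_low_v = power(n_cu=4, freq=0.5, volt=0.8)

for name, (dyn, stat) in [("2 CU @ f", narrow),
                          ("4 CU @ f/2", wide_same_v),
                          ("4 CU @ f/2, low V", wide_low_v)]:
    print(f"{name:20s} dynamic={dyn:.2f} static={stat:.2f} peak={dyn + stat:.2f}")
```

With these made-up constants the wider design only wins at peak if the voltage actually drops, and its leakage is higher in every case, which is what you pay at idle when dynamic power is near zero. Different leakage assumptions (or aggressive power gating) would shift the result either way.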