A hybrid ALU that can do both 32-bit and 64-bit operations saves transistors but costs power, because more of its transistors stay active during compute. There are also likely pipeline bubble losses if you're mixing 32-bit and 64-bit operations and the ALU has to change modes.
A separate unit costs transistors, but it can be idled when unneeded, saving a lot of power. This applies especially to 64-bit units, which may never be fired up during the entire lifetime of a gamer's GPU.
GPUs are now more power-constrained than transistor-limited, so spending the die area on separate 64-bit ALUs is worthwhile.
OK, but that is exactly what GCN architectures like: they need to be kept filled (not to say overfilled), and they don't like to spend power for nothing. Basically, the scalar units and the other units take over the instructions that could otherwise be handled by the "hybrid" FP32/64 cores (it's just a question of supporting additional instructions).
These are two different approaches to the problem, and I admit both have their pros and cons... but let's be honest, this reminds me of the situation with AMD's 64-bit instructions versus Intel's Itanium 64-bit instructions. I see a lot more wins on AMD's side; they were the first to bring a 1/2 DP rate to the table, after all, whereas Nvidia was really constrained: their 1:3 DP rate was largely theoretical, and when software was not aligned to it, we were closer to a 1/4 rate, or even less, on average. GCN uses its scalar units to do the complementary work and to handle branching, which allows high occupancy, and this works really well. With all their wavefront configurations, you get a truly capable parallel system.
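To make the DP-rate arithmetic above concrete, here is a small sketch. The 4000 GFLOPS FP32 peak is a purely hypothetical figure, not any real product, and the helper names (fp64_throughput, peak_loss) are made up for illustration. It shows what a 1/2, 1/3 or 1/4 FP64:FP32 ratio means for peak FP64 throughput, and how much of an advertised 1/3 peak you lose if you only reach 1/4 in practice.

```python
from fractions import Fraction

def fp64_throughput(fp32_gflops: float, dp_ratio: Fraction) -> float:
    """Peak FP64 GFLOPS implied by a peak FP32 figure and an FP64:FP32 rate ratio."""
    return fp32_gflops * float(dp_ratio)

def peak_loss(advertised: Fraction, achieved: Fraction) -> Fraction:
    """Fraction of the advertised DP peak that is lost when only `achieved` is reached."""
    return 1 - achieved / advertised

# Hypothetical chip: 4000 GFLOPS peak FP32 (illustrative number only).
FP32_PEAK = 4000.0

for label, ratio in [("1/2", Fraction(1, 2)), ("1/3", Fraction(1, 3)), ("1/4", Fraction(1, 4))]:
    print(f"DP rate {label}: {fp64_throughput(FP32_PEAK, ratio):.0f} GFLOPS FP64")

# Advertised 1/3 but achieving 1/4 on average throws away a quarter of the DP peak.
print(f"Loss vs advertised 1/3: {peak_loss(Fraction(1, 3), Fraction(1, 4))}")
```

Using Fraction keeps the ratio math exact, so the "1/3 advertised vs 1/4 achieved" gap comes out as exactly 1/4 rather than a rounded float.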
I'll be honest: when I say I don't understand why "Nvidia doesn't take the same road", I'm lying a bit; Pascal and Volta will use the same scalar approach as GCN. The system Nvidia uses costs them too many transistors to be viable, and if it works well for AMD, I don't see why they wouldn't use it.
I am 100% sure they will go the same road... but I should have started by saying that I understand why they haven't already done it for Maxwell.
@Dilletante, you forget the scalar units. That said, you are right: with a GCN chip you want it overfilled, with occupancy at its maximum; that's when you see all of its power (but isn't that what we want when computing things?)...
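As a rough illustration of what "occupancy at its maximum" means on GCN, here is a simplified sketch that estimates resident wavefronts per SIMD from vector register (VGPR) usage alone. It assumes the commonly cited GCN limits of 10 wavefronts per SIMD and 256 VGPRs per lane, plus a 4-register allocation granularity; it deliberately ignores SGPR and LDS limits, so treat it as an approximation, not a real occupancy calculator. The function name waves_per_simd is hypothetical.

```python
MAX_WAVES_PER_SIMD = 10  # GCN: up to 10 wavefronts resident per SIMD
VGPRS_PER_LANE = 256     # GCN: 256 VGPRs available per lane on each SIMD
VGPR_GRANULE = 4         # assumed VGPR allocation granularity

def waves_per_simd(vgprs_used: int) -> int:
    """Estimate resident wavefronts per SIMD, limited only by VGPR usage."""
    if not 1 <= vgprs_used <= VGPRS_PER_LANE:
        raise ValueError("VGPR count must be between 1 and 256")
    # Round the per-lane allocation up to the next multiple of the granule.
    allocated = -(-vgprs_used // VGPR_GRANULE) * VGPR_GRANULE
    return min(MAX_WAVES_PER_SIMD, VGPRS_PER_LANE // allocated)

for vgprs in (24, 25, 32, 64, 84, 85, 128, 256):
    waves = waves_per_simd(vgprs)
    print(f"{vgprs:3d} VGPRs -> {waves:2d} waves/SIMD ({waves / MAX_WAVES_PER_SIMD:.0%} occupancy)")
```

The point it shows is how quickly occupancy drops as a kernel's register usage grows: staying at or below 24 VGPRs keeps all 10 wavefront slots usable, while 85 VGPRs already cuts you to 2 waves per SIMD, leaving far less work available to hide latency.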