That's a really interesting and perceptive question. The energy needed to compute an FP64 FMA is about four times the energy needed for an FP32 FMA. So half-rate FP64 ALUs could in theory draw twice the power of the FP32 ALUs when both run at full throughput: four times the energy per operation at half the operations per second. But is that increase minor compared to the significant overhead of data transfer and SRAM register file access? My immediate guess is "the power difference is negligible," but there's evidence that it's not. Modern high-core-count Xeons have to power gate and downclock by 20% or more when running AVX code, which suggests the ALUs use a large fraction of the Xeon's power budget. A GPU is even more ALU-dense, so it should be even more sensitive.
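The back-of-envelope arithmetic can be written down as a rough sketch. It assumes power is simply energy per operation times operations per second, and it covers only the ALUs, not data movement or register access. The function name `relative_alu_power` is made up for illustration, and the 4x energy figure is the estimate quoted above, not a measurement.

```python
def relative_alu_power(energy_ratio: float, rate_ratio: float) -> float:
    """FP64 ALU power relative to FP32 ALU power at full throughput.

    energy_ratio: energy of one FP64 FMA divided by one FP32 FMA
    rate_ratio:   FP64 throughput divided by FP32 throughput

    Power = energy per op * ops per second, so the ratios multiply.
    This ignores data transfer and register file energy entirely.
    """
    return energy_ratio * rate_ratio


# Assumed ~4x energy per FP64 FMA, as estimated in the post.
print(relative_alu_power(4.0, 1 / 2))  # half-rate FP64, e.g. P100
print(relative_alu_power(4.0, 1 / 3))  # third-rate FP64, e.g. Kepler
```

Under these assumptions the half-rate case comes out at twice the FP32 ALU power and the third-rate case at roughly 1.33 times.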
An easy way to test this is to take a Kepler Titan, Quadro, or Tesla (with the 1/3 FP64 rate unlocked), run both SGEMM and DGEMM, and compare the power draw at the wall socket. Any power difference will be even more pronounced on P100 with its 1/2 FP64 rate.
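Once you have the wall-socket readings, one possible way to reduce them to a single number is sketched below. It assumes you also record idle power and the achieved GFLOP/s for each run, and that subtracting idle power is a fair way to isolate the workload. That subtraction is my assumption and will not be exact. The function names are hypothetical.

```python
def energy_per_flop(load_watts: float, idle_watts: float, gflops: float) -> float:
    """Joules per floating-point operation attributable to the workload.

    load_watts: wall power while the GEMM is running
    idle_watts: wall power with the GPU idle (assumed baseline)
    gflops:     achieved throughput in GFLOP/s for that GEMM
    """
    if gflops <= 0:
        raise ValueError("gflops must be positive")
    return (load_watts - idle_watts) / (gflops * 1e9)


def fp64_to_fp32_energy_ratio(idle_watts: float,
                              sgemm_watts: float, sgemm_gflops: float,
                              dgemm_watts: float, dgemm_gflops: float) -> float:
    """Measured energy per FLOP of DGEMM divided by that of SGEMM."""
    fp32 = energy_per_flop(sgemm_watts, idle_watts, sgemm_gflops)
    fp64 = energy_per_flop(dgemm_watts, idle_watts, dgemm_gflops)
    return fp64 / fp32
```

As a sanity check on the logic: if DGEMM drew exactly the same power over idle as SGEMM while running at one third of the throughput, the ratio would be 3. A ratio near 4 would support the pure ALU estimate, while a ratio close to 1 would suggest data movement dominates.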