Programmable hardware is never as efficient (in perf/watt and/or silicon area) as fixed-function hardware. But programmable hardware makes it easier to fully utilize its transistors, as it is not limited to executing a single kind of work.
I wouldn't call ACEs fixed-function hardware. GPU front ends are fully programmable processors with programmable memory access. You just don't have access to program them yourself; the driver team does. Traditionally, fixed-function hardware has fixed data inputs and data outputs. Examples include the texture sampler, ROP (blend, depth test, HiZ), DXT block decompressor, delta color compressor and triangle backface culling. These are highly performance-critical parts of the chip, so implementing them in fixed-function hardware is a big perf/watt win. Hard-wiring also reduces latency compared to a programmable pipeline.
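To make the "fixed inputs, fixed outputs" point concrete, here is a minimal Python sketch of the math a backface-culling stage evaluates: three screen-space vertex positions in, one cull bit out. The winding convention (counter-clockwise is front-facing, y axis up) and the function names are assumptions for illustration only; real hardware exposes the winding as render state and does not run Python floats.

```python
from typing import Tuple

Vec2 = Tuple[float, float]


def signed_area2(v0: Vec2, v1: Vec2, v2: Vec2) -> float:
    """Twice the signed area of triangle (v0, v1, v2): the 2D cross product of its edges."""
    return (v1[0] - v0[0]) * (v2[1] - v0[1]) - (v2[0] - v0[0]) * (v1[1] - v0[1])


def is_backfacing(v0: Vec2, v1: Vec2, v2: Vec2) -> bool:
    """Fixed inputs (three positions), fixed output (cull or not).

    Assumed convention: counter-clockwise winding with y up is front-facing.
    Zero-area (degenerate) triangles are culled as well.
    """
    return signed_area2(v0, v1, v2) <= 0.0


if __name__ == "__main__":
    print(is_backfacing((0.0, 0.0), (1.0, 0.0), (0.0, 1.0)))  # counter-clockwise: kept
    print(is_backfacing((0.0, 0.0), (0.0, 1.0), (1.0, 0.0)))  # clockwise: culled
```

There is no control flow worth programming here, which is exactly why it pays to hard-wire it.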
On the other hand, a GPU front-end processor only needs to launch a couple of draws/dispatches per microsecond. At GPU clock rates of 1 GHz or more, a microsecond is a thousand or more cycles, which is a huge budget per draw. It is not worth optimizing its throughput at cycle precision, so programmable hardware makes sense.
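A back-of-the-envelope version of that arithmetic, as a Python sketch. The 1.5 GHz clock and the rate of two draws per microsecond are hypothetical example numbers, not measurements of any particular GPU.

```python
def cycle_budget_per_draw(clock_ghz: float, draws_per_us: float) -> float:
    """Clock cycles available to the front end for each draw/dispatch launch."""
    cycles_per_us = clock_ghz * 1000.0  # 1 GHz = 1000 cycles per microsecond
    return cycles_per_us / draws_per_us


if __name__ == "__main__":
    # Hypothetical numbers: 1.5 GHz clock, two draws launched per microsecond.
    print(cycle_budget_per_draw(1.5, 2.0))  # 750.0
```

Hundreds of cycles per launch is plenty of headroom for a small programmable core, so shaving individual cycles off the front end buys very little.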