Sandy Bridge already power gates the AVX units. Agner Fog discovered that the latency of the instructions depends on whether any AVX instructions have been executed recently.
If you wanted AVX2 to be a viable option, personally I wouldn't widen it to 1024-bit-over-4-cycles, but rather include a ridiculously beefy 256-bit AVX2 with FMA pipeline (at least as fast as Haswell) as an option to the 22nm Silvermont Atom core. And I wouldn't just clock gate it; I'd power gate it like ARM optionally does for their NEON SIMD (obviously this adds some scheduling complications if you care about maximizing performance, but I expect it to be manageable).
Clock gating the front-end is orthogonal to that. And the other benefit of executing 1024-bit instructions over 4 cycles is the latency hiding. That's something that would definitely benefit Atom too.
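To put rough numbers on the latency-hiding point, here's a toy model. It is only a sketch: the function name `chain_utilization` and the 4-cycle pipeline latency are made up for illustration, not measurements of any real core. It assumes a single chain of dependent instructions, a 256-bit datapath that processes one chunk per cycle, and that chunk i of an instruction can issue as soon as chunk i of the instruction it depends on has left the pipeline.

```python
def chain_utilization(pipeline_latency: int, cycles_per_instruction: int) -> float:
    """Fraction of cycles the execution unit is busy on ONE dependent chain.

    Toy model, not a simulator: each instruction occupies the unit for
    `cycles_per_instruction` back-to-back cycles (one 256-bit chunk per cycle).
    """
    # The next dependent instruction can start only when BOTH hold:
    #   - the unit is free again (after cycles_per_instruction cycles), and
    #   - the first chunk of the previous result is ready (after pipeline_latency cycles).
    issue_interval = max(pipeline_latency, cycles_per_instruction)
    return cycles_per_instruction / issue_interval


if __name__ == "__main__":
    LATENCY = 4  # hypothetical pipeline latency in cycles

    # 256-bit instruction: 1 chunk, so the unit idles while waiting on the result.
    print("256-bit, 1 cycle:  ", chain_utilization(LATENCY, 1))

    # 1024-bit instruction over 4 cycles: the 4 chunks cover the latency.
    print("1024-bit, 4 cycles:", chain_utilization(LATENCY, 4))
```

Under those assumptions a lone dependency chain keeps the unit busy a quarter of the time with 256-bit instructions, and all of the time with 1024-bit instructions issued over 4 cycles. A real out-of-order core would fill some of those idle slots with independent work, so the gap matters most for an in-order or narrow design.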