It's not pretty, but the gather could be replayed in its entirety upon any fault whatsoever, and if you so desire, after an interrupt. All that's required is that the cache and TLB have at least as many ways as there are elements in the vector, because otherwise you could get an infinite loop where the later fields keep evicting the former ones.
Replaying the gather (re-issuing it from the scheduler) won't do. The ROB is of limited size, 168 entries in Haswell, equivalent to just 42 cycles at full tilt. You'd need to remove the gather and all subsequent instructions from the ROB, which means a faulting gather acts like a mispredicted branch on top of all the other penalties.
The load, mask, loop-until-mask=0 sequence guarantees forward progress, doesn't require any internal state to be saved on preemption, and doesn't throw away work done by subsequent instructions that have already executed thanks to the OOO machinery. The OOO machinery can also overlap multiple independent gather loops.
And as Aaron points out, there is nothing preventing Intel from recognizing the load/mask/branch idiom in future implementations and speeding it up.
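To make the forward-progress argument concrete, here is a rough software model of the load/mask/loop sequence in C. It is only a sketch: `memory_model`, `gather_step` and `gather_loop` are names I made up, the `present` array is a stand-in for page mappings, and flipping a `present` flag stands in for whatever the OS fault handler does. It says nothing about how the hardware actually implements the gather.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { VLEN = 8 };  /* assumed vector length: 8 x 32-bit lanes */

/* Hypothetical memory model: present[i] says whether table[i] is mapped. */
typedef struct {
    const int32_t *table;
    bool *present;
} memory_model;

/* One gather attempt. Loads every lane whose mask bit is still set, clearing
 * the bit as each lane completes. Stops at the first unmapped element and
 * reports its index through fault_index (-1 if there was no fault).
 * Returns the remaining mask. */
static uint8_t gather_step(int32_t dst[VLEN], const memory_model *mem,
                           const uint32_t idx[VLEN], uint8_t mask,
                           int *fault_index)
{
    *fault_index = -1;
    for (int lane = 0; lane < VLEN; lane++) {
        if (!(mask & (1u << lane)))
            continue;                      /* lane already loaded earlier */
        if (!mem->present[idx[lane]]) {
            *fault_index = (int)idx[lane]; /* fault: keep partial progress */
            return mask;
        }
        dst[lane] = mem->table[idx[lane]];
        mask &= (uint8_t)~(1u << lane);    /* lane done, clear its bit */
    }
    return mask;
}

/* Load, mask, loop until mask == 0. The only state carried across a fault
 * is the architecturally visible mask and destination. Returns the number
 * of attempts that were needed. */
static int gather_loop(int32_t dst[VLEN], const memory_model *mem,
                       const uint32_t idx[VLEN])
{
    uint8_t mask = 0xFF;
    int attempts = 0;
    while (mask != 0) {
        int fault;
        mask = gather_step(dst, mem, idx, mask, &fault);
        attempts++;
        /* In this model each fault maps one element, so there can be at
         * most VLEN faulting attempts plus one clean one. */
        assert(attempts <= VLEN + 1);
        if (fault >= 0)
            mem->present[fault] = true;    /* stand-in for the fault handler */
    }
    return attempts;
}
```

The point of the model is that lanes completed before a fault are never redone: the retry only touches lanes whose mask bits are still set, so nothing has to be flushed or replayed from scratch.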
Cheers