Haswell is an out-of-order core with a 192-entry ROB (likely even larger in Skylake). Compared to a GPU, Haswell can additionally hide memory latency by executing other independent instructions (up to 192 of them) from either of its two SMT instruction streams when an instruction stalls.
In any given cycle, Haswell's scheduler can pick from 60 entries to send down the issue pipes, though that is not much of a limit in the very, very optimistic case of 192 independent ROB entries ready to go on the rename and issue side of the pipeline. That case, unlikely as it is, could get by with as many scheduler entries as there are issue ports.
Determining latency hiding capability also depends on what is meant by the term, such as whether we are primarily asking if the vector units are idling. That is typically what people look at when analyzing a GPU's latency hiding capability, although it is not the same thing as a thread stalling in either architecture.
With a stall-free condition and an optimal instruction mix, running through Haswell's 192 ROB entries (a single main-memory access excepted) would take 24 cycles: 192 entries dispatched at the peak rate of 8 micro-ops per cycle across the 8 execution ports.
A purely VALU-focused measurement would take the 168 physical AVX registers, subtract 32 for the architectural state of the two threads, and divide by two for the two FMA ports, yielding a respectable 68 cycles of latency hiding.
That's 68 cycles, or ~22 nanoseconds at 3 GHz, which is good enough for an on-die memory access.
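The back-of-envelope arithmetic above can be sketched out explicitly. The port counts and clock speed are the figures already quoted in this thread; the 3 GHz clock is the same illustrative number, not a measured one:

```python
# Back-of-envelope latency-hiding estimates for Haswell, using the
# figures quoted above.

ROB_ENTRIES = 192      # reorder buffer entries
ISSUE_PORTS = 8        # Haswell has 8 execution ports
# Optimistic drain time: every ROB entry independent, 8 uops/cycle.
rob_drain_cycles = ROB_ENTRIES / ISSUE_PORTS          # 24.0 cycles

PHYS_AVX_REGS = 168    # physical vector register file size
ARCH_REGS = 2 * 16     # 16 architectural YMM registers per SMT thread
FMA_PORTS = 2          # two FMA-capable ports
# VALU-only view: registers free for in-flight results, two FMAs/cycle.
valu_hiding_cycles = (PHYS_AVX_REGS - ARCH_REGS) / FMA_PORTS   # 68.0

CLOCK_HZ = 3e9
hiding_ns = valu_hiding_cycles / CLOCK_HZ * 1e9        # ~22.7 ns
```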
That still isn't a good measure of MLP (memory-level parallelism), which is what could be used to overlap miss penalties. The load and store buffers (72 and 48 entries) and the 10 or 16 outstanding line misses (for the L1 and L2, respectively) would track that better.
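For a rough sense of why the outstanding-miss count matters, Little's law relates in-flight misses to sustainable bandwidth. The miss-slot counts are the ones above; the 70 ns memory latency is an illustrative assumption, not a Haswell measurement:

```python
# Little's law sketch: bandwidth sustainable by a fixed number of
# outstanding line misses. Latency here is an assumed round figure.

LINE_BYTES = 64          # cache line size
L1_MISS_SLOTS = 10       # outstanding L1 line misses (fill buffers)
L2_MISS_SLOTS = 16       # outstanding L2 line misses
MEM_LATENCY_NS = 70.0    # assumed main-memory latency

def sustainable_gbps(outstanding: int, latency_ns: float) -> float:
    """Bandwidth (GB/s) that `outstanding` in-flight line misses can
    sustain at the given latency: N = bandwidth * latency."""
    return outstanding * LINE_BYTES / latency_ns  # bytes/ns == GB/s
```

With these assumptions, 10 L1 miss slots cap demand-miss bandwidth at roughly 9 GB/s, and 16 L2 slots at roughly 15 GB/s, which is why prefetch and MLP tracking matter more than raw ROB size for overlapping misses.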
It's difficult to find corresponding access numbers for GPUs. The ISA-permitted theoretical peaks for GCN, for example, are extremely high.