Isn't the number of threads determined by pipeline/memory latency hiding requirements?
As an example, imagine the kernel requires 256 scalar registers. On VLIW that's 64 vec4 registers. So, on VLIW you get 4 hardware threads to hide latency. On GCN that's also 4 hardware threads. The hardware thread count is the same because both architectures have the same quantity of register file per CU.
With VLIW these 4 hardware threads are sharing a single SIMD and a single register file. So when 1 thread issues a memory request, there are 3 other hardware threads that can run.
On GCN, you have 4 SIMDs, each with a private register file that's one-quarter of the CU's total register file capacity. So when sharing out the 4 hardware threads, each SIMD ends up with a single hardware thread. So when any of those hardware threads issues a memory request, the SIMD it was running on falls idle. Therefore latency has not been hidden.
Hiding even the smallest amount of latency requires at least two hardware threads per SIMD. On GCN with 4 SIMDs per CU, that means 8 hardware threads are required per CU. On VLIW, with its single SIMD, only 2 hardware threads are required. When both are given a kernel with the same register file allocation (measured in bytes per work item), you will end up with more latency-hiding capability on VLIW.
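To put a number on that, here is a rough sketch of the largest register allocation that still leaves two hardware threads resident on each SIMD. It assumes the register file sizes quoted in this post (65536 bytes per SIMD on GCN, 262144 bytes on the single VLIW SIMD), 64 work items per hardware thread and 4-byte scalar registers. The function and variable names are made up for illustration, and it ignores any architectural cap on registers per work item.

```python
WORK_ITEMS_PER_THREAD = 64       # work items per hardware thread
BYTES_PER_SCALAR_REGISTER = 4    # one 32-bit scalar register

def max_scalar_registers(simd_rf_bytes, threads_per_simd):
    """Largest per-work-item allocation (in scalar registers) that still
    fits `threads_per_simd` hardware threads in one SIMD's register file."""
    work_items = threads_per_simd * WORK_ITEMS_PER_THREAD
    bytes_per_work_item = simd_rf_bytes // work_items
    return bytes_per_work_item // BYTES_PER_SCALAR_REGISTER

# Register file sizes per SIMD as quoted in this post (assumed, not measured).
gcn_budget = max_scalar_registers(65536, threads_per_simd=2)
vliw_budget = max_scalar_registers(262144, threads_per_simd=2)

print("GCN :", gcn_budget, "scalar registers per work item")
print("VLIW:", vliw_budget, "scalar registers per work item")
```

Under those assumptions the 256-register kernel from the example is over budget on GCN but comfortably inside it on VLIW.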
hardware thread count per SIMD = Compute Unit RF size / work item allocation / work items per hardware thread / count of SIMDs
CU RF Size = 262144 bytes
work item allocation = 256 scalar registers * 4 bytes per register = 1024 bytes
work items per hardware thread = 64
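You can rewrite this in per-SIMD terms:
hardware thread count per SIMD = per-SIMD RF size / work item allocation / work items per hardware thread
Per-SIMD RF size is 65536 bytes on GCN, but 262144 bytes on VLIW. With the 1024-byte allocation above, that works out to 1 hardware thread per SIMD on GCN and 4 on VLIW.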
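The same arithmetic as a quick script, in case anyone wants to plug in other register counts. This is only a sketch of the formula in this post: the function name and parameters are my own, and the SIMD counts (4 per CU on GCN, 1 on VLIW) and the 262144-byte CU register file are the figures assumed above.

```python
def hardware_threads_per_simd(cu_rf_bytes, simd_count, scalar_registers,
                              bytes_per_register=4, work_items_per_thread=64):
    """Hardware threads that fit in one SIMD's share of the CU register file."""
    simd_rf_bytes = cu_rf_bytes // simd_count                  # per-SIMD register file
    work_item_bytes = scalar_registers * bytes_per_register    # allocation per work item
    return simd_rf_bytes // work_item_bytes // work_items_per_thread

CU_RF_BYTES = 262144  # same total on both architectures, per this post

gcn = hardware_threads_per_simd(CU_RF_BYTES, simd_count=4, scalar_registers=256)
vliw = hardware_threads_per_simd(CU_RF_BYTES, simd_count=1, scalar_registers=256)

print("GCN  threads per SIMD:", gcn)
print("VLIW threads per SIMD:", vliw)
```

Both cases hold 4 hardware threads per CU in total; the difference is only in how many of them share a SIMD.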
The counterargument would be that on GCN, when a SIMD falls idle, only 16 SIMD lanes are actually idling. Whereas on VLIW, 64 lanes are idling (and each of those is 4 or 5 operations). The problem with this argument is simply that the VLIW architecture generally has enough hardware threads to not fall idle, or to idle for much shorter periods of time.
EDIT: brainfart alert, that should be "Whereas on VLIW, 80 or 64 lanes are idling (VLIW-5 or VLIW-4)."